Affinity Workshop
Tiny Papers Oral Session 3
Krystal Maughan · Thomas F Burns
Halle A 3
Schedule
Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel (Oral)
Traditional methods, such as JPEG, perform image compression by operating on structural information, such as pixel values or frequency content. These methods are effective at bitrates of around one bit per pixel (bpp) and higher at standard image sizes. However, to compress further, text-based semantic compression directly stores concepts and their relationships using natural language, which has evolved with humans to efficiently represent these salient concepts. Such methods can operate at extremely low bitrates by disregarding structural information like location, size, and orientation. In this work, we use GPT-4V and DALL-E 3 from OpenAI to explore the quality-compression frontier for image compression and identify the limitations of current technology. We push semantic compression as low as 100 μbpp (up to 10,000× smaller than JPEG) by introducing an iterative reflection process to improve the decoded image. We further hypothesize that this 100 μbpp level represents a soft limit on semantic compression at standard image resolutions.
Bahaa Kotb · Jordan Dotzel · James Dotzel · Mohamed Abdelfattah · Zhiru Zhang
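The abstract's "up to 10,000× smaller than JPEG" figure follows directly from the two bitrates. A minimal sketch of the arithmetic, assuming a 1024×1024 image and JPEG's ~1 bpp operating point cited above:

```python
def payload_bytes(width: int, height: int, bpp: float) -> float:
    """Compressed payload size in bytes at a given bits-per-pixel rate."""
    return width * height * bpp / 8

jpeg = payload_bytes(1024, 1024, 1.0)         # ~1 bpp: the JPEG regime
semantic = payload_bytes(1024, 1024, 100e-6)  # 100 μbpp semantic compression

print(round(jpeg), round(semantic, 1), round(jpeg / semantic))
# 131072 13.1 10000 -> a ~13-byte text description vs. a 128 KiB JPEG
```

At 100 μbpp there is only room for a sentence-scale description per image, which is why the method must discard structural detail and keep only concepts.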
KFC: Knowledge Reconstruction and Feedback Consolidation Enable Efficient and Effective Continual Generative Learning (Oral)
To address the issue of catastrophic forgetting in Continual Generative Learning (CGL), dominant methods leverage the generative replay strategy. However, they often suffer from high time complexity and inferior generative sample quality. In this work, we develop an efficient and effective CGL method via Knowledge reconstruction and Feedback Consolidation (KFC). KFC extends the inherent data-reconstruction properties of the variational autoencoder framework to historical knowledge reconstruction and re-encodes the current task's reconstructed data to the same posterior distribution as the original data. Experiments show that KFC achieves state-of-the-art performance in time complexity, sample quality, and accuracy on various CGL tasks. Code is available in the Supplementary Materials.
Libo Huang · Zhulin An · Yan Zeng · xiang zhi · Yongjun Xu
Colorful Cutout: Enhancing Image Data Augmentation with Curriculum Learning (Oral)
Data augmentation is a regularization strategy for training deep learning models that enhances generalizability and prevents overfitting, leading to improved performance. Although researchers have proposed various data augmentation techniques, these often do not account for the difficulty of the augmented data. Recently, another line of research has suggested combining curriculum learning with data augmentation in the field of natural language processing. In this study, we adapt curriculum data augmentation to image data and propose colorful cutout, which gradually increases the noise and difficulty introduced into the augmented image. Our experimental results highlight the potential of curriculum data augmentation for image data. We publicly release our source code to improve the reproducibility of our study.
Juhwan Choi · Youngbin Kim
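The core idea, cutout whose occlusion is filled with color noise and whose difficulty grows over training, can be sketched as follows. The linear 10%→50% patch-size schedule and the noise fill are hypothetical choices for illustration; the paper's actual parameters are not given in the abstract:

```python
import numpy as np

def colorful_cutout(image, epoch, max_epochs, rng=None):
    """Curriculum cutout sketch: occlude one square patch with random colors,
    growing the patch (and thus the difficulty) as training progresses.
    Schedule and fill are illustrative assumptions, not the paper's exact setup."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = image.shape
    frac = 0.1 + 0.4 * (epoch / max_epochs)   # difficulty ramps with epoch
    size = max(1, int(min(h, w) * frac))
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    out = image.copy()
    # "Colorful": fill the patch with per-pixel random colors, not a constant.
    out[y:y + size, x:x + size] = rng.integers(0, 256, size=(size, size, c))
    return out
```

Early epochs thus see small, easy occlusions; late epochs see large, noisy ones, matching the easy-to-hard ordering of curriculum learning.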
Policy Optimization in RLHF: The Impact of Out-of-preference Data (Oral)
Aligning agents with human preferences is important. This paper examines two types of alignment methods: Direct Preference Optimization (DPO) and Reward-Model-Based Policy Optimization (RMB-PO). A variant of RMB-PO, referred to as RMB-PO+, is also considered. These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization, which unlocks the generalization ability of the reward model. In particular, compared with DPO, RMB-PO additionally uses policy-generated data, and RMB-PO+ further leverages new, preference-free data (i.e., prompts, or so-called states). We examine the impact of such out-of-preference data through synthetic contextual bandit problems. Our study suggests that RMB-PO+ outperforms the other two approaches. In particular, even when the policy model is provided with a good feature representation, we find that policy optimization with adequate out-of-preference data significantly improves performance by harnessing the reward model's generalization capabilities. We present an analysis based on stochastic approximation and relate our results to other research, including imitation learning and reinforcement learning.
Ziniu Li · Tian Xu · Yang Yu
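The reward-model step shared by these methods can be illustrated on a linear toy problem: fit reward weights to pairwise preferences via the Bradley-Terry likelihood, then note that the learned reward can score brand-new, preference-free actions, the out-of-preference data the abstract discusses. This is a hypothetical sketch, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)                  # ground-truth reward weights
X = rng.normal(size=(200, 2, d))             # 200 pairs of candidate actions

# Simulated annotator: P(first action preferred) = sigmoid(r(a0) - r(a1)),
# with reward r linear in the features.
margin_true = (X[:, 0] - X[:, 1]) @ w_true
y = (rng.random(200) < 1 / (1 + np.exp(-margin_true))).astype(float)

# Fit reward weights by gradient ascent on the Bradley-Terry log-likelihood.
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]) @ w))
    w += 0.5 * ((y - p)[:, None] * (X[:, 0] - X[:, 1])).mean(axis=0)

# The fitted model generalizes: it can rank actions at unseen prompts/states,
# which is exactly what RMB-PO/RMB-PO+ exploit during policy optimization.
cosine = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
```

On this toy, the learned direction aligns closely with the true reward, so scoring out-of-preference data with it is informative rather than noise.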
Dissecting Zero-Shot Visual Reasoning Capabilities in Vision and Language Models (Oral)
Vision-language models (VLMs) have shown impressive zero- and few-shot performance on real-world visual question answering (VQA) benchmarks, alluding to their capabilities as visual reasoning engines. However, existing works typically use benchmarks that conflate “pure” visual reasoning with world knowledge, and whose questions involve only a limited number of reasoning steps. It thus remains unclear whether a VLM’s apparent visual reasoning performance stems from its world knowledge or from actual visual reasoning capabilities. To resolve this ambiguity, we systematically benchmark and dissect the zero-shot visual reasoning capabilities of VLMs on synthetic datasets that require minimal world knowledge and allow analysis over a broad range of reasoning steps. We specifically evaluate the impact of conveying scene information to the underlying large language model (LLM) of the VLM as either visual embeddings or purely textual scene descriptions. Notably, we find that the underlying LLMs consistently perform significantly better when given textual scene descriptions than when given visual embeddings. Our work comprehensively identifies limitations of VLMs for compositional visual reasoning and highlights the important role that LLMs can play in scene understanding and visual reasoning.
Aishik Nagar · Shantanu Jaiswal · Cheston Tan
Analog In-Memory Computing with Uncertainty Quantification for Efficient Edge-based Medical Imaging Segmentation (Oral)
This work investigates the role of the emerging analog in-memory computing (AIMC) paradigm in enabling medical AI analysis at the edge and in improving the certainty of these models. It contrasts AIMC's efficiency with the limitations of traditional digital computing in power, speed, and scalability. Our comprehensive evaluation focuses on brain tumor analysis, spleen segmentation, and nuclei detection. The study highlights the superior robustness of isotropic architectures, which exhibit a minimal accuracy drop (0.04) under analog-aware training, compared to significant drops (up to 0.15) in pyramidal structures. Additionally, the paper emphasizes AIMC's effective data pipelining, which reduces latency and increases throughput, as well as the strategic exploitation of AIMC's inherent noise to augment model certainty.
Imane Hamzaoui · Hadjer Benmeziane · Zayneb Cherif · Kaoutar El Maghraoui
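The idea of harnessing device noise for certainty estimates can be sketched with Monte-Carlo forward passes under simulated weight noise. Multiplicative Gaussian noise is a common AIMC simulation choice, but the paper's actual noise model is not stated in the abstract, so treat this as an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3))   # toy 10-feature, 3-class linear classifier
x = rng.normal(size=10)        # one input example

def noisy_logits(W, x, sigma=0.05, n_samples=100, rng=rng):
    """Repeat the forward pass under multiplicative Gaussian weight noise,
    emulating AIMC device variability (illustrative noise model)."""
    Ws = W[None] * (1 + sigma * rng.normal(size=(n_samples, *W.shape)))
    return np.einsum('nij,i->nj', Ws, x)   # (n_samples, n_classes)

logits = noisy_logits(W, x)
mean, std = logits.mean(axis=0), logits.std(axis=0)
# std is a per-class uncertainty estimate obtained "for free" from the
# hardware's inherent stochasticity, with no extra dropout machinery.
```

Each analog inference already differs slightly, so repeated reads play the role that MC-dropout samples play on digital hardware.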
Can LLMs Learn a New Language on the Fly? A Case Study on Zhuang (Oral)
Existing large language models still fail to support many low-resource languages. For extremely low-resource languages in particular, there is hardly any training data with which to effectively update the model parameters. We thus investigate whether LLMs can learn a new language on the fly through in-context learning. To study this question, we collect a research suite for Zhuang, a language not currently supported by any LLM. We study the performance of various LLMs on the Zhuang-Chinese translation task and find that this learning paradigm holds great potential.
Chen Zhang · Mingxu Tao · Quzhe Huang · Zhibin Chen · Yansong Feng
Collapse of Self-trained Language Models (Oral)
In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant degradation in performance, resulting in repetitive and collapsed token output.
David Herel
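The collapse dynamic the abstract reports can be illustrated (not reproduced) with a toy unigram model repeatedly refit on its own samples. Finite sampling noise compounds across generations, so diversity shrinks; this hypothetical sketch is not the paper's GPT-2 setup:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 50                               # toy vocabulary size
probs = np.ones(V) / V               # generation 0: uniform unigram "model"
entropies = []
for _ in range(300):
    corpus = rng.choice(V, size=50, p=probs)   # model generates its own data
    counts = np.bincount(corpus, minlength=V)
    probs = counts / counts.sum()              # "retrain" on that data
    p = probs[probs > 0]
    entropies.append(float(-(p * np.log2(p)).sum()))

# Entropy falls generation after generation: the distribution collapses
# onto a few repeated tokens, mirroring the repetitive output the paper
# observes in extended self-training of GPT-2.
```

The same drift argument suggests why the effect worsens with longer self-training: each generation resamples from an already-narrowed distribution.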