Affinity Posters
Tiny Papers Poster Session 8
Krystal Maughan · Thomas F Burns
Halle B
Schedule
Fri 7:30 a.m. - 9:30 a.m. | L-Tuning: Synchronized Label Tuning for Prompt and Prefix in LLMs (Poster #257)
Poster Location: Halle B #257
Efficiently fine-tuning Large Language Models (LLMs) for specific tasks presents a considerable challenge in natural language processing. Traditional methods, like prompt or prefix tuning, typically rely on arbitrary tokens for training, leading to prolonged training times and generalized token use across various class labels. To address these issues, this paper introduces L-Tuning, an efficient fine-tuning approach designed for classification tasks within the Natural Language Inference (NLI) framework. Diverging from conventional methods, L-Tuning focuses on fine-tuning the label tokens processed through a pre-trained LLM, thereby harnessing its pre-existing semantic knowledge. This technique not only improves fine-tuning accuracy and efficiency but also yields distinct label embeddings for each class, enhancing the nuance of the model's training. Our experimental results indicate a significant improvement in training efficiency and classification accuracy with L-Tuning compared to traditional approaches, marking a promising advancement in fine-tuning LLMs for complex language tasks.
Md Kowsher · Md. Shohanur Islam Sobuj · Asif Mahmud · Nusrat Prottasha · Prakash Bhat
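The core idea above — keep the pre-trained model frozen and tune only per-class label representations — can be sketched at toy scale. Everything here (dimensions, the frozen "features", the training loop) is illustrative, not the paper's implementation:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_label_embeddings(label_emb, features, targets, lr=0.1, epochs=50):
    """Tune per-class label embeddings by gradient descent on cross-entropy.

    label_emb: one embedding vector per class (the ONLY trainable parameters)
    features:  frozen text representations, standing in for LLM outputs
    targets:   gold class index per example
    """
    for _ in range(epochs):
        for x, y in zip(features, targets):
            # logits = similarity between frozen feature and each label embedding
            logits = [sum(a * b for a, b in zip(e, x)) for e in label_emb]
            probs = softmax(logits)
            # cross-entropy gradient w.r.t. label embedding c: (p_c - 1{c=y}) * x
            for c, e in enumerate(label_emb):
                g = probs[c] - (1.0 if c == y else 0.0)
                for i in range(len(e)):
                    e[i] -= lr * g * x[i]
    return label_emb

# toy frozen features for a 2-class problem
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
emb = train_label_embeddings([[0.0, 0.0], [0.0, 0.0]], feats, labels)
pred = max(range(2), key=lambda c: sum(a * b for a, b in zip(emb[c], feats[0])))
print(pred)
```

Because only the label embeddings are updated, the number of trainable parameters is (classes × embedding dim), independent of model size — the efficiency argument the abstract makes.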
Fri 7:30 a.m. - 9:30 a.m. | Network Inversion of Binarised Neural Nets (Poster #258)
Poster Location: Halle B #258
While the deployment of neural networks, yielding impressive results, becomes more prevalent in various applications, their interpretability and understanding remain a critical challenge. Network inversion, a technique that aims to reconstruct the input space from the model's learned internal representations, plays a pivotal role in unraveling the black-box nature of input-to-output mappings in neural networks. In safety-critical scenarios, where model outputs may influence pivotal decisions, the integrity of the corresponding input space is paramount, necessitating the elimination of any extraneous "garbage" to ensure the trustworthiness of the network. Binarised Neural Networks (BNNs), characterized by binary weights and activations, offer computational efficiency and reduced memory requirements, making them suitable for resource-constrained environments. This paper introduces a novel approach to inverting a trained BNN by encoding it into a CNF formula that captures the network's structure, allowing for both inference and inversion.
Pirzada Suhail
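The inversion task can be illustrated at toy scale. A BNN with ±1 weights and sign activations defines a Boolean relation between inputs and outputs; the paper encodes that relation as a CNF formula for a SAT solver, while this sketch simply enumerates the binary input space to recover the preimage of an output (weights below are made up):

```python
from itertools import product

W1 = [[1, -1, 1], [-1, 1, 1]]   # hidden-layer weights (±1), illustrative
W2 = [1, 1]                      # output-layer weights (±1)

def sign(v):
    return 1 if v >= 0 else -1

def bnn(x):
    """Tiny binarised network: ±1 inputs, sign activations, ±1 output."""
    h = [sign(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sign(sum(w * hi for w, hi in zip(W2, h)))

def invert(target):
    """Return every ±1 input vector the network maps to `target`.

    Exhaustive enumeration stands in for the SAT query over the CNF encoding;
    at real scale the CNF formulation makes this search feasible.
    """
    return [x for x in product([-1, 1], repeat=3) if bnn(x) == target]

preimage = invert(1)
print(len(preimage), preimage)
```

Inspecting the recovered preimage is exactly the "garbage" check the abstract motivates: any input in it that should not map to the target output is evidence the learned input space is too loose.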
Fri 7:30 a.m. - 9:30 a.m. | G-PECNet: Towards a Generalizable Pedestrian Trajectory Prediction System (Poster #259)
Poster Location: Halle B #259
Navigating dynamic physical environments without obstructing or damaging human assets is of quintessential importance for social robots. In this work, we address autonomous drone navigation's sub-problem of predicting out-of-domain human and agent trajectories using a deep generative model. Our method, General-PECNet (G-PECNet), achieves a 9.5% improvement in Final Displacement Error (FDE) over the 2020 benchmark PECNet, through a combination of architectural improvements inspired by periodic activation functions and synthetic-trajectory data augmentations using hidden Markov modeling and reinforcement-learning-based agents. Additionally, we propose a simple geometry-inspired loss and evaluation metric for trajectory non-linearity analysis. Code available at [Anonymous-repository](https://github.com/ANonyMouxe/GPECNet).
Aryan Garg · Renu Rameshan
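The abstract's geometry-inspired non-linearity metric is not defined here, so the sketch below uses one natural stand-in: path length divided by endpoint displacement (1.0 for a perfectly straight trajectory, larger for more winding ones). The definition and the toy trajectories are assumptions, not the paper's metric:

```python
import math

def nonlinearity(traj):
    """Ratio of arc length to straight-line displacement of a 2D trajectory.

    Equals 1.0 for a straight path; grows as the path bends. Undefined
    (returned as inf) when start and end coincide.
    """
    path = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    disp = math.dist(traj[0], traj[-1])
    return path / disp if disp > 0 else float("inf")

straight = [(0, 0), (1, 0), (2, 0)]
curved = [(0, 0), (1, 1), (2, 0)]
print(nonlinearity(straight), nonlinearity(curved))
```

A metric of this shape is differentiable almost everywhere in the waypoints, which is what makes a geometry-inspired quantity usable as a loss as well as an evaluation statistic.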
Fri 7:30 a.m. - 9:30 a.m. | Hallucination Benchmark in Medical Visual Question Answering (Poster #260)
Poster Location: Halle B #260
The recent success of large language and vision models on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has shown great potential for realizing effective visual assistants for healthcare. However, these models have not been extensively tested for the hallucination phenomenon in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
Jinge Wu · Yunsoo Kim · Honghan Wu
Fri 7:30 a.m. - 9:30 a.m. | Geometric Implications of Classification on Reducing Open Space Risk (Poster #261)
Poster Location: Halle B #261
To reduce the open space risk of hypotheses, we reexamine the 'simplest' hypothesis class, binary linear classifiers, geometrically. Providing a generalized formulation, we establish a surprising fact: linear classifiers can have arbitrarily high VC dimension, stemming from increasing the number of partitions in input space. Hence, linear classifiers with multiple margins are more expressive than single-margin classifiers. Despite a higher VC dimension, such classifiers have less open space risk than halfspace separators. These geometric insights are useful for detecting unseen classes, while probabilistic modeling of risk minimization helps with seen classes. In supervised anomaly detection, we show that a classifier combining a probabilistic and a geometric lens can detect both seen and unseen anomalies well.
Matthew Lau · Leyan Pan · Stefan Davidov · Athanasios Meliopoulos · Wenke Lee
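The "multiple margins" picture can be made concrete with an illustrative construction (the paper's exact formulation may differ): project inputs onto a direction w and let the label alternate across k thresholds. The k thresholds cut the projection line into k + 1 regions, so such a classifier can shatter k + 1 collinear points — VC dimension grows with the number of partitions, as the abstract claims:

```python
import bisect

def multi_margin_predict(w, thresholds, x):
    """Linear classifier with multiple margins (illustrative construction).

    Projects x onto w, finds which of the threshold-delimited regions the
    projection lands in, and alternates the label by region.
    """
    s = sum(wi * xi for wi, xi in zip(w, x))
    region = bisect.bisect_left(thresholds, s)   # index of the partition
    return region % 2                             # labels alternate by region

w = [1.0, 0.0]
thr = [0.5, 1.5, 2.5]                             # three margins -> four regions
points = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
print([multi_margin_predict(w, thr, p) for p in points])
```

Note also the open-space intuition: each positive region here is a bounded slab between two margins, unlike a halfspace separator whose positive region extends infinitely.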
Fri 7:30 a.m. - 9:30 a.m. | Neural Controlled Differential Equations with Quantum Hidden Evolutions (Poster #262)
Poster Location: Halle B #262
We introduce a class of neural controlled differential equations inspired by quantum mechanics. Neural quantum controlled differential equations (NQDEs) model the dynamics in analogy with the Schrödinger equation. Specifically, the hidden state represents the wave function, and its collapse leads to an interpretation of the classification probability. We implement and compare the results of four variants of NQDEs on a toy spiral classification problem.
Lingyi Yang · Zhen Shao
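The readout step described above — a wave-function-like hidden state whose "collapse" yields class probabilities — can be sketched via the Born rule, |ψ_c|² after normalisation. The controlled-ODE dynamics themselves are omitted, and the two-class state below is made up:

```python
def collapse(psi):
    """Born-rule readout: squared amplitudes of a complex hidden state,
    normalised to a probability distribution over classes."""
    norms = [abs(a) ** 2 for a in psi]
    z = sum(norms)
    return [n / z for n in norms]

psi = [complex(1, 1), complex(0, 1)]   # unnormalised 2-class hidden state
probs = collapse(psi)
print(probs)
```

Normalising inside the readout means the hidden evolution need not preserve unit norm exactly for the interpretation as probabilities to hold.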
Fri 7:30 a.m. - 9:30 a.m. | Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel (Poster #263)
Poster Location: Halle B #263
Traditional methods, such as JPEG, perform image compression by operating on structural information, such as pixel values or frequency content. These methods are effective at bitrates of around one bit per pixel (bpp) and higher at standard image sizes. To compress further, however, text-based semantic compression directly stores concepts and their relationships using natural language, which has evolved with humans to efficiently represent these salient concepts. Such methods can operate at extremely low bitrates by disregarding structural information like location, size, and orientation. In this work, we use GPT-4V and DALL-E 3 from OpenAI to explore the quality-compression frontier for image compression and identify the limitations of current technology. We push semantic compression as low as 100 μbpp (up to 10,000× smaller than JPEG) by introducing an iterative reflection process to improve the decoded image. We further hypothesize that this 100 μbpp level represents a soft limit on semantic compression at standard image resolutions.
Bahaa Kotb · Jordan Dotzel · James Dotzel · Mohamed Abdelfattah · Zhiru Zhang
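A back-of-envelope calculation makes the 100 μbpp scale tangible (the caption length and encoding below are assumed numbers, not the paper's measurements): a caption of a dozen or so characters at ~1 byte per character, amortised over a 1024×1024 image, already lands near 100 microbits per pixel.

```python
def bits_per_pixel(caption_chars, width, height, bits_per_char=8):
    """Bitrate of storing a text description instead of pixel data."""
    return caption_chars * bits_per_char / (width * height)

# assumed: a 13-character description of a 1-megapixel image
bpp = bits_per_pixel(caption_chars=13, width=1024, height=1024)
print(f"{bpp * 1e6:.0f} microbits/pixel")
```

This also shows why the limit is "soft": at these rates each additional word of description shifts the bitrate by tens of μbpp, so the frontier is set by how few words suffice, not by entropy coding.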
Fri 7:30 a.m. - 9:30 a.m. | Density-Preserving Heterogeneous Graph Sparsification for Representation Learning (Poster #264)
Poster Location: Halle B #264
Graph sparsification is the task of compressing a graph into one with fewer edges or nodes while preserving its essential structural characteristics. It has been used in machine learning to significantly improve computational efficiency on homogeneous graphs. In heterogeneous graphs, with their diverse types of nodes and edges, however, sparsification has not been extensively explored. This work develops sparsification methods that preserve edge density across different edge types and/or edge importance in terms of eigenvector centrality, improving over existing methods. The methods have been tested on real-world networks, and the results indicate substantial improvements in computational efficiency and memory cost.
Srilekha Geda · Chunjiang Zhu
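The density-preservation requirement can be sketched simply (the paper's actual algorithm may be more sophisticated): sample the same fraction of edges within each edge type, so the relative density across types survives sparsification. The toy graph below is made up:

```python
import random

def sparsify_by_type(edges_by_type, keep_fraction, seed=0):
    """Keep the same fraction of edges within every edge type, so relative
    densities across types are preserved in the sparsified graph."""
    rng = random.Random(seed)
    out = {}
    for etype, edges in edges_by_type.items():
        k = max(1, round(keep_fraction * len(edges)))
        out[etype] = rng.sample(edges, k)
    return out

graph = {
    "author-paper": [(i, i + 100) for i in range(40)],
    "paper-venue": [(i + 100, 0) for i in range(10)],
}
sparse = sparsify_by_type(graph, keep_fraction=0.5)
print({t: len(e) for t, e in sparse.items()})
```

A uniform sparsifier that ignored types could easily drop all edges of a rare type (here, `paper-venue` is 4× rarer); stratifying by type rules that failure mode out.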
Fri 7:30 a.m. - 9:30 a.m. | Knowledge Distillation Through Time For Future Event Prediction (Poster #265)
Poster Location: Halle B #265
Is it possible to learn from the future? Here, we introduce knowledge distillation through time (KDTT). In traditional knowledge distillation (KD), a reliable teacher model is used to train an error-prone student model. The difference between teacher and student is typically model capacity: the teacher network is larger in architecture. In our KDTT framework, the teacher and student models differ instead in their assigned tasks: the teacher is tasked with detecting events in sequential data, a relatively simple task, while the student is challenged with forecasting those events in the future. Through KDTT, the student can use the 'future' logits from a teacher model to extract a temporal representation of uncertainty. We show the efficacy of KDTT on seizure prediction models, where the student forecaster achieves over a 20% average increase in the area under the receiver operating characteristic curve (AUC-ROC).
Skye Gunasekaran · Jason Eshraghian · Ruomin Zhu · Zdenka Kuncic
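The target construction behind KDTT can be sketched with made-up numbers: the student forecaster at time t is trained against soft labels derived from the teacher detector's logits at the future step t + horizon. Only this pairing is shown, not the full distillation loss or training loop:

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    z = sum(e)
    return [v / z for v in e]

def kdtt_targets(teacher_logits, horizon):
    """Pair each student step t with the teacher's soft labels from t + horizon.

    The teacher solves the easier detection task; shifting its logits back in
    time turns them into graded forecasting targets for the student.
    """
    return [
        (t, softmax(teacher_logits[t + horizon]))
        for t in range(len(teacher_logits) - horizon)
    ]

# illustrative teacher detection logits over 5 steps (event emerges at the end)
teacher = [[2.0, -2.0], [2.0, -2.0], [1.0, -1.0], [-1.0, 1.0], [-3.0, 3.0]]
targets = kdtt_targets(teacher, horizon=2)
print(len(targets), targets[0][0], targets[-1])
```

Because the teacher's confidence ramps up as the event approaches, the shifted soft targets encode graded uncertainty about the upcoming event — the "temporal representation of uncertainty" the abstract refers to — rather than a hard 0/1 label.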
Fri 7:30 a.m. - 9:30 a.m. | Performance Analysis of a Quantum-Classical Hybrid Reinforcement Learning Approach (Poster #266)
Poster Location: Halle B #266
Quantum Machine Learning (QML) is a nascent field of technology that is yet to be fully explored. While previous QML implementations have demonstrated performance gains over classical benchmarks, it has not been studied in detail whether shallow, unentangled quantum circuits can provide the same benefits to reinforcement learning algorithms. Towards this goal, we present a shallow Deep Q-Network (DQN) hybrid quantum-classical Variational Quantum Circuit (VQC) model in the CartPole-v0 environment that, with a simpler unentangled quantum circuit than those proposed in prior literature, provides an increase in training stability and average reward for any given training run.
Evan Mitchell · Biswajit Basu · Pabitra Mitra
Fri 7:30 a.m. - 9:30 a.m. | Collapse of Self-trained Language Models (Poster #268)
Poster Location: #268
In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant degradation in performance, resulting in repetitive and collapsed token output.
David Herel
Fri 7:30 a.m. - 9:30 a.m. | Training Mixture-of-Experts: A Focus on Expert-Token Matching (Poster #269)
Poster Location: #269
Recent advancements in sparse Mixture-of-Experts (MoE) models, particularly in the Vision MoE (VMoE) framework, have demonstrated promising results in enhancing vision task performance. However, a key challenge persists in optimally routing tokens (such as image patches) to the right experts without incurring excessive computational costs. Addressing this, we apply regularized optimal transport, which relies on the Sinkhorn algorithm, to the VMoE framework, aiming to improve the token-expert matching process. The resulting model, Sinkhorn-VMoE (SVMoE), represents a meaningful step in optimizing the efficiency and effectiveness of sparsely-gated MoE models.
Masoumeh Zareapoor
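The Sinkhorn-based matching step can be sketched at toy scale (sizes and affinity scores below are illustrative): alternately normalise the rows (tokens) and columns (experts) of exp(affinity), so each token's assignment sums to 1 while each expert receives an equal share of tokens — an approximation of regularised optimal transport:

```python
import math

def sinkhorn(affinity, iters=50):
    """Sinkhorn iteration over a token-by-expert affinity matrix."""
    p = [[math.exp(a) for a in row] for row in affinity]
    n_tok, n_exp = len(p), len(p[0])
    for _ in range(iters):
        # row step: each token's assignment distribution sums to 1
        for r in range(n_tok):
            s = sum(p[r])
            p[r] = [v / s for v in p[r]]
        # column step: each expert receives n_tok / n_exp total mass
        for c in range(n_exp):
            s = sum(p[r][c] for r in range(n_tok))
            for r in range(n_tok):
                p[r][c] *= (n_tok / n_exp) / s
    return p

scores = [[2.0, 0.0], [1.9, 0.1], [0.0, 2.0], [0.2, 1.8]]  # 4 tokens, 2 experts
plan = sinkhorn(scores)
loads = [sum(plan[r][c] for r in range(4)) for c in range(2)]
print(loads)
```

The balanced column sums are the point: a greedy argmax router can overload a popular expert, whereas the Sinkhorn plan enforces even expert load by construction.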
- | A Bi-Objective ε-Constrained Framework for Quality-Cost Optimization in Language Model Ensembles (Poster #267)
Poster Location: #267
We propose an ensembling framework that uses diverse open-source Large Language Models (LLMs) to achieve high response quality while maintaining cost efficiency. We formulate a bi-objective optimization problem to represent the quality-cost tradeoff and then introduce an additional budget constraint that reduces the problem to a straightforward 0/1 knapsack problem. We empirically demonstrate that our framework outperforms existing ensembling approaches in response quality while significantly reducing costs.
Aditya Singh · Aditi Singla · Kanishk Kukreja
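The budget-constrained selection the abstract reduces to a 0/1 knapsack can be sketched directly (model names, quality scores, and integer costs below are made up): choose the subset of models maximising total quality without exceeding the cost budget.

```python
def knapsack(models, budget):
    """0/1 knapsack over (name, quality, cost) triples with integer costs.

    Dynamic program keyed by total cost spent; returns the best
    (total quality, chosen model names) within the budget.
    """
    best = {0: (0.0, [])}                      # spent cost -> (quality, names)
    for name, quality, cost in models:
        for spent, (q, chosen) in list(best.items()):
            c = spent + cost
            if c <= budget and (c not in best or best[c][0] < q + quality):
                best[c] = (q + quality, chosen + [name])
    return max(best.values())

models = [("model-a", 0.8, 4), ("model-b", 0.6, 2), ("model-c", 0.5, 1)]
quality, picked = knapsack(models, budget=3)
print(quality, picked)
```

With a budget of 3, the single strongest model is unaffordable, and the optimum pairs two cheaper models instead — the quality-cost tradeoff the ε-constrained formulation captures.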
- | 3D Shape Completion via Sparse Irregular Representation (Poster #270)
Poster Location: #270
The task of 3D shape completion involves completing the missing regions of an object from a partial observation. Current methods accomplish this task by modeling latent completion distributions with an autoregressive model. However, this approach often struggles with geometric details, as it represents 3D shapes with variable-length latent sequences, leading to gaps (locally missing regions) in the completed shape. In this paper, we introduce a multiple 3D shape completion method using a transformer-based autoregressive model and a fixed-length sparse irregular latent sequence. Experiments demonstrate that our method outperforms state-of-the-art methods in terms of both quality and fidelity.
Jiahui Li