Affinity Workshop
Tiny Papers Oral Session 4
Krystal Maughan · Thomas F Burns
Halle A 3
Schedule
G-PECNet: Towards a Generalizable Pedestrian Trajectory Prediction System (Oral)
Navigating dynamic physical environments without obstructing or damaging human assets is of quintessential importance for social robots. In this work, we solve autonomous drone navigation's sub-problem of predicting out-of-domain human and agent trajectories using a deep generative model. Our method, General-PECNet (G-PECNet), improves the Final Displacement Error (FDE) of the 2020 benchmark PECNet by $9.5$% through a combination of architectural improvements inspired by periodic activation functions and synthetic trajectory (data) augmentations using hidden Markov modeling and reinforcement-learning-based agents. Additionally, we propose a simple geometry-inspired loss and evaluation metric for trajectory non-linearity analysis. Code available at [Anonymous-repository](https://github.com/ANonyMouxe/GPECNet)
Aryan Garg · Renu Rameshan
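The abstract credits part of G-PECNet's gain to periodic activation functions. As an illustrative sketch only (a generic SIREN-style layer, not the authors' architecture; the frequency `omega` and layer sizes are assumptions), replacing ReLU with a scaled sine lets a dense layer represent high-frequency structure such as sharp turns in a trajectory:

```python
import numpy as np

def sine_layer(x, W, b, omega=30.0):
    """Periodic-activation dense layer: sin(omega * (x @ W + b)).

    The sine nonlinearity keeps outputs bounded in [-1, 1] while
    letting the layer fit high-frequency signals that ReLU MLPs
    struggle with (e.g. sharply turning pedestrian paths).
    """
    return np.sin(omega * (x @ W + b))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))                  # five toy 2-D trajectory points
W = rng.normal(size=(2, 16)) / np.sqrt(2)    # fan-in-scaled weights
b = np.zeros(16)
h = sine_layer(x, W, b)                      # (5, 16) bounded features
```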
Hallucination Benchmark in Medical Visual Question Answering (Oral)
The recent success of large language and vision models on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has shown great potential for realizing effective visual assistants for healthcare. However, these models are not extensively tested on the hallucination phenomenon in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
Jinge Wu · Yunsoo Kim · Honghan Wu
Enhancing Drug-Drug Interaction Prediction with Context-Aware Architecture (Oral)
In the field of disease treatment, the simultaneous use of multiple medications can lead to unforeseen adverse reactions, compromising patient safety and therapeutic efficacy. Consequently, predicting drug-drug interactions (DDIs) has emerged as a pivotal research focus to improve disease treatment. While recent advancements have been made in deep learning models for predicting drug pair relations, the nuanced consideration of individual or cellular conditions as influential contextual factors in DDIs is notably lacking. In this study, leveraging existing models, we introduce a methodology to predict DDIs through a context-aware architecture. The evident performance improvement compared to established methodologies underscores the crucial role of the context-aware mechanism in addressing context-conditional DDIs. Furthermore, we perform a systematic ablation analysis to assess the impact of model elements. Simultaneously, we also investigate the potential of incorporating pre-trained molecular representation learning models in this domain.
Yijingxiu Lu · Yinhua Piao · Sun Kim
A Shared Encoder for Multi-Source Hyperspectral Images (Oral)
Multi-source hyperspectral images (HSIs), captured from diverse sensors, commonly possess varying numbers of bands. When employing deep learning techniques for their processing, a separate model is required for each source due to the disparate input dimensions. To tackle this problem, we propose a shared encoder that projects all HSIs into a unified feature space. It establishes a general framework for the representation of multi-source HSIs, providing foundational conditions for the development of a universal HSI analysis model.
Weili Kong · Baisen Liu · Xiaojun Bi · Jiaming Pei
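The idea of mapping sources with different band counts into one feature space can be sketched as source-specific input adapters feeding a single shared encoder. This is a minimal illustration under assumed design choices (linear adapters, a one-layer tanh encoder, and all dimensions are made up), not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
D_COMMON, D_FEAT = 32, 16

def make_adapter(in_bands):
    """Source-specific linear map from `in_bands` channels to the common space."""
    return rng.normal(scale=in_bands ** -0.5, size=(in_bands, D_COMMON))

def shared_encoder(z):
    """One encoder, shared by every source, applied in the common space."""
    return np.tanh(z @ W_shared)

W_shared = rng.normal(scale=D_COMMON ** -0.5, size=(D_COMMON, D_FEAT))
sources = {"sensorA": 64, "sensorB": 128}        # differing band counts
adapters = {name: make_adapter(b) for name, b in sources.items()}

feats = {}
for name, bands in sources.items():
    pixels = rng.normal(size=(10, bands))        # 10 spectra from this sensor
    feats[name] = shared_encoder(pixels @ adapters[name])
```

Both sources end up with features of the same shape, so a single downstream model can consume either.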
Nonlinear model reduction for operator learning (Oral)
Operator learning provides methods to approximate mappings between infinite-dimensional function spaces. Deep operator networks (DeepONets) are a notable architecture in this field. Recently, an extension of DeepONet based on model reduction and neural networks, proper orthogonal decomposition (POD)-DeepONet, has been able to outperform other architectures in terms of accuracy for several benchmark tests. We extend this idea towards nonlinear model order reduction by proposing an efficient framework that combines neural networks with kernel principal component analysis (KPCA) for operator learning. We conduct experiments on three test cases, including the Navier–Stokes equation. Our results demonstrate the superior performance of KPCA-DeepONet over POD-DeepONet.
Hamidreza Eivazi · Stefan Wittek · Andreas Rausch
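Kernel PCA, the nonlinear reduction this work combines with DeepONet, can be sketched in a few lines: build a kernel matrix over snapshots, double-center it, and take the top eigenpairs. This is a generic textbook implementation, not the authors' code; the RBF kernel and `gamma` are illustrative choices:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix between all rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kpca_coords(X, n_components=2, gamma=1.0):
    """Nonlinear reduced coordinates from the centered kernel's eigenpairs."""
    K = rbf_kernel(X, gamma)
    n = len(K)
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J                                # double centering
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(2)
snapshots = rng.normal(size=(20, 8))              # 20 toy solution snapshots
coords = kpca_coords(snapshots, n_components=3)   # (20, 3) reduced coordinates
```

Where POD would take a linear SVD of the snapshot matrix, KPCA performs the same spectral step on a nonlinear kernel, capturing curvature in the solution manifold.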
VoltaVision: A Transfer Learning model for electronic component classification (Oral)
In this paper, we analyze the effectiveness of transfer learning on classifying electronic components. Transfer learning reuses pre-trained models to save time and resources in building a robust classifier, rather than learning from scratch. Our work introduces a lightweight CNN, coined VoltaVision, and compares its performance against more complex models. We test the hypothesis that transferring knowledge from a similar task to our target domain yields better results than state-of-the-art models trained on general datasets. Our dataset and code for this work are available at https://anonymous.4open.science/r/VoltaVision-E4A5.
Anas Mohammad Ishfaqul Muktadir Osmani · Taimur Rahman · Salekul Islam
Evaluating Groups of Features via Consistency, Contiguity, and Stability (Oral)
Feature attributions explain model predictions by assigning importance scores to input features. In high-dimensional data such as images, these scores are often assigned to groups of features at a time. There are a variety of strategies for creating these groups, ranging from simple patches to deep-learning-based segmentation algorithms. What makes certain groups better than others for explanations? We formally define three key criteria for interpretable groups of features: consistency, contiguity, and stability. Surprisingly, we find that patch-based groups outperform groups created via modern segmentation tools.
Chaehyeon Kim · Weiqiu You · Shreya Havaldar · Eric Wong
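The patch-based grouping that the abstract finds surprisingly strong is simple to construct: tile the image into fixed squares and give every pixel its patch's id. A minimal sketch (the patch size is an illustrative choice, and this is not the paper's evaluation code):

```python
import numpy as np

def patch_groups(h, w, patch=4):
    """Assign each pixel of an h x w image a square-patch group id."""
    n_cols = (w + patch - 1) // patch            # patches per image row
    rows = np.arange(h)[:, None] // patch        # patch row index per pixel
    cols = np.arange(w)[None, :] // patch        # patch column index per pixel
    return rows * n_cols + cols                  # (h, w) integer group map

groups = patch_groups(16, 16, patch=4)           # 4x4 grid of 4x4-pixel groups
```

Each group is a contiguous square of equal size, which already satisfies contiguity by construction; segmentation-derived groups trade that regularity for semantic boundaries.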
Training Mixture-of-Experts: A Focus on Expert-Token Matching (Oral)
Recent advancements in sparse Mixture-of-Experts (MoE) models, particularly in the Vision MoE (VMoE) framework, have demonstrated promising results in enhancing vision task performance. However, a key challenge persists in optimally routing tokens (such as image patches) to the right experts without incurring excessive computational costs. Addressing this, we apply regularized optimal transport, which relies on the Sinkhorn algorithm, to the VMoE framework, aiming to improve the token-expert matching process. The resulting model, Sinkhorn-VMoE (SVMoE), represents a meaningful step in optimizing the efficiency and effectiveness of sparsely-gated MoE models.
Masoumeh Zareapoor
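The Sinkhorn algorithm at the heart of this matching is alternating row and column rescaling of the token-expert affinity matrix. A generic sketch of that normalization (not the SVMoE training code; the temperature `epsilon` and iteration count are illustrative assumptions):

```python
import numpy as np

def sinkhorn_route(scores, n_iters=200, epsilon=0.5):
    """Soft, load-balanced token-expert assignment via Sinkhorn iteration.

    Rows (tokens) are rescaled to sum to 1, so every token is fully
    routed; columns (experts) are rescaled toward n_tokens / n_experts,
    so no expert is starved or overloaded.
    """
    n_tokens, n_experts = scores.shape
    P = np.exp((scores - scores.max()) / epsilon)    # stable exponentiation
    for _ in range(n_iters):
        P = P / P.sum(axis=0, keepdims=True) * (n_tokens / n_experts)
        P = P / P.sum(axis=1, keepdims=True)         # each token routed once
    return P

rng = np.random.default_rng(3)
logits = rng.normal(size=(8, 4))                     # 8 tokens, 4 experts
P = sinkhorn_route(logits)                           # (8, 4) soft assignment
```

A greedy top-1 router can send most tokens to one popular expert; the Sinkhorn marginal constraints enforce balanced expert load while still following the affinity scores.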