ICLR 2023 Schedule

SUN 30 APR

10 p.m.

Registration / Check-in

(ends 10:30 AM)

11:15 p.m.

Remarks:

Opening Remarks

(ends 11:30 PM)

11:30 p.m.

Invited Talk:

Entanglements, Exploring Artificial Biodiversity

Sofia Crespo

(ends 12:30 AM)

MON 1 MAY

12:30 a.m.

Coffee Break

1 a.m.

Oral 1 Track 4: Social Aspects of Machine Learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Quantifying Memorization Across Neural Language Models

[1:10] Human-Guided Fair Classification for Natural Language Processing

[1:20] Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?

[1:30] Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification

[1:40] UNICORN: A Unified Backdoor Trigger Inversion Framework

[1:50] Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

[2:00] Learning to Estimate Shapley Values with Vision Transformers

[2:10] Provable Defense Against Geometric Transformations

(ends 2:30 AM)

Oral 1 Track 2: Machine Learning for Sciences [1:00-2:30]

Orals 1:00-2:10

[1:00] Phase2vec: dynamical systems embedding with a physics-informed convolutional network

[1:10] Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems

[1:20] Compressing multidimensional weather and climate data into neural networks

[1:30] D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory

[1:40] Conditional Antibody Design as 3D Equivariant Graph Translation

[1:50] Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs

[2:00] CROM: Continuous Reduced-Order Modeling of PDEs Using Implicit Neural Representations

(ends 2:30 AM)

Oral 1 Track 5: Reinforcement Learning [1:00-2:30]

Orals 1:00-2:10

[1:00] DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

[1:10] The In-Sample Softmax for Offline Reinforcement Learning

[1:20] Emergence of Maps in the Memories of Blind Navigation Agents

[1:30] Does Zero-Shot Reinforcement Learning Exist?

[1:40] Learning Soft Constraints From Constrained Expert Demonstrations

[1:50] Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes

[2:00] VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

(ends 2:30 AM)

Oral 1 Track 1: Deep Learning and representational learning I [1:00-2:30]

Orals 1:10-2:00

[1:10] Token Merging: Your ViT But Faster

[1:20] TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

[1:30] Learning Group Importance using the Differentiable Hypergeometric Distribution

[1:40] Neural Networks and the Chomsky Hierarchy

[1:50] Learning on Large-scale Text-attributed Graphs via Variational Inference

(ends 2:30 AM)

Oral 1 Track 3: Neuroscience and Cognitive Science & General Machine Learning [1:00-2:30]

Orals 1:00-1:50

[1:00] A probabilistic framework for task-aligned intra- and inter-area neural manifold estimation

[1:10] Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

[1:20] Disentanglement with Biological Constraints: A Theory of Functional Cell Types

[1:30] Hebbian Deep Learning Without Feedback

[1:40] Domain Generalization via Heckman-type Selection Models

(ends 2:30 AM)

Oral 1 Track 6: Deep Learning and representational learning II [1:00-2:30]

Orals 1:00-2:10

[1:00] Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

[1:10] Fisher-Legendre (FishLeg) optimization of deep neural networks

[1:20] Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization

[1:30] Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets

[1:40] NeRN: Learning Neural Representations for Neural Networks

[1:50] Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

[2:00] Continual Unsupervised Disentangling of Self-Organizing Representations

(ends 2:30 AM)

2:30 a.m.

Poster Session 1 [2:30-4:30]

Posters 2:30-4:30

GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure

MaskViT: Masked Visual Pre-Training for Video Prediction

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play

A Neural Mean Embedding Approach for Back-door and Front-door Adjustment

Understanding DDPM Latent Codes Through Optimal Transport

How to prepare your task head for finetuning

GLM-130B: An Open Bilingual Pre-trained Model

An efficient encoder-decoder architecture with top-down attention for speech separation

Characterizing intrinsic compositionality in transformers with Tree Projections

Pushing the Limits of Fewshot Anomaly Detection in Industry Vision: Graphcore

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation

Light Sampling Field and BRDF Representation for Physically-based Neural Rendering

Generating Sequences by Learning to Self-Correct

Rethinking skip connection model as a learnable Markov chain

Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Experts

Masked Vision and Language Modeling for Multi-modal Representation Learning

Explaining Temporal Graph Models through an Explorer-Navigator Framework

Trainability Preserving Neural Pruning

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

Neural Networks and the Chomsky Hierarchy

Mega: Moving Average Equipped Gated Attention

Fisher-Legendre (FishLeg) optimization of deep neural networks

Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets

Can discrete information extraction prompts generalize across language models?

Token Merging: Your ViT But Faster

How Informative is the Approximation Error from Tensor Decomposition for Neural Network Compression?

Broken Neural Scaling Laws

Softened Symbol Grounding for Neuro-symbolic Systems

Continuous-time identification of dynamic state-space models by deep subspace encoding

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

A VAE for Transformers with Nonparametric Variational Information Bottleneck

Cross-Layer Retrospective Retrieving via Layer Attention

Avoiding spurious correlations via logit correction

Mitigating Dataset Bias by Using Per-Sample Gradient

Test-Time Adaptation via Self-Training with Nearest Neighbor Information

LPT: Long-tailed Prompt Tuning for Image Classification

Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

Continual Unsupervised Disentangling of Self-Organizing Representations

Editing models with task arithmetic

Learning Group Importance using the Differentiable Hypergeometric Distribution

TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding

Towards Better Selective Classification

Structure by Architecture: Structured Representations without Regularization

Learning on Large-scale Text-attributed Graphs via Variational Inference

Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths

Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization

NeRN: Learning Neural Representations for Neural Networks

Learning to Induce Causal Structure

On the Soft-Subnetwork for Few-Shot Class Incremental Learning

Learning to reason over visual objects

QAID: Question Answering Inspired Few-shot Intent Detection

Contrastive Meta-Learning for Partially Observable Few-Shot Learning

Learning topology-preserving data representations

DAG Matters! GFlowNets Enhanced Explainer for Graph Neural Networks

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

Unsupervised Manifold Alignment with Joint Multidimensional Scaling

Deconstructing Distributions: A Pointwise Framework of Learning

Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication

PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs

Diffusion-based Image Translation using disentangled style and content representation

Finding the Global Semantic Representation in GAN through Fréchet Mean

Domain Generalization via Heckman-type Selection Models

Interaction-Based Disentanglement of Entities for Object-Centric World Models

Matching receptor to odorant with protein language and graph neural networks

Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs

Compressing multidimensional weather and climate data into neural networks

Conditional Antibody Design as 3D Equivariant Graph Translation

Phase2vec: dynamical systems embedding with a physics-informed convolutional network

Protein Sequence and Structure Co-Design with Equivariant Translation

CROM: Continuous Reduced-Order Modeling of PDEs Using Implicit Neural Representations

Interpretable Geometric Deep Learning via Learnable Randomness Injection

Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model

D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory

Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles

A probabilistic framework for task-aligned intra- and inter-area neural manifold estimation

How gradient estimator variance and bias impact learning in neural networks

Hebbian Deep Learning Without Feedback

Disentanglement with Biological Constraints: A Theory of Functional Cell Types

Multi-objective optimization via equivariant deep hypervolume approximation

Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport

Why (and When) does Local SGD Generalize Better than SGD?

EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data

Denoising Diffusion Samplers

Weighted Clock Logic Point Process

Causal Balancing for Domain Generalization

Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

A Control-Centric Benchmark for Video Prediction

Preference Transformer: Modeling Human Preferences using Transformers for RL

Emergence of Maps in the Memories of Blind Navigation Agents

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes

Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward

Learning Soft Constraints From Constrained Expert Demonstrations

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations

RPM: Generalizable Multi-Agent Policies for Multi-Agent Reinforcement Learning

The In-Sample Softmax for Offline Reinforcement Learning

Does Zero-Shot Reinforcement Learning Exist?

Scaling Pareto-Efficient Decision Making via Offline Multi-Objective RL

DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation

Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Quantifying Memorization Across Neural Language Models

Fooling SHAP with Stealthily Biased Sampling

Valid P-Value for Deep Learning-driven Salient Region

UNICORN: A Unified Backdoor Trigger Inversion Framework

Human-Guided Fair Classification for Natural Language Processing

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Sound Randomized Smoothing in Floating-Point Arithmetic

Provable Defense Against Geometric Transformations

Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification

Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?

Equal Improvability: A New Fairness Notion Considering the Long-term Impact

On the Perils of Cascading Robust Classifiers

Learning to Estimate Shapley Values with Vision Transformers

Neural-based classification rule learning for sequential data

Machine Unlearning of Federated Clusters

Multimodal Federated Learning via Contrastive Representation Ensemble

A new characterization of the edge of stability based on a sharpness measure aware of batch gradient distribution

On The Relative Error of Random Fourier Features for Preserving Kernel Distance

A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis

Contextual bandits with concave rewards, and an application to fair ranking

Why adversarial training can hurt robust accuracy

Continuous pseudo-labeling from the start

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification

Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations

Identifiability Results for Multimodal Contrastive Learning

Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation

Guiding Safe Exploration with Weakest Preconditions

(ends 4:30 AM)

3:30 a.m.

Lunch

4:30 a.m.

Invited Talk:

Understanding Systematic Deviations in Data for Trustworthy AI

Girmaw Abebe Tadesse

(ends 5:30 AM)

5:30 a.m.

Coffee Break

6 a.m.

Oral 2 Track 5: Generative models & Theory [6:00-7:30]

Orals 6:00-7:00

[6:00] Neural Optimal Transport

[6:10] Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions

[6:20] Effects of Graph Convolutions in Multi-layer Networks

[6:40] Modeling content creator incentives on algorithm-curated platforms

[6:50] Optimal Transport for Offline Imitation Learning

(ends 7:30 AM)

Oral 2 Track 2: General Machine Learning [6:00-7:30]

Orals 6:00-7:20

[6:00] The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium

[6:10] Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

[6:20] Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives

[6:40] LAVA: Data Valuation without Pre-Specified Learning Algorithms

[7:00] Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering

[7:10] Learning where and when to reason in neuro-symbolic inference

(ends 7:30 AM)

Oral 2 Track 4: Reinforcement Learning [6:00-7:30]

Orals 6:00-7:20

[6:00] Multi-skill Mobile Manipulation for Object Rearrangement

[6:10] The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry

[6:20] A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation

[6:30] Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

[6:40] Powderworld: A Platform for Understanding Generalization via Rich Task Distributions

[6:50] Near-optimal Policy Identification in Active Reinforcement Learning

[7:00] BC-IRL: Learning Generalizable Reward Functions from Demonstrations

[7:10] Learning About Progress From Experts

(ends 7:30 AM)

Oral 2 Track 3: Generative models [6:00-7:30]

Orals 6:00-7:00

[6:00] Diffusion Posterior Sampling for General Noisy Inverse Problems

[6:10] Prompt-to-Prompt Image Editing with Cross-Attention Control

[6:20] Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting

[6:30] Diffusion Models Already Have A Semantic Latent Space

[6:40] DreamFusion: Text-to-3D using 2D Diffusion

[6:50] Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

(ends 7:30 AM)

Oral 2 Track 1: Applications [6:00-7:30]

Orals 6:00-7:20

[6:00] GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

[6:20] Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

[6:30] Human Motion Diffusion Model

[6:40] NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning

[6:50] UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks

[7:00] Mass-Editing Memory in a Transformer

[7:10] On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation

(ends 7:30 AM)

Oral 2 Track 6: Applications & Social Aspects of Machine Learning [6:00-7:30]

Orals 6:00-7:20

[6:00] A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification

[6:10] Associative Memory Augmented Asynchronous Spatiotemporal Representation Learning for Event-based Perception

[6:20] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

[6:30] ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

[6:40] Ask Me Anything: A simple strategy for prompting language models

[6:50] Code Translation with Compiler Representations

[7:00] Hidden Markov Transformer for Simultaneous Machine Translation

[7:10] Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

(ends 7:30 AM)

7:30 a.m.

Poster Session 2 [7:30-9:30]

Posters 7:30-9:30

Domain Generalisation via Domain Adaptation: An Adversarial Fourier Amplitude Approach

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Selective Frequency Network for Image Restoration

Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization

Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt Representation Drift in Continual Learning

Complexity-Based Prompting for Multi-step Reasoning

Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought

Ask Me Anything: A simple strategy for prompting language models

Human Motion Diffusion Model

NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning

On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation

Temporal Coherent Test Time Optimization for Robust Video Classification

UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks

Mass-Editing Memory in a Transformer

Robust Scheduling with GFlowNets

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations

A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification

Open-Vocabulary Object Detection upon Frozen Vision and Language Models

H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection

ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection

Iterative Circuit Repair Against Formal Specifications

Hidden Markov Transformer for Simultaneous Machine Translation

An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation

DFlow: Learning to Synthesize Better Optical Flow Datasets via a Differentiable Pipeline

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

Code Translation with Compiler Representations

Iterative Patch Selection for High-Resolution Image Recognition

MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

Learning Iterative Neural Optimizers for Image Steganography

Associative Memory Augmented Asynchronous Spatiotemporal Representation Learning for Event-based Perception

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

FIGARO: Controllable Music Generation using Learned and Expert Features

General Neural Gauge Fields

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models

Grounding Graph Network Simulators using Physical Sensor Observations

$\mathscr{N}$-WL: A New Hierarchy of Expressivity for Graph Neural Networks

Spatial Attention Kinetic Networks with E(n)-Equivariance

ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Bit-Pruning: A Sparse Multiplication-Less Dot-Product

Over-Training with Mixup May Hurt Generalization

Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness

Can CNNs Be More Robust Than Transformers?

Scaffolding a Student to Instill Knowledge

Constraining Representations Yields Models That Know What They Don't Know

TVSPrune - Pruning Non-discriminative filters via Total Variation separability of intermediate representations without fine tuning

MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization

Towards One-shot Neural Combinatorial Solvers: Theoretical and Empirical Notes on the Cardinality-Constrained Case

Decoupled Training for Long-Tailed Classification With Stochastic Representations

REPAIR: REnormalizing Permuted Activations for Interpolation Repair

Feature Reconstruction From Outputs Can Mitigate Simplicity Bias in Neural Networks

Gradient Gating for Deep Multi-Rate Learning on Graphs

The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium

Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

Learning where and when to reason in neuro-symbolic inference

LAVA: Data Valuation without Pre-Specified Learning Algorithms

Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering

Neural Optimal Transport

DreamFusion: Text-to-3D using 2D Diffusion

Diffusion Models Already Have A Semantic Latent Space

Diffusion Posterior Sampling for General Noisy Inverse Problems

Generative Modelling with Inverse Heat Dissipation

Discrete Predictor-Corrector Diffusion Models for Image Synthesis

Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization

Explicitly Minimizing the Blur Error of Variational Autoencoders

Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting

Prompt-to-Prompt Image Editing with Cross-Attention Control

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

Factorized Fourier Neural Operators

Competitive Physics Informed Networks

Learning Symbolic Models for Graph-structured Physical Mechanism

Improved Training of Physics-Informed Neural Networks Using Energy-Based Priors: a Study on Electrical Impedance Tomography

Diffusion Probabilistic Modeling of Protein Backbones in 3D for the motif-scaffolding problem

Protein Representation Learning by Geometric Structure Pretraining

Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Computational Language Acquisition with Theory of Mind

Exploring perceptual straightness in learned visual representations

Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization

CUTS: Neural Causal Discovery from Irregular Time-Series Data

Diffusion Models for Causal Discovery via Topological Ordering

Accurate Bayesian Meta-Learning by Accurate Task Posterior Inference

Trading Information between Latents in Hierarchical Variational Autoencoders

Neural Causal Models for Counterfactual Identification and Estimation

Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning

Scaling up and Stabilizing Differentiable Planning with Implicit Differentiation

SpeedyZero: Mastering Atari with Limited Data and Time

Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning

Optimal Transport for Offline Imitation Learning

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

Policy Expansion for Bridging Offline-to-Online Reinforcement Learning

The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry

Multi-skill Mobile Manipulation for Object Rearrangement

Near-optimal Policy Identification in Active Reinforcement Learning

Powderworld: A Platform for Understanding Generalization via Rich Task Distributions

Minimum Description Length Control

BC-IRL: Learning Generalizable Reward Functions from Demonstrations

Become a Proficient Player with Limited Data through Watching Pure Videos

Learning About Progress From Experts

Certifiably Robust Policy Learning against Adversarial Multi-Agent Communication

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation

POPGym: Benchmarking Partially Observable Reinforcement Learning

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

Global Explainability of GNNs via Logic Combination of Learned Concepts

Revisiting Graph Adversarial Attack and Defense From a Data Distribution Perspective

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

An Exact Poly-Time Membership-Queries Algorithm for Extracting a Three-Layer ReLU Network

Fairness and Accuracy under Domain Generalization

Variational Information Pursuit for Interpretable Predictions

Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions

Quantifying and Mitigating the Impact of Label Errors on Model Disparity Metrics

Measure the Predictive Heterogeneity

MultiViz: Towards Visualizing and Understanding Multimodal Models

Effects of Graph Convolutions in Multi-layer Networks

Memorization Capacity of Neural Networks with Conditional Computation

Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions

Constructive TT-representation of the tensors given as index interaction functions with applications

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics

Information-Theoretic Analysis of Unsupervised Domain Adaptation

Modeling content creator incentives on algorithm-curated platforms

Optimizing Spca-based Continual Learning: A Theoretical Approach

Amortised Invariance Learning for Contrastive Self-Supervision

Unsupervised visualization of image datasets using contrastive learning

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Temperature Schedules for self-supervised contrastive methods on long-tail data

The hidden uniform cluster prior in self-supervised learning

(ends 9:30 AM)

8 a.m.

9:30 a.m.

Remarks:

Opening Ceremony

(ends 9:50 AM)

9:50 a.m.

Reception:

Reception

(ends 11:00 AM)

11 p.m.

Registration / Check-in

(ends 9:00 AM)

11:30 p.m.

Invited Talk:

Importance-Weighting Approach to Distribution Shift Adaptation

Masashi Sugiyama

(ends 12:30 AM)

TUE 2 MAY

12:30 a.m.

Coffee Break

1 a.m.

Oral 3 Track 4: General Machine Learning & Unsupervised and Self-supervised learning [1:00-2:30]

Orals 1:00-2:20

[1:00] On the duality between contrastive and non-contrastive self-supervised learning

[1:10] Unsupervised Meta-learning via Few-shot Pseudo-supervised Contrastive Learning

[1:20] The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning

[1:30] Self-supervised learning with rotation-invariant kernels

[1:40] DINO as a von Mises-Fisher mixture model

[1:50] Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent

[2:00] Efficient Discrete Multi Marginal Optimal Transport Regularization

[2:10] Sparsity-Constrained Optimal Transport

(ends 2:30 AM)

Oral 3 Track 2: Deep Learning and representational learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Efficient Conditionally Invariant Representation Learning

[1:10] Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction

[1:20] Omnigrok: Grokking Beyond Algorithmic Data

[1:30] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

[1:40] Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

[1:50] Multi-lingual Evaluation of Code Generation Models

[2:00] Rethinking the Expressive Power of GNNs via Graph Biconnectivity

[2:10] Hyperbolic Deep Reinforcement Learning

(ends 2:30 AM)

Oral 3 Track 3: Generative models [1:00-2:30]

Orals 1:00-2:20

[1:00] The Role of ImageNet Classes in Fréchet Inception Distance

[1:10] Learning Diffusion Bridges on Constrained Domains

[1:20] Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

[1:30] Learning multi-scale local conditional probability models of images

[1:40] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

[1:50] Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images

[2:00] Deterministic training of generative autoencoders using invertible layers

[2:10] 3D generation on ImageNet

(ends 2:30 AM)

Oral 3 Track 1: Reinforcement Learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Adversarial Diversity in Hanabi

[1:10] Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

[1:20] Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

[1:30] On the Sensitivity of Reward Inference to Misspecified Human Models

[1:40] Understanding and Adopting Rational Behavior by Bellman Score Estimation

[1:50] SMART: Self-supervised Multi-task pretrAining with contRol Transformers

[2:00] Dichotomy of Control: Separating What You Can Control from What You Cannot

[2:10] Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

(ends 2:30 AM)

Oral 3 Track 5: Deep Learning and representational learning & Neuroscience and Cognitive Science [1:00-2:30]

Orals 1:00-2:20

[1:00] Sign and Basis Invariant Networks for Spectral Graph Representation Learning

[1:10] ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks

[1:20] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

[1:30] QuAnt: Quantum Annealing with Learnt Couplings

[1:40] Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

[1:50] The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

[2:00] The Lie Derivative for Measuring Learned Equivariance

[2:10] Training language models to summarize narratives improves brain alignment

(ends 2:30 AM)

2:30 a.m.

Poster Session 3 [2:30-4:30]

Posters 2:30-4:30

Few-shot Backdoor Attacks via Neural Tangent Kernels

Mid-Vision Feedback

Markup-to-Image Diffusion Models with Scheduled Sampling

Language models are multilingual chain-of-thought reasoners

Language Models Can Teach Themselves to Program Better

A Non-monotonic Self-terminating Language Model

DiffusER: Diffusion via Edit-based Reconstruction

Understanding Embodied Reference with Touch-Line Transformer

Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation

Active Image Indexing

Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

Agent-based Graph Neural Networks

Limitless Stability for Graph Convolutional Networks

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

Rethinking the Expressive Power of GNNs via Graph Biconnectivity

Anti-Symmetric DGN: a stable architecture for Deep Graph Networks

ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks

LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

QuAnt: Quantum Annealing with Learnt Couplings

TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization

Treeformer: Dense Gradient Trees for Efficient Attention Computation

Latent Bottlenecked Attentive Neural Processes

The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

Multi-lingual Evaluation of Code Generation Models

DFPC: Data flow driven pruning of coupled channels without data

The Lie Derivative for Measuring Learned Equivariance

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

Scaling Forward Gradient With Local Losses

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

The Curious Case of Benign Memorization

Topology-aware Robust Optimization for Out-of-Distribution Generalization

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

This Looks Like It Rather Than That: ProtoKNN For Similarity-Based Classifiers

ODAM: Gradient-based Instance-Specific Visual Explanations for Object Detection

Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification

DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

Leveraging Unlabeled Data to Track Memorization

Data Valuation Without Training of a Model

Multivariate Time-series Imputation with Disentangled Temporal Representations

Learning to Compose Soft Prompts for Compositional Zero-Shot Learning

TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction

Sign and Basis Invariant Networks for Spectral Graph Representation Learning

SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning

Unicom: Universal and Compact Representation Learning for Image Retrieval

Online Boundary-Free Continual Learning by Scheduled Data Prior

Efficient Conditionally Invariant Representation Learning

Learning to Extrapolate: A Transductive Approach

FedFA: Federated Feature Augmentation

Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models

Omnigrok: Grokking Beyond Algorithmic Data

Backpropagation through Combinatorial Algorithms: Identity with Projection Works

Neural Radiance Field Codebooks

Autoregressive Conditional Neural Processes

Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent

FIT: A Metric for Model Sensitivity

Classically Approximating Variational Quantum Machine Learning with Random Fourier Features

Label Propagation with Weak Supervision

Learning to CROSS exchange to solve min-max vehicle routing problems

Continual Pre-training of Language Models

Diffusion Probabilistic Fields

FunkNN: Neural Interpolation for Functional Generation

Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

The Role of ImageNet Classes in Fréchet Inception Distance

3D generation on ImageNet

LDMIC: Learning-based Distributed Multi-view Image Coding

Learning multi-scale local conditional probability models of images

Learning Diffusion Bridges on Constrained Domains

StyleMorph: Disentangled 3D-Aware Image Synthesis with a 3D Morphable StyleGAN

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking

Sampling-free Inference for Ab-Initio Potential Energy Surface Networks

Actionable Neural Representations: Grid Cells from Minimal Constraints

Training language models to summarize narratives improves brain alignment

Sparsity-Constrained Optimal Transport

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, But Sign Descent Might Be

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

Efficient Discrete Multi Marginal Optimal Transport Regularization

ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs

Sampling-based inference for large linear models, with application to linearised Laplace

Bridge the Inference Gaps of Neural Processes via Expectation Maximization

Estimating individual treatment effects under unobserved confounding using binary instruments

Imitating Human Behaviour with Diffusion Models

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

Integrating Symmetry into Differentiable Planning with Steerable Convolutions

Transformer-based World Models Are Happy With 100k Interactions

Adversarial Diversity in Hanabi

On the Sensitivity of Reward Inference to Misspecified Human Models

Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

Provable Sim-to-real Transfer in Continuous Domain with Partial Observations

Understanding and Adopting Rational Behavior by Bellman Score Estimation

Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks

Dichotomy of Control: Separating What You Can Control from What You Cannot

Adversarial Imitation Learning with Preferences

SMART: Self-supervised Multi-task pretrAining with contRol Transformers

Hyperbolic Deep Reinforcement Learning

Efficient Planning in a Compact Latent Action Space

Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

A Mixture-of-Expert Approach to RL-based Dialogue Management

Efficient Deep Reinforcement Learning Requires Regulating Overfitting

Large Language Models are Human-Level Prompt Engineers

Holistic Adversarially Robust Pruning

FaiREE: fair classification with finite-sample and distribution-free guarantee

Efficient Certified Training and Robustness Verification of Neural ODEs

ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection?

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Easy Differentially Private Linear Regression

Revisiting Robustness in Graph Machine Learning

CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning

Tuning Frequency Bias in Neural Network Training with Nonuniform Data

Adaptive Optimization in the $\infty$-Width Limit

How Sharpness-Aware Minimization Minimizes Sharpness?

Mini-batch $k$-means terminates within $O(d/\epsilon)$ iterations

Fundamental limits on the robustness of image classifiers

Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement

Interpretations of Domain Adaptations via Layer Variational Analysis

Long-Tailed Learning Requires Feature Learning

Generalization Bounds for Federated Learning: Fast Rates, Unparticipating Clients and Unbounded Losses

KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals

Robust Fair Clustering: A Novel Fairness Attack and Defense Framework

Link Prediction with Non-Contrastive Learning

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training

Towards a Unified Theoretical Understanding of Non-contrastive Learning via Rank Differential Mechanism

DINO as a von Mises-Fisher mixture model

Unsupervised Meta-learning via Few-shot Pseudo-supervised Contrastive Learning

On the duality between contrastive and non-contrastive self-supervised learning

The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning

Self-supervised learning with rotation-invariant kernels

(ends 4:30 AM)

Affinity Poster Session:

Blog Track Poster Session

(ends 4:30 AM)

Lunch

3:30 a.m.

4:30 a.m.

Invited Talk:

AI, History and Equity

Elaine Nsoesie

(ends 5:30 AM)

5:30 a.m.

Coffee Break

6 a.m.

Oral 4 Track 3: Reinforcement Learning I [6:00-7:30]

Orals 6:00-7:20

[6:00] Planning Goals for Exploration

[6:10] Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation

[6:20] Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning

[6:30] Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

[6:40] Choreographer: Learning and Adapting Skills in Imagination

[6:50] A CMDP-within-online framework for Meta-Safe Reinforcement Learning

[7:00] Confidence-Conditioned Value Functions for Offline Reinforcement Learning

[7:10] Extreme Q-Learning: MaxEnt RL without Entropy

(ends 7:30 AM)

Oral 4 Track 2: Probabilistic Methods [6:00-7:30]

Orals 6:00-7:10

[6:00] Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle

[6:10] SAM as an Optimal Relaxation of Bayes

[6:20] Generative Augmented Flow Networks

[6:30] A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation

[6:40] Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation

[6:50] GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints

[7:00] Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise

(ends 7:30 AM)

Oral 4 Track 1: Unsupervised and Self-supervised learning [6:00-7:30]

Orals 6:00-7:20

[6:00] Minimalistic Unsupervised Representation Learning with the Sparse Manifold Transform

[6:10] AANG: Automating Auxiliary Learning

[6:20] STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables

[6:30] Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts

[6:40] When Source-Free Domain Adaptation Meets Learning with Noisy Labels

[6:50] Towards Stable Test-time Adaptation in Dynamic Wild World

[7:00] Proposal-Contrastive Pretraining for Object Detection from Fewer Data

[7:10] Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations

(ends 7:30 AM)

Oral 4 Track 5: Machine Learning for Sciences & Probabilistic Methods [6:00-7:30]

Orals 6:00-7:10

[6:00] Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

[6:10] Flow Annealed Importance Sampling Bootstrap

[6:20] Learning Controllable Adaptive Simulation for Multi-resolution Physics

[6:30] Minimax Optimal Kernel Operator Learning via Multilevel Training

[6:40] Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics

[6:50] Pre-training via Denoising for Molecular Property Prediction

[7:00] MARS: Meta-learning as Score Matching in the Function Space

(ends 7:30 AM)

Oral 4 Track 4: Reinforcement Learning II [6:00-7:30]

Orals 6:00-7:20

[6:00] Transformers are Sample-Efficient World Models

[6:10] Building a Subspace of Policies for Scalable Continual Learning

[6:20] Neural Episodic Control with State Abstraction

[6:30] Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

[6:40] Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

[6:50] Is Conditional Generative Modeling all you need for Decision Making?

[7:00] RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

[7:10] Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective

(ends 7:30 AM)

Oral 4 Track 6: Deep Learning and representational learning & Reinforcement Learning [6:00-7:30]

Orals 6:00-7:20

[6:00] CUDA: Curriculum of Data Augmentation for Long-tailed Recognition

[6:10] One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks

[6:20] Learning Label Encodings for Deep Regression

[6:30] Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

[6:40] A Unified Algebraic Perspective on Lipschitz Neural Networks

[6:50] From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

[7:00] Git Re-Basin: Merging Models modulo Permutation Symmetries

[7:10] In-context Reinforcement Learning with Algorithm Distillation

(ends 7:30 AM)

6:30 a.m.

7:30 a.m.

Poster Session 4 [7:30-9:30]

Posters 7:30-9:30

Mind the Pool: Convolutional Neural Networks Can Overfit Input Size

Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

Compositional Semantic Parsing with Large Language Models

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks

Revisiting Populations in multi-agent Communication

Topologically penalized regression on manifolds

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

D4AM: A General Denoising Framework for Downstream Acoustic Models

Sparse Token Transformer with Attention Back Tracking

Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning

Video Scene Graph Generation from Single-Frame Weak Supervision

Personalized Reward Learning with Interaction-Grounded Learning (IGL)

Learning Locality and Isotropy in Dialogue Modeling

Latent Graph Inference using Product Manifolds

Spikformer: When Spiking Neural Network Meets Transformer

Specformer: Spectral Graph Neural Networks Meet Transformers

A Unified Algebraic Perspective on Lipschitz Neural Networks

Git Re-Basin: Merging Models modulo Permutation Symmetries

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

Don’t forget the nullspace! Nullspace occupancy as a mechanism for out of distribution failure

DELTA: Degradation-Free Fully Test-Time Adaptation

Extremely Simple Activation Shaping for Out-of-Distribution Detection

A Unified Framework for Soft Threshold Pruning

Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN

Massively Scaling Heteroscedastic Classifiers

That Label's got Style: Handling Label Style Bias for Uncertain Image Segmentation

What Can we Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers?

Towards Robust Object Detection Invariant to Real-World Domain Shifts

Self-Distillation for Further Pre-training of Transformers

CUDA: Curriculum of Data Augmentation for Long-tailed Recognition

From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks

Enhancing Meta Learning via Multi-Objective Soft Improvement Functions

Learning Label Encodings for Deep Regression

Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks

PerFedMask: Personalized Federated Learning with Optimized Masking Vectors

Data augmentation alone can improve adversarial training

(Certified!!) Adversarial Robustness for Free!

Scaling Laws For Deep Learning Based Image Reconstruction

Learning with Auxiliary Activation for Memory-Efficient Training

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification

Contrastive Learning for Unsupervised Domain Adaptation of Time Series

Disentangling Learning Representations with Density Estimation

Linearly Mapping from Image to Text Space

Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

Reliability of CKA as a Similarity Measure in Deep Learning

Beyond calibration: estimating the grouping loss of modern neural networks

Approximate Vanishing Ideal Computations at Scale

Artificial Neuronal Ensembles with Learned Context Dependent Gating

Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

Online Bias Correction for Task-Free Continual Learning

On the Importance and Applicability of Pre-Training for Federated Learning

Towards Smooth Video Composition

Long Range Language Modeling via Gated State Spaces

A Graph Neural Network Approach to Automated Model Building in Cryo-EM Maps

DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

On the Word Boundaries of Emergent Languages Based on Harris's Articulation Scheme

Deep Generative Symbolic Regression

De Novo Molecular Generation via Connection-aware Motif Mining

Enhancing the Inductive Biases of Graph Neural ODE for Modeling Physical Systems

Pre-training via Denoising for Molecular Property Prediction

Flow Annealed Importance Sampling Bootstrap

Minimax Optimal Kernel Operator Learning via Multilevel Training

Learning Controllable Adaptive Simulation for Multi-resolution Physics

HyperDeepONet: learning operator with complex target function space using the limited resources via hypernetwork

One Transformer Can Understand Both 2D & 3D Molecular Data

Simplicial Hopfield networks

Bayesian Oracle for bounding information gain in neural encoding models

TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization

Symmetries, Flat Minima, and the Conserved Quantities of Gradient Flow

Faster federated optimization under second-order similarity

Generative Augmented Flow Networks

Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation

SAM as an Optimal Relaxation of Bayes

A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation

GFlowNets and variational inference

GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints

Energy-Based Test Sample Adaptation for Domain Generalization

MARS: Meta-learning as Score Matching in the Function Space

Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle

Active Learning for Object Detection with Evidential Deep Learning and Hierarchical Uncertainty Aggregation

Versatile Neural Processes for Learning Implicit Neural Representations

Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise

Causal Imitation Learning via Inverse Reinforcement Learning

Neural Episodic Control with State Abstraction

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Population-size-Aware Policy Optimization for Mean-Field Games

Is Conditional Generative Modeling all you need for Decision Making?

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Building a Subspace of Policies for Scalable Continual Learning

Choreographer: Learning and Adapting Skills in Imagination

Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation

Investigating Multi-task Pretraining and Generalization in Reinforcement Learning

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling

Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Planning Goals for Exploration

A CMDP-within-online framework for Meta-Safe Reinforcement Learning

The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning

Imitating Graph-Based Planning with Goal-Conditioned Policies

Decision Transformer under Random Frame Dropping

Transformers are Sample-Efficient World Models

Extreme Q-Learning: MaxEnt RL without Entropy

In-context Reinforcement Learning with Algorithm Distillation

Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning

Confidence-Conditioned Value Functions for Offline Reinforcement Learning

Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

Prompting GPT-3 To Be Reliable

Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small

Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks

simpleKT: A Simple But Tough-to-Beat Baseline for Knowledge Tracing

Efficient Model Updates for Approximate Unlearning of Graph-Structured Data

Re-weighting Based Group Fairness Regularization via Classwise Robust Optimization

Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks

Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference

On Achieving Optimal Adversarial Test Error

Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework

Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets

Meta-Learning in Games

Towards Stable Test-time Adaptation in Dynamic Wild World

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts

AANG: Automating Auxiliary Learning

When Source-Free Domain Adaptation Meets Learning with Noisy Labels

Minimalistic Unsupervised Representation Learning with the Sparse Manifold Transform

STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Don’t fear the unlabelled: safe semi-supervised learning via debiasing

Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations

wav2tok: Deep Sequence Tokenizer for Audio Retrieval

Proposal-Contrastive Pretraining for Object Detection from Fewer Data

Jointly Learning Visual and Auditory Speech Representations from Raw Data

Weighted Ensemble Self-Supervised Learning

(ends 9:30 AM)

8 a.m.

11 p.m.

Registration / Check-in

(ends 9:00 AM)

11:30 p.m.

Invited Talk:

Dialogue Research in the Era of LLMs

Dilek Hakkani-Tur

(ends 12:30 AM)

WED 3 MAY

12:30 a.m.

Coffee Break

1 a.m.

Oral 5 Track 4: Applications & Optimization [1:00-2:30]

Orals 1:00-2:10

[1:00] Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

[1:10] Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

[1:20] DocPrompting: Generating Code by Retrieving the Docs

[1:30] View Synthesis with Sculpted Neural Points

[1:40] VA-DepthNet: A Variational Approach to Single Image Depth Prediction

[1:50] Visual Classification via Description from Large Language Models

[2:00] Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach

(ends 2:30 AM)

Oral 5 Track 3: Deep Learning and representational learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Hungry Hungry Hippos: Towards Language Modeling with State Space Models

[1:10] Relative representations enable zero-shot latent space communication

[1:20] ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion

[1:30] Distilling Model Failures as Directions in Latent Space

[1:40] Graph Neural Networks for Link Prediction with Subgraph Sketching

[1:50] The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

[2:00] Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph

[2:10] A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

(ends 2:30 AM)

Oral 5 Track 1: Unsupervised and Self-supervised learning & Social Aspects of Machine Learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Progress measures for grokking via mechanistic interpretability

[1:10] Localized Randomized Smoothing for Collective Robustness Certification

[1:20] Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes

[1:30] CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

[1:40] Model-based Causal Bayesian Optimization

[1:50] Corrupted Image Modeling for Self-Supervised Visual Pre-Training

[2:00] SimPer: Simple Self-Supervised Learning of Periodic Targets

[2:10] Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

(ends 2:30 AM)

Oral 5 Track 2: Optimization [1:00-2:30]

Orals 1:00-2:10

[1:00] DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity

[1:10] Single-shot General Hyper-parameter Optimization for Federated Learning

[1:20] Solving Constrained Variational Inequalities via a First-order Interior Point-based Method

[1:30] FedExP: Speeding Up Federated Averaging via Extrapolation

[1:40] LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence

[1:50] Multi-Objective Online Learning

[2:00] Continuous PDE Dynamics Forecasting with Implicit Neural Representations

(ends 2:30 AM)

Oral 5 Track 5: Deep Learning and representational learning & Reinforcement Learning [1:00-2:30]

Orals 1:00-2:20

[1:00] Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

[1:10] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models

[1:20] Aligning Model and Macaque Inferior Temporal Cortex Representations Improves Model-to-Human Behavioral Alignment and Adversarial Robustness

[1:30] Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent

[1:40] A Primal-Dual Framework for Transformers and Neural Networks

[1:50] Learning with Logical Constraints but without Shortcut Satisfaction

[2:00] No Reason for No Supervision: Improved Generalization in Supervised Models

[2:10] Generating Diverse Cooperative Agents by Learning Incompatible Policies

(ends 2:30 AM)

2:30 a.m.

Poster Session 5 [2:30-4:30]

Posters 2:30-4:30

A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

Globally Injective ReLU Networks

View Synthesis with Sculpted Neural Points

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Calibrating Sequence likelihood Improves Conditional Language Generation

DocPrompting: Generating Code by Retrieving the Docs

DamoFD: Digging into Backbone Design on Face Detection

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic

Selective Annotation Makes Language Models Better Few-Shot Learners

The KFIoU Loss for Rotated Object Detection

$\mathrm{SE}(3)$-Equivariant Attention Networks for Shape Reconstruction in Function Space

Perfectly Secure Steganography Using Minimum Entropy Coupling

SLTUNET: A Simple Unified Model for Sign Language Translation

VA-DepthNet: A Variational Approach to Single Image Depth Prediction

Visual Classification via Description from Large Language Models

E-CRF: Embedded Conditional Random Field for Boundary-caused Class Weights Confusion in Semantic Segmentation

Proactive Multi-Camera Collaboration for 3D Human Pose Estimation

AnyDA: Anytime Domain Adaptation

SMART: Sentences as Basic Units for Text Evaluation

GAIN: On the Generalization of Instructional Action Understanding

Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting

Red PANDA: Disambiguating Image Anomaly Detection by Removing Nuisance Factors

Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning

Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling

Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching

Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs

Graph Neural Networks for Link Prediction with Subgraph Sketching

Confidence-Based Feature Imputation for Graphs with Partially Known Features

Sparse tree-based Initialization for Neural Networks

Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability

Liquid Structural State-Space Models

Predictive Inference with Feature Conformal Prediction

ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion

Out-of-distribution Detection with Implicit Outlier Transformation

Improving Deep Regression with Ordinal Entropy

How I Learned to Stop Worrying and Love Retraining

Distilling Model Failures as Directions in Latent Space

Efficient Edge Inference by Selective Query

Understanding Zero-shot Adversarial Robustness for Large-Scale Models

A Primal-Dual Framework for Transformers and Neural Networks

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent

Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

Diversify and Disambiguate: Out-of-Distribution Robustness via Disagreement

The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

Linear Connectivity Reveals Generalization Strategies

PLOT: Prompt Learning with Optimal Transport for Vision-Language Models

Learning with Logical Constraints but without Shortcut Satisfaction

Confidence Estimation Using Unlabeled Data

NORM: Knowledge Distillation via N-to-One Representation Matching

Does Deep Learning Learn to Abstract? A Systematic Probing Framework

The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Is Forgetting Less a Good Inductive Bias for Forward Transfer?

Deep Learning on Implicit Neural Representations of Shapes

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

Bispectral Neural Networks

Bias Propagation in Federated Learning

On The Inadequacy of Optimizing Alignment and Uniformity in Contrastive Learning of Sentence Representations

Relative representations enable zero-shot latent space communication

No Reason for No Supervision: Improved Generalization in Supervised Models

DCI-ES: An Extended Disentanglement Framework with Connections to Identifiability

Recursive Time Series Data Augmentation

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Leveraging Importance Weights in Subset Selection

Generative Modeling Helps Weak Supervision (and Vice Versa)

Discovering Evolution Strategies via Meta-Black-Box Optimization

DAG Learning on the Permutahedron

Kernel Neural Optimal Transport

DiGress: Discrete Denoising diffusion for graph generation

Neural Architecture Design and Robustness: A Dataset

An Extensible Multi-modal Multi-task Object Dataset with Materials

ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

FINDE: Neural Differential Equations for Finding and Preserving Invariant Quantities

Clifford Neural Layers for PDE Modeling

A Self-Attention Ansatz for Ab-initio Quantum Chemistry

Learning differentiable solvers for systems with hard constraints

Learning Domain-Agnostic Representation for Disease Diagnosis

GAMR: A Guided Attention Model for (visual) Reasoning

RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates

Finding Actual Descent Directions for Adversarial Training

DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity

Solving Constrained Variational Inequalities via a First-order Interior Point-based Method

LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence

An Adaptive Policy to Employ Sharpness-Aware Minimization

Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach

FedExP: Speeding Up Federated Averaging via Extrapolation

Multi-Objective Online Learning

Single-shot General Hyper-parameter Optimization for Federated Learning

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Robustness to corruption in pre-trained Bayesian neural networks

Particle-based Variational Inference with Preconditioned Functional Gradient Flow

Calibrating Transformers via Sparse Gaussian Processes

Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

Measuring axiomatic soundness of counterfactual image models

Latent State Marginalization as a Low-cost Approach for Improving Exploration

Model-based Causal Bayesian Optimization

Hybrid RL: Using both offline and online data can make RL efficient

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

Generating Diverse Cooperative Agents by Learning Incompatible Policies

User-Interactive Offline Reinforcement Learning

Simple Emergent Action Representations from Multi-Task Policy Training

Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Energy-based Out-of-Distribution Detection for Graph Neural Networks

Localized Randomized Smoothing for Collective Robustness Certification

Planning with Sequence Models through Iterative Energy Minimization

Robust Explanation Constraints for Neural Networks

Strategic Classification with Graph Neural Networks

Discovering Latent Knowledge in Language Models Without Supervision

Concept Gradient: Concept-based Interpretation Without Linear Assumption

A law of adversarial risk, interpolation, and label noise

Measuring Forgetting of Memorized Training Examples

Progress measures for grokking via mechanistic interpretability

Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

Individual Privacy Accounting with Gaussian Differential Privacy

Temporal Dependencies in Feature Importance for Time Series Prediction

Explaining RL Decisions with Trajectories

Stochastic Differentially Private and Fair Learning

Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes

How robust is unsupervised representation learning to distribution shift?

A Non-Asymptotic Analysis of Oversmoothing in Graph Neural Networks

On the Saturation Effect of Kernel Ridge Regression

Characterizing the spectrum of the NTK via a power series expansion

Collaborative Pure Exploration in Kernel Bandit

Learning ReLU networks to high uniform accuracy is intractable

Understanding The Robustness of Self-supervised Learning Through Topic Modeling

Bidirectional Language Models Are Also Few-shot Learners

Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance

From $t$-SNE to UMAP with contrastive learning

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

SimPer: Simple Self-Supervised Learning of Periodic Targets

What Do Self-Supervised Vision Transformers Learn?

Human-level Atari 200x faster

(ends 4:30 AM)

Lunch

3:30 a.m.

Town Hall:

Town Hall: LLMs in the ICLR Writing Process?

(ends 4:00 AM)

4:30 a.m.

Invited Talk:

Learned optimizers: why they're the future, why they're hard, and what they can do now

Jascha Sohl-Dickstein

(ends 5:30 AM)

5:30 a.m.

Remarks:

Closing Ceremony

(ends 5:35 AM)

Coffee Break

6 a.m.

Oral 6 Track 2: Infrastructure & Social Aspects of Machine Learning [6:00-7:30]

Orals 6:00-7:10

[6:00] DaxBench: Benchmarking Deformable Object Manipulation with Differentiable Physics

[6:10] Betty: An Automatic Differentiation Library for Multilevel Optimization

[6:20] WikiWhy: Answering and Explaining Cause-and-Effect Questions

[6:30] MEDFAIR: Benchmarking Fairness for Medical Imaging

[6:40] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

[6:50] Confidential-PROFITT: Confidential PROof of FaIr Training of Trees

[7:00] Disparate Impact in Differential Privacy from Gradient Misalignment

(ends 7:30 AM)

Oral 6 Track 4: Applications & Social Aspects of Machine Learning & General Machine Learning [6:00-7:30]

Orals 6:00-7:20

[6:00] Binding Language Models in Symbolic Languages

[6:10] MeshDiffusion: Score-based Generative 3D Mesh Modeling

[6:20] The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

[6:30] AutoGT: Automated Graph Transformer Architecture Search

[6:40] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

[6:50] LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation

[7:00] Certified Training: Small Boxes are All You Need

[7:10] Inequality phenomenon in $l_{\infty}$-adversarial training, and its unrealized threats

(ends 7:30 AM)

Oral 6 Track 3: Deep Learning and representational learning [6:00-7:30]

Orals 6:00-7:20

[6:00] Agree to Disagree: Diversity through Disagreement for Better Transferability

[6:10] What learning algorithm is in-context learning? Investigations with linear models

[6:20] Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation

[6:30] Encoding Recurrence into Transformers

[6:40] Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

[6:50] Simplified State Space Layers for Sequence Modeling

[7:00] Relational Attention: Generalizing Transformers for Graph-Structured Tasks

[7:10] Sparse Mixture-of-Experts are Domain Generalizable Learners

(ends 7:30 AM)

Oral 6 Track 1: Theory [6:00-7:30]

Orals 6:00-7:20

[6:00] Near-optimal Coresets for Robust Clustering

[6:10] Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

[6:20] Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

[6:30] Statistical Efficiency of Score Matching: The View from Isoperimetry

[6:40] Subquadratic Algorithms for Kernel Matrices via Kernel Density Estimation

[6:50] Depth Separation with Multilayer Mean-Field Networks

[7:00] Learning with Stochastic Orders

[7:10] Nonlinear Reconstruction for Operator Learning of PDEs with Discontinuities

(ends 7:30 AM)

Oral 6 Track 6: Deep Learning [6:00-7:30]

Orals 6:00-6:50

[6:00] Unsupervised Model Selection for Time Series Anomaly Detection

[6:10] A Kernel Perspective of Skip Connections in Convolutional Networks

[6:20] ReAct: Synergizing Reasoning and Acting in Language Models

[6:30] A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet

[6:40] Packed Ensembles for efficient uncertainty estimation

(ends 7:30 AM)

Oral 6 Track 5: Applications & Deep Learning and representational learning [6:00-7:30]

Orals 6:00-7:20

[6:00] Language Modelling with Pixels

[6:10] Parametrizing Product Shape Manifolds by Composite Networks

[6:20] ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

[6:30] Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer

[6:40] Dual Algorithmic Reasoning

[6:50] DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

[7:00] Warping the Space: Weight Space Rotation for Class-Incremental Few-Shot Learning

[7:10] Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

(ends 7:30 AM)

Affinity Event:

Women in Machine Learning Social

(ends 8:00 AM)

7:30 a.m.

Poster Session 6 [7:30-9:30]

Posters 7:30-9:30

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

Binding Language Models in Symbolic Languages

WiNeRT: Towards Neural Ray Tracing for Wireless Channel Modelling and Differentiable Simulations

Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions

ReAct: Synergizing Reasoning and Acting in Language Models

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Automating Nearest Neighbor Search Configuration with Constrained Optimization

On the Robustness to Misspecification of α-posteriors and Their Variational Approximations

Leveraging Future Relationship Reasoning for Vehicle Trajectory Prediction

CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

GOOD: Exploring geometric cues for detecting objects in an open world

Short-Term Memory Convolutions

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

Diagnosing and Rectifying Vision Models using Language

Real-Time Image Demoiréing on Mobile Devices

Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields

Is Attention All That NeRF Needs?

Consolidator: Mergable Adapter with Group Connections for Visual Adaptation

DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

Learning Uncertainty for Unknown Domains with Zero-Target-Assumption

Offline RL for Natural Language Generation with Implicit Language Q Learning

LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation

Language Modelling with Pixels

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet

Reversible Column Networks

AutoGT: Automated Graph Transformer Architecture Search

Compositionality with Variation Reliably Emerges in Neural Networks

Equivariance-aware Architectural Optimization of Neural Networks

Semi-Parametric Inducing Point Networks and Neural Processes

Relational Attention: Generalizing Transformers for Graph-Structured Tasks

Parametrizing Product Shape Manifolds by Composite Networks

DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

PowerQuant: Automorphism Search for Non-Uniform Quantization

Over-parameterized Model Optimization with Polyak-Łojasiewicz Condition

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

Ordered GNN: Ordering Message Passing to Deal with Heterophily and Over-smoothing

What Is Missing in IRM Training and Evaluation? Challenges and Solutions

Dual Algorithmic Reasoning

ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

MA-BERT: Towards Matrix Arithmetic-only BERT Inference by Eliminating Complex Non-Linear Functions

Composing Ensembles of Pre-trained Models via Iterative Consensus

Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

$\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells

Encoding Recurrence into Transformers

Agree to Disagree: Diversity through Disagreement for Better Transferability

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity

Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation

TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

Verifying the Union of Manifolds Hypothesis for Image Data

Unveiling the sampling density in non-uniform geometric graphs

Geometrically regularized autoencoders for non-Euclidean data

Simplified State Space Layers for Sequence Modeling

Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference

A Simple Yet Powerful Deep Active Learning With Snapshots Ensembles

Packed Ensembles for efficient uncertainty estimation

What learning algorithm is in-context learning? Investigations with linear models

Part-Based Models Improve Adversarial Robustness

Effectively Modeling Time Series with Simple Discrete State Spaces

Warping the Space: Weight Space Rotation for Class-Incremental Few-Shot Learning

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

Understanding the Covariance Structure of Convolutional Filters

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Hyperparameter Optimization through Neural Network Partitioning

Unsupervised Model Selection for Time Series Anomaly Detection

ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond

Bridging the Gap to Real-World Object-Centric Learning

Disentanglement of Correlated Factors via Hausdorff Factorized Support

EquiMod: An Equivariance Module to Improve Visual Instance Discrimination

When to Make and Break Commitments?

Block and Subword-Scaling Floating-Point (BSFP) : An Efficient Non-Uniform Quantization For Low Precision Inference

A Statistical Framework for Personalized Federated Learning and Estimation: Theory, Algorithms, and Privacy

Blurring Diffusion Models

Neural Implicit Shape Editing using Boundary Sensitivity

MeshDiffusion: Score-based Generative 3D Mesh Modeling

Efficient Federated Domain Translation

Betty: An Automatic Differentiation Library for Multilevel Optimization

Winning Both the Accuracy of Floating Point Activation and the Simplicity of Integer Arithmetic

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments

WikiWhy: Answering and Explaining Cause-and-Effect Questions

DaxBench: Benchmarking Deformable Object Manipulation with Differentiable Physics

Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design

Multiple sequence alignment as a sequence-to-sequence learning problem

Context-enriched molecule representations improve few-shot drug discovery

Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions

Solving stochastic weak Minty variational inequalities without increasing batch size

Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning

Fast Nonlinear Vector Quantile Regression

Meta Temporal Point Processes

Riemannian Metric Learning via Optimal Transport

Evolving Populations of Diverse RL Agents with MAP-Elites

Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Scaling Laws for a Multi-Agent Reinforcement Learning Model

Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective

Impossibly Good Experts and How to Follow Them

Reward Design with Language Models

Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Visual Imitation Learning with Patch Rewards

Backstepping Temporal Difference Learning

Expressive Monotonic Neural Networks

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

Certified Training: Small Boxes are All You Need

Confidential-PROFITT: Confidential PROof of FaIr Training of Trees

MEDFAIR: Benchmarking Fairness for Medical Imaging

Inequality phenomenon in $l_{\infty}$-adversarial training, and its unrealized threats

Disparate Impact in Differential Privacy from Gradient Misalignment

Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

Panning for Gold in Federated Learning: Targeted Text Extraction under Arbitrarily Large-Scale Aggregation

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

On The Specialization of Neural Modules

Label-free Concept Bottleneck Models

Depth Separation with Multilayer Mean-Field Networks

Pitfalls of Gaussians as a noise distribution in NCE

Near-optimal Coresets for Robust Clustering

Towards convergence to Nash equilibria in two-team zero-sum games

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

Robust Algorithms on Adaptive Inputs from Bounded Adversaries

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta.

Variance-Aware Sparse Linear Bandits

Strong inductive biases provably prevent harmless interpolation

Subquadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions

Plateau in Monotonic Linear Interpolation --- A "Biased" View of Loss Landscape for Deep Networks

Statistical Efficiency of Score Matching: The View from Isoperimetry

Nonlinear Reconstruction for Operator Learning of PDEs with Discontinuities

Learning in temporally structured environments

A Kernel Perspective of Skip Connections in Convolutional Networks

Learning with Stochastic Orders

DAVA: Disentangling Adversarial Variational Autoencoder

Fake It Until You Make It : Towards Accurate Near-Distribution Novelty Detection

Unsupervised 3D Object Learning through Neuron Activity aware Plasticity

Heterogeneous Neuronal and Synaptic Dynamics for Spike-Efficient Unsupervised Learning: Theory and Design Principles

SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models

A Message Passing Perspective on Learning Dynamics of Contrastive Learning

Universal Approximation Theorems for Differentiable Geometric Deep Learning

Rethinking the Effect of Data Augmentation in Adversarial Contrastive Learning

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders

(ends 9:30 AM)

10 p.m.

Registration / Check-in

(ends 9:00 AM)

11:20 p.m.

Affinity Workshop:

Joint IndabaXRwanda / BlackInAI

(ends 9:00 AM)

11:30 p.m.

Affinity Workshop:

Kaggle@ICLR 2023: ML Solutions in Africa

(ends 8:00 AM)

11:45 p.m.

Workshop:

Trustworthy and Reliable Large-Scale Machine Learning Models

(ends 8:30 AM)

THU 4 MAY

midnight

Workshop:

Tackling Climate Change with Machine Learning: Global Perspectives and Local Challenges

(ends 9:45 AM)

Workshop:

Reincarnating Reinforcement Learning

(ends 8:00 AM)

Workshop:

AI for Agent-Based Modelling (AI4ABM)

(ends 9:00 AM)

Workshop:

Physics for Machine Learning

(ends 9:00 AM)

Workshop:

Trustworthy Machine Learning for Healthcare

(ends 8:00 AM)

12:15 a.m.

Workshop:

Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)

(ends 7:45 AM)

12:30 a.m.

Coffee Break

12:40 a.m.

Workshop:

Neural Fields across Fields: Methods and Applications of Implicit Neural Representations

(ends 9:05 AM)

1 a.m.

Workshop:

Neurosymbolic Generative Models (NeSy-GeMs)

(ends 10:00 AM)

Workshop:

What do we need for successful domain generalization?

(ends 9:40 AM)

3 a.m.

Lunch

5:30 a.m.

Coffee Break

6 a.m.

Workshop:

From Molecules to Materials: ICLR 2023 Workshop on Machine learning for materials (ML4Materials)

(ends 3:00 PM)

10 p.m.

Registration / Check-in

(ends 3:00 AM)

11:50 p.m.

Workshop:

Scene Representations for Autonomous Driving

(ends 8:30 AM)

FRI 5 MAY

midnight

Workshop:

The 4th Workshop on practical ML for Developing Countries: learning under limited/low resource settings

(ends 8:00 AM)

Workshop:

ICLR 2023 Workshop on Sparsity in Neural Networks: On practical limitations and tradeoffs between sustainability and efficiency

(ends 10:00 AM)

Workshop:

ICLR 2023 Workshop on Machine Learning for Remote Sensing

(ends 8:30 AM)

Workshop:

First workshop on "Machine Learning & Global Health".

(ends 8:00 AM)

Workshop:

Multimodal Representation Learning (MRL): Perks and Pitfalls

(ends 9:00 AM)

Affinity Workshop:

Tiny Papers Showcase Day (a DEI initiative)

(ends 8:00 AM)

Workshop:

Pitfalls of limited data and computation for Trustworthy ML

(ends 9:00 AM)

12:30 a.m.

Coffee Break

1 a.m.

Workshop:

4th Workshop on African Natural Language Processing (AfricaNLP 2023)

(ends 10:00 AM)

1:10 a.m.

Workshop:

Time Series Representation Learning for Health

(ends 10:00 AM)

2 a.m.

Workshop:

Machine Learning for Drug Discovery (MLDD)

(ends 11:00 AM)

3 a.m.

Lunch

5:30 a.m.

Workshop:

Deep Learning for Code (DL4C)

(ends 2:00 PM)

Workshop:

Machine Learning for IoT: Datasets, Perception, and Understanding

(ends 2:30 PM)

Coffee Break

6 a.m.

Workshop:

Backdoor Attacks and Defenses in Machine Learning

(ends 3:05 PM)