ICLR
2026
Papers
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
Maximizing Asynchronicity in Event-based Neural Networks
MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
Data Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
Adaptive Mixture of Disentangled Experts for Dynamic Graphs under Distribution Shifts
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
Carré du champ flow matching: better quality-generalisation tradeoff in generative models
Similarity-aware Non-Convex Federated Optimization
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Terminal Velocity Matching
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
$\ell_1$ Latent Distance based Continuous-time Graph Representation
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
DADA: Dual Averaging with Distance Adaptation
Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space
Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
ResCP: Reservoir Conformal Prediction for Time Series Forecasting
FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
Coarse-to-Fine Learning of Dynamic Causal Structures
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
Human-LLM Collaborative Feature Engineering for Tabular Data
Human-AI Curation Synergy: Scaling Preference Data Curation via Human-Guided AI Feedback
Object-Centric Refinement for Enhanced Zero-Shot Segmentation
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
Robust Reward Modeling via Causal Rubrics
SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
RPM: Reasoning-Level Personalization for Black-Box Large Language Models
An efficient, provably optimal, practical algorithm for the 0-1 loss linear classification problem
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
Post-hoc Probabilistic Vision-Language Models
Automatic Image-Level Morphological Trait Annotation for Organismal Images
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Tina: Tiny Reasoning Models via LoRA
What Scales in Cross-Entropy Scaling Law?
Don't Just Fine-tune the Agent, Tune the Environment
Conformalized Decision Risk Assessment
Bird's-eye-view Informed Reasoning Driver
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
Object Fidelity Diffusion for Remote Sensing Image Generation
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
STARK: Strategic Team of Agents for Refining Kernels
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Constantly Improving Image Models Need Constantly Improving Benchmarks
Corner Gradient Descent
Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
The Curious Case of In-Training Compression of State Space Models
Using cognitive models to reveal value trade-offs in language models
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
Image Inpainting with Preference Alignment
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Real-Time Reasoning Agents in Evolving Environments
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
Breaking Barriers: Do Reinforcement Fine-tuning Gains Transfer To Unseen Domains?
Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
OVID: Open-Vocabulary Intrusion Detection
Representing local protein environments with machine learning force fields
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
ViMo: A Generative Visual GUI World Model for App Agents
DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
EasyCreator: Empowering 4D Creation through Video Inpainting
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
Optimistic Task Inference for Behavior Foundation Models
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
STAT: Skill-Targeted Adaptive Training
ProxyAttn: Guided Sparse Attention via Representative Heads
Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
Premise Selection for a Lean Hammer
Bridging the performance-gap between target-free and target-based reinforcement learning
Hilbert-Guided Sparse Local Attention
Equilibrium Language Models
FARTrack: Fast Autoregressive Visual Tracking with High Performance
Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
Feature segregation by signed weights in artificial vision systems and biological models
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
Interleaving Reasoning for Better Text-to-Image Generation
Graph Diffusion Transformers are In-Context Molecular Designers
Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion Models
Planner Aware Path Learning in Diffusion Language Models Training
HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
Learn to Guide Your Diffusion Model
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Slicing Wasserstein over Wasserstein via Functional Optimal Transport
CoAct-1: Computer-using Multi-agent System with Coding Actions
Scaling Speech Tokenizers with Diffusion Autoencoders
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
Spatially Informed Autoencoders for Interpretable Visual Representation Learning
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Learning to Solve Orienteering Problem with Time Windows and Variable Profits
Embodied Navigation Foundation Model
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
Evaluating Language Models' Evaluations of Games
Composition of Memory Experts for Diffusion World Models
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
Universal Model Routing for Efficient LLM Inference
Spotlight on Token Perception for Multimodal Reinforcement Learning
Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the Diverse Framework
Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
POEMetric: The Last Stanza of Humanity
What Do Large Language Models Know About Opinions?
TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
Stable and Scalable Deep Predictive Coding Networks with Meta Prediction Errors
DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non-Discriminativity in Generalist Graph Anomaly Detection
FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
Cross-Embodied Co-Design for Dexterous Hands
On the Design of One-step Diffusion via Shortcutting Flow Paths
Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
PACE: Pretrained Audio Continual Learning
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
Exploratory Diffusion Model for Unsupervised Reinforcement Learning
SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
Tuning the burn-in phase in training recurrent neural networks improves their performance
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
CaTS: Calibrated Test-Time Scaling for Efficient LLM Inference
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
Action Chunking and Data Augmentation Yield Exponential Improvements for Imitation Learning in Continuous Spaces
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Memorization Through the Lens of Sample Gradients
BIRD: Behavior Induction via Representation-structure Distillation
PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
Fostering Video Reasoning via Next-Event Prediction
Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
Long-tailed Test-Time Adaptation for Vision-Language Models
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
Neural Dynamics Self-Attention for Spiking Transformers
Variational Reasoning for Language Models
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
Reinforcing General Reasoning Without Verifiers
Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
ICYM2I: The illusion of multimodal informativeness under missingness
Intention-Conditioned Flow Occupancy Models
A Step to Decouple Optimization in 3DGS
TabStruct: Measuring Structural Fidelity of Tabular Data
FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
C-Evolve: Consensus-based Evolution for Prompt Groups
The Geometry of Reasoning: Flowing Logics in Representation Space
Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
Agentic Collaboration as an Information Bottleneck Problem
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Defending against Backdoor Attacks via Module Switching
Parameterized Hardness of Zonotope Containment and Neural Network Verification
ExpGuard: LLM Content Moderation in Specialized Domains
OSCAR: Online Soft Compression for RAG
LiveWeb-IE: A Benchmark For Online Web Information Extraction
More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
An Ensemble Framework for Unbiased Language Model Watermarking
iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
Soft-Masked Diffusion Language Models
LLMs Process Lists With General Filter Heads
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
Sheaves Reloaded: A Direction Awakening
Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
Thyme: Think Beyond Images
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model
CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
Theoretical Guarantees for Causal Discovery on Large Random Graphs
Robust Federated Inference
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
AudioX: A Unified Framework for Anything-to-Audio Generation
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
Reliability-Adjusted Prioritized Experience Replay
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
Bayesian Ensemble for Sequential Decision-Making
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Riemannian High-Order Pooling for Brain Foundation Models
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
WorldGym: World Model as An Environment for Policy Evaluation
YuE: Scaling Open Foundation Models for Long-Form Music Generation
A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
Heterogeneous Agent Q-weighted Policy Optimization
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
OFMU: Optimization-Driven Framework for Machine Unlearning
APPLE: Toward General Active Perception via Reinforcement Learning
Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
Hallucination Begins Where Saliency Drops
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
Context Tokens are Anchors: Understanding the Repetition Curse in Diffusion MLLMs from an Information Flow Perspective
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Next Visual Granularity Generation
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
Scaling Laws and Symmetry, Evidence from Neural Force Fields
Medical thinking with multiple images
LogiConBench: Benchmarking Logical Consistencies of LLMs
Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
STEM: Scaling Transformers with Embedding Modules
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Preference Leakage: A Contamination Problem in LLM-as-a-judge
WOW-Seg: A Word-free Open World Segmentation Model
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Contamination Detection for VLMs Using Multi-Modal Semantic Perturbations
Video-As-Prompt: Unified Semantic Control for Video Generation
Geometry-aware Policy Imitation
TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
Steering Diffusion Models Towards Credible Content Recommendation
DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Evidence for Limited Metacognition in LLMs
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
Predicting LLM Output Length via Entropy-Guided Representations
Evaluating Data Influence in Meta Learning
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
GoalRank: Group-Relative Optimization for a Large Ranking Model
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
Score-Based Density Estimation from Pairwise Comparisons
SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Why Adversarially Train Diffusion Models?
Multiple-Prediction-Powered Inference
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
Enhancing Learning with Noisy Labels via Rockafellian Relaxation
Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
Mechanistic Independence: A Principle for Identifiable Disentangled Representations
SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
Composite Optimization with Error Feedback: the Dual Averaging Approach
Probability Distributions Computed by Hard-Attention Transformers
Optimizing Agent Planning for Security and Autonomy
REMem: Reasoning with Episodic Memory in Language Agent
Panoptic Pairwise Distortion Graph
FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
EigenBench: A Comparative Behavioral Measure of Value Alignment
DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition
Counterfactual LLM-based Framework for Measuring Rhetorical Style
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
Convergence Analysis of Tsetlin Machines for Basic Boolean Operators under Noise-Free and Noisy Training Conditions
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Selective Rotary Position Embedding
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detections
RewardEval: Advancing Reward Model Evaluation
KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
Improving Attributed Long-form Question Answering with Intent Awareness
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
Scaling Sequence-to-Sequence Generative Neural Rendering
Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
AP-OOD: Attention Pooling for Out-of-Distribution Detection
Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
In-Context Watermarks for Large Language Models
Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Learning for Highly Faithful Explainability
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Vision–Language Continual Learning
LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
VITA: Vision-to-Action Flow Matching Policy
Scaling Multi-Task Bayesian Optimization with Large Language Models
Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
Scaling Bayesian Experimental Design to High-Dimensions with Information-Guided Diffusion
Fetal-Gauge: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
XIL: Cross-Expanding Incremental Learning
Attention Is All You Need for KV Cache in Diffusion LLMs
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
OD$^3$: Optimization-free Dataset Distillation for Object Detection
Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
: One LLM Token for Explicit Graph Structural Understanding
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
A foundation model with multi-variate parallel attention to generate neuronal activity
Reinforced Latent Reasoning for LLM-based Recommendation
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Beyond Student: An Asymmetric Network for Neural Network Inheritance
Quantized Gradient Projection for Memory-Efficient Continual Learning
BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
Constitutional Classifiers++: Production-Grade Defenses against Universal Jailbreaks
Declarative Audio Editing with Audio Language Model
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
Spatially Guided Training for Vision-Language-Action Model
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
Arbitrary Generative Video Interpolation
Group-Normalized Implicit Value Optimization for Language Models
Reward Models Inherit Value Biases from Pretraining
VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
Agentic Context Engineering: Learning Comprehensive Contexts for Self-Improving Language Models
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
Self-Guided Low Light Object Detection Framework
ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
Symmetric Space Learning for Combinatorial Generalization
OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Federated Learning with Profile Mapping under Distribution Shifts and Drifts
Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
Generative Human Geometry Distribution
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
Controllable Sequence Editing for Biological and Clinical Trajectories
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
Less is more: Clustered Cross-Covariance Control for Offline RL
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
GmNet: Revisiting Gating Mechanisms From A Frequency View
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
Lipschitz Bandits with Stochastic Delayed Feedback
Identifying and Evaluating Inactive Heads in Pretrained LLMs
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Same Content, Different Representations: A Controlled Study for Table QA
MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
Product of Experts for Visual Generation
In-Place Test-Time Training
Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
Near Optimal Robust Federated Learning Against Data Poisoning Attack
Distillation of Large Language Models via Concrete Score Matching
Convex Dominance in Deep Learning: A Scaling Law of Loss and Learning Rate
AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Translating Flow to Policy via Hindsight Online Imitation
One protein is all you need
AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
Closing the Gap Between Text and Speech Understanding in LLMs
Block Recurrent Dynamics in Vision Transformers
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
Monocular Normal Estimation via Shading Sequence Estimation
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
Mitigating Privacy Risk via Forget Set-Free Unlearning
From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
Gauge-invariant representation holonomy
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
Pre-training Limited Memory Language Models with Internal and External Knowledge
Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
SoFlow: Solution Flow Models for One-Step Generative Modeling
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
Addressing divergent representations from causal interventions on neural networks
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
Only Brains Align with Brains: Cross-Region Patterns Expose Limits of Normative Models
Enabling True Global Perception in State Space Models for Visual Tasks
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
Efficient Learning on Large Graphs using a Densifying Regularity Lemma
When LLMs get significantly worse: A statistical approach to detect model degradations
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
Efficient Reasoning with Balanced Thinking
PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Modality-free Graph In-context Alignment
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
TaskCraft: Automated Generation of Agentic Tasks
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
Sparse Attention Adaptation for Long Reasoning
STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
Change Point Localization and Inference in Dynamic Multilayer Networks
Synchronizing Probabilities in Model-Driven Lossless Compression
Seeing What’s Not There: Negation Understanding Needs More Than Training
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Best-of-Infinity: Asymptotic Performance of Test-Time Compute
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Topology Matters in RTL Circuit Representation Learning
GTool: Graph Enhanced Tool Planning with Large Language Model
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
Multimodal Dataset Distillation Made Simple by Prototype-guided Data Synthesis
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
Visual Jigsaw Post-Training Improves MLLMs
Improving Extreme Wind Prediction with Frequency-Informed Learning
Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
Optimizing Data Augmentation through Bayesian Model Selection
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
BaseReward: A Strong Baseline for Multimodal Reward Model
Bayesian Influence Functions for Hessian-Free Data Attribution
Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
Delay Flow Matching
Dynamic Reflections: Probing Video Representations with Text Alignment
Multiplicative Diffusion Models: Beyond Gaussian Latents
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
RL makes MLLMs see better than SFT
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
NRGPT: An Energy-based Alternative for GPT
LFQA-E: Carefully Benchmarking Long-form QA Evaluation
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
VINCIE: Unlocking In-context Image Editing from Video
IGC-Net for conditional average potential outcome estimation over time
Online Conformal Prediction with Adversarial Feedback via Regret Minimization
Benchmarking Multi-Agent Reinforcement Learning in Power Grid Operations
FAST-DIPS: Adjoint-Free Analytic Steps and Hard-Constrained Likelihood Correction for Diffusion-Prior Inverse Problems
Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Local Entropy Search over Descent Sequences for Bayesian Optimization
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Compositional Diffusion with Guided search for Long-Horizon Planning
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
Residual Feature Integration is Sufficient to Prevent Negative Transfer
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
Generative Universal Verifier as Multimodal Meta-Reasoner
Image Quality Assessment for Embodied AI
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
RayI2P: Learning Rays for Image-to-Point Cloud Registration
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
Regret-Guided Search Control for Efficient Learning in AlphaZero
InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
Sample-efficient and Scalable Exploration in Continuous-Time RL
Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
Universal and Efficient Loading Balancing for RL Training of Large Multimodal Models
Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
Characterizing Deep Research: A Benchmark and Formal Definition
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Energy-Based Transformers are Scalable Learners and Thinkers
Compositional Neuro-Symbolic Concepts in Neural Activities
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
Perception-Aware Policy Optimization for Multimodal Reasoning
Speculative Speculative Decoding
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
The Seismic Wavefield Common Task Framework
Obfuscated Activations Bypass LLM Latent-Space Defenses
PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Generation
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
KaVa: Latent Reasoning via Compressed KV-Cache Distillation
Diffusion Alignment as Variational Expectation-Maximization
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
FineNib: A Query Synthesizer For Static Analysis of Security Vulnerabilities
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
Composable Sparse Subnetworks via Maximum-Entropy Principle
Neodragon: Mobile Video Generation Using Diffusion Transformer
LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
Divergence-Free Neural Networks with Application to Image Denoising
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Federated Learning of Quantile Inference under Local Differential Privacy
Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern
Learning-Augmented Moment Estimation on Time-Decay Models
Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
Learning Efficient and Interpretable Multi-Agent Communication
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
AtC: Aggregate-then-Calibrate for Human-centered Assessment
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
DirMoE: Dirichlet-Routed Mixture of Experts
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
Inconsistency Biases in Dynamic Data Pruning
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
Traceable Black-Box Watermarks For Federated Learning
Can Large Language Models Match the Conclusions of Systematic Reviews?
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
How Stable is the Next Token? A Geometric View of LLM Prediction Stability
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
Towards Sampling Data Structures for Tensor Products in Turnstile Streams
AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
Generating metamers of human scene understanding
Rethinking Causal Mask Attention for Vision-Language Inference
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
On the identifiability of causal graphs with multiple environments
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
Shrinking Proteins with Diffusion
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
A Physics-Inspired Optimizer: Velocity Regularized Adam
Nonparametric Teaching of Attention Learners
Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
Bridging the Gap Between Promise and Performance for FP4 Quantization
Mordal: Automated Pretrained Model Selection for Vision Language Models
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Zero-shot Forecasting by Simulation Alone
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
On the Interaction of Compressibility and Adversarial Robustness
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
DiffuDETR: Rethinking Detection Transformers with Diffusion Process
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Best-of-three-worlds Analysis for Dueling Bandits with Borda Winner
Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
Bound by semanticity: universal laws governing the generalization-identification tradeoff
How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
SUSD: Structured Unsupervised Skill Discovery through State Factorization
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
Efficient Adversarial Attacks on High-dimensional Offline Bandits
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Context Learning for Multi-Agent Discussion
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
Aligning Deep Implicit Preferences by Learning to Reason Defensively
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning
MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation
Scale-wise Distillation of Diffusion Models
$\boldsymbol{\partial^\infty}$-Grid: Differentiable Grid Representations for Fast and Accurate Solutions to Differential Equations
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
HSG-12M: A Large-Scale Dataset of Spatial Multigraphs from the Energy Spectra of non-Hermitian Crystals
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Softmax is not Enough (for Adaptive Conformal Classification)
Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
Any-Subgroup Equivariant Networks via Symmetry Breaking
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Mapping Post-Training Forgetting in Language Models at Scale
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Never Saddle: Reparameterized Steepest Descent as Mirror Flow
GuardAlign: Robust Safety Alignment in Multimodal Large Language Models
Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
Towards Interpretable Visual Decoding with Attention to Brain Representations
Efficient Differentiable Contact Model with Long-range Influence
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Foundation Models for Causal Inference via Prior-Data Fitted Networks
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Causally Robust Preference Learning with Reasons
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
RM-R1: Reward Modeling as Reasoning
Calibrated Information Bottleneck for Trusted Multi-modal Clustering
How Far Can Unsupervised RLVR Scale LLM Training?
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Monotone Near-Zero-Sum Games
Patching Gaps In LLM Reasoning With Interventional Training
HeurekaBench: A Benchmarking Framework for AI Co-scientist
Adaptive Gaussian Expansion for On-the-fly Category Discovery
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
Quantized Visual Geometry Grounded Transformer
The Counting Power of Transformers
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
Behavior Learning
From Assistant to Independent Developer — Are GPTs Ready for Software Development?
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
Decoupling the Class Label and the Target Concept in Machine Unlearning
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
ConvRec-R1: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
What Matters for Bioacoustic Encoding
Zephyrus: An Agentic Framework for Weather Science
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Spectral Attention Steering for Prompt Highlighting
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Computing Equilibrium beyond Unilateral Deviation
ViPRA: Video Prediction for Robot Actions
Music Flamingo: Scaling Music Understanding in Audio Language Models
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Lookup multivariate Kolmogorov-Arnold Networks
Maximizing Incremental Information Entropy for Contrastive Learning
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
Privacy-Protected Causal Survival Analysis Under Distribution Shift
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
Toward Principled Flexible Scaling for Self-Gated Neural Activation
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
What matters for Representation Alignment: Global Information or Spatial Structure?
FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
Learning an Image Editing Model without Image Editing Pairs
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
Generalized Parallel Scaling with Interdependent Generations
Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
Subspace Kernel Learning on Tensor Sequences
Streaming Autoregressive Video Generation via Diagonal Distillation
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
Diagnosing and Improving Diffusion Models by Estimating Optimal Loss Value
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Controlling Repetition in Protein Language Models
Temporal Generalization: A Reality Check
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
Quantitative Bounds for Length Generalization in Transformers
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
Buckingham $\pi$-Invariant Test‑Time Projection for Robust PDE Surrogate Modeling
UNITE: Universal kNowledge Integration from Task-specific Experts
WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
InfoDet: A Dataset for Infographic Element Detection
Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
Fair Graph Machine Learning under Adversarial Missingness Processes
RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
VLMgineer: Vision-Language Models as Robotic Toolsmiths
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
Reliable Weak-to-Strong Monitoring of LLM Agents
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
SelvaBox: A high‑resolution dataset for tropical tree crown detection
CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
CLARC: C/C++ Benchmark for Robust Code Search
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Credit-Budgeted ICPC-Style Coding: When LLM Agents Must Pay for Every Decision
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
VoMP: Predicting Volumetric Mechanical Property Fields
Learning Admissible Heuristics for A*: Theory and Practice
Group Representational Position Embedding
Reinforcement Unlearning via Group Relative Policy Optimization
CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
Frayed RoPE and Long Inputs: A Geometric Perspective
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
Beyond Softmax and Entropy: $f$-Regularized Policy Gradients with Coupled Parametrizations
Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Fair Classification by Direct Intervention on Operating Characteristics
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Out of the Shadows: Exploring a Latent Space for Neural Network Verification
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
Decision Aggregation under Quantal Response
BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
IDER: Idempotent Experience Replay for Reliable Continual Learning
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Learning to Generate Unit Test via Adversarial Reinforcement Learning
SpatialHand: Generative Object Manipulation from 3D Perspective
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
PetaGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Riesz Neural Operator for Solving Partial Differential Equations
TTT3R: 3D Reconstruction as Test-Time Training
Human3R: Everyone Everywhere All at Once
Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
On the Expressiveness of State Space Models via Temporal Logics
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
Unified In-Context Video Editing
Special Unitary Parameterized Estimators of Rotation
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
Aurora: Towards Universal Generative Multimodal Time Series Forecasting
Label Smoothing Improves Machine Unlearning
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Watermark-based Attribution of AI-Generated Images
Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
Data Selection for LLM Alignment Using Fine-Grained Preferences
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Multilingual Routing in Mixture-of-Experts
On Fairness of Task Arithmetic: The Role of Task Vectors
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
FAME: $\underline{F}$ormal $\underline{A}$bstract $\underline{M}$inimal $\underline{E}$xplanation for neural networks
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Multiplayer Nash Preference Optimization
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
Tree Search for LLM Agent Reinforcement Learning
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
Reverse Distillation: Disentangling and Scaling Protein Language Model Representations
On Measuring Influence in Avoiding Undesired Future
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
PAS: Estimating the target Accuracy before domain adaptation
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
DiCache: Let Diffusion Model Determine Its Own Cache
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
Dual Goal Representations
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
Light-X: Generative 4D Video Rendering with Camera and Illumination Control
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Decoupling Positional and Symbolic Attention in Transformers
Optimal transport unlocks end-to-end learning for single-molecule localization
Summaries as Centroids for Interpretable and Scalable Text Clustering
Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
Enhancing Shortcut Models with Cumulative Self-Consistency Loss for One-Step Diffusion
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Scaling Attention via Feature Sparsity
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
PonderLM: Pretraining Language Models to Ponder in Continuous Space
Proximal Diffusion Neural Sampler
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
Multi-objective Large Language Model Alignment with Hierarchical Experts
SVD Provably Denoises Nearest Neighbor Data
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
Repurposing Foundation Model for Generalizable Medical Time Series Classification
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Scaling Agents via Continual Pre-training
S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Towards Prompt-Robust Machine-Generated Text Detection
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
Scaling Synthetic Task Generation for Agents via Exploration
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
Frequency-aware Dynamic Gaussian Splatting
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
Activation Steering for LLM Alignment via a Unified ODE-Based Framework
Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
Understanding and Relaxing the Limitations of Transformers for Linear Algebra
A Unification of Discrete, Gaussian, and Simplicial Diffusion
Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
Towards Text-Mask Consistency in Medical Image Segmentation
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
Experience-based Knowledge Correction for Robust Planning in Minecraft
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
Anchor Frame Bridging for Coherent First-Last Frame Video Generation
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Generalized Spherical Neural Operators: Green’s Function Formulation
Statistical Guarantees in the Search for Less Discriminatory Algorithms
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
Automated Stateful Specialization for Adaptive Agent Systems
Beyond Match Maximization and Fairness: Retention-Objectified Two-Sided Matching
Dual-Path Condition Alignment for Diffusion Transformers
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Proper Velocity Neural Networks
Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
DeMo: Decoupled Momentum Optimization
SceneCOT: Eliciting Chain-of-Thought Reasoning in 3D Scenes
Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
GhostEI-Bench: Do Mobile Agent Resilience to Environmental Injection in Dynamic On-Device Environments?
LeanForPhysics: Comprehensive Reasoning Framework for University-level Physics in Lean4
Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
Exploiting Low-Dimensional Manifold of Features for Few-shot Whole Slide Image Classification
Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
Anatomy-aware Representation Learning for Medical Ultrasound
Early Signs of Steganographic Capabilities in Frontier LLMs
Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
PAT3D: Physics-Augmented Text-to-3D Scene Generation
C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
Video Unlearning via Low-Rank Refusal Vector
Learning Boltzmann Generators via Constrained Mass Transport
CSRv2: Unlocking Ultra-Sparse Embeddings
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Towards Strategic Persuasion with Language Models
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
On The Expressive Power of GNN Derivatives
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
TopoFormer: Topology Meets Attention for Graph Learning
Test-Time Scaling with Reflective Generative Model
The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
Learning Retrieval Models with Sparse Autoencoders
Exploring Cross-Modal Flows for Few-Shot Learning
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
Fresh in memory: Training-order recency is linearly encoded in language model activations
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
Correlated Policy Optimization in Multi-Agent Subteams
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
WIMLE: Uncertainty‑Aware World Models with IMLE for Sample‑Efficient Continuous Control
SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
Grounding and Enhancing Informativeness and Utility in Dataset Distillation
Source-Guided Flow Matching
Disentangled Representation Learning for Parametric Partial Differential Equations
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Forge: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Improving Code Localization with Repository Memory
Building spatial world models from sparse transitional episodic memories
Test-Time Training Done Right
Learning Concept Bottleneck Models from Mechanistic Explanations
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
LiTo: Surface Light Field Tokenization
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Should We Still Pretrain Encoders with Masked Language Modeling?
Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
Choices Speak Louder than Questions
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
KANO: Kolmogorov-Arnold Neural Operator
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
Differentiable Model Predictive Control on the GPU
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
Autoregressive Visual Decoding from EEG Signals
Critical Confabulation: Can LLMs Hallucinate for Social Good?
Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
LongLive: Real-time Interactive Long Video Generation
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
Reasoning on Time-Series for Financial Technical Analysis
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
ICPO: Provable and Practical In-Context Policy Optimization for Test-Time Scaling
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
MASS: MoErging through Adaptive Subspace Selection
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
Implicit Inversion turns CLIP into a Decoder
Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
Language Models are Injective and Hence Invertible
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
KeepLoRA: Continual Learning with Residual Gradient Adaptation
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
A Benchmark for Deep Information Synthesis
Self-Improving Loops for Visual Robotic Planning
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
ATGen: Adversarial Reinforcement Learning for Test Case Generation
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
Critique-RL: Training Critiquing Language Models Through Two-Stage RL for Improved Discrimination and Constructive Feedback
VeriTrail: Closed-Domain Hallucination Detection with Traceability
MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming
Robustness in the Face of Partial Identifiability in Reward Learning
Imitation Learning as Return Distribution Matching
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Fair Conformal Classification via Learning Representation-Based Groups
Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
From Embedding to Control: Representations for Stochastic Multi-Object Systems
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Parallel Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation
Guided Policy Optimization under Partial Observability
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
Not All Bits Are Equal: How Model Scale Changes Memory-Optimal Reasoning
Draft-based Approximate Inference for LLMs
A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
Reward Model Routing in Alignment
Low-pass Personalized Subgraph Federated Recommendation
CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
Text2Grad: Reinforcement Learning from Natural Language Feedback
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
DeNOTS: Stable Deep Neural ODEs for Time Series
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
SWE-RM: Execution-free Feedback for Software Engineering Agents
Gradient Descent Dynamics of Rank-One Matrix Denoising
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus
CTBench: Cryptocurrency Time Series Generation Benchmark
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
Emergent Coordination in Multi-Agent Language Models
LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Co-occurring Associated REtained concepts in Diffusion Unlearning
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
A Rich Knowledge Space for Scalable Deepfake Detection
Rapid Training of Hamiltonian Graph Networks Using Random Features
M4PQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Steering the Herd: A Framework for LLM-based Control of Social Learning
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
PROTDYN: A Foundation Protein Language Model for Thermodynamics and Dynamics Generation
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
DMAP: A Distribution Map for Text
Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Group Verification-based Policy Optimization for Interactive Coding Agents
Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
Generative Value Conflicts Reveal LLM Priorities
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
GenDR: Lighten Generative Detail Restoration
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
Recurrent Action Transformer with Memory
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
TS-DDAE: A novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Towards Better Optimization For Listwise Preference in Diffusion Models
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
Command-V: Training-Free Representation Finetuning Transfer
SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
The Effect of Attention Head Count on Transformer Approximation
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
PMI: Flow-Based Inversion Correction via Proximal Operator
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
Learning Brain Representation with Hierarchical Visual Embeddings
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
$AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
Scaling Linear Attention with Sparse State Expansion
Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
Exploring State-Space Models for Data-Specific Neural Representations
Learning to Reason Efficiently with Discounted Reinforcement Learning
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
A Biologically Plausible Dense Associative Memory with Exponential Capacity
Lossy Common Information in a Learnable Gray-Wyner Network
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
Talking Points: Describing and Localizing Pixels
QKV Projections Require a Fraction of Their Memory
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
Attention Smoothing Is All You Need For Unlearning
Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Deep Learning for Subspace Regression
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
EventFlash: Towards Efficient MLLMs for Event-Based Vision
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
LaVCa: LLM-assisted Visual Cortex Captioning
FullPart: Generating each 3D Part at Full Resolution
Retrospective Sparse Attention for Efficient Long-Context Generation
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
Programming with Pixels: Can Computer-Use Agents do Software Engineering?
AutoDA-Timeseries: Automated Data Augmentation for Time Series
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
SeeDNorm: Self-Rescaled Dynamic Normalization
AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Learning From the Past with Cascading Eligibility Traces
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
Benchmarking Open-ended Segmentation
Emergent Misalignment is Easy, Narrow Misalignment is Hard
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
SPIKE-RL: Video-LLMs meet Bayesian Surprise
AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
Offline Reinforcement Learning with Adaptive Feature Fusion
EntropyLong: Effective Long-Context Training via Predictive Uncertainty
Boosting Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
MnemoDyn: Learning Resting State Dynamics from $40$K FMRI sequences
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
Beyond Outliers: A Study of Optimizers Under Quantization
Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
Mapping Overlaps in Benchmarks through Perplexity in the Wild
AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
WideSearch: Benchmarking Agentic Broad Info-Seeking
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
RLP: Reinforcement as a Pretraining Objective
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
Revisiting Parameter Server in LLM Post-Training
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
Enhancing Language Model Reasoning with Structured Multi-Level Modeling
Unified Vision-Language-Action Model
On the Computational Limits of AI4S-RL: A Unified $\varepsilon$-$N$ Analysis
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Any-Order Flexible Length Masked Diffusion
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
Lean Finder: Semantic Search for Mathlib That Understands User Intents
The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
SCRAPL: Scattering Transform with Random Paths for Machine Learning
When MLLMs Meets Compression Distortion: A Coding Paradigm Tailored to MLLMs
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Lifelong Embodied Navigation Learning
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
Reinforcement Learning for Machine Learning Engineering Agents
Policy Newton Algorithm in Reproducing Kernel Hilbert Space
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
Don’t Pass@$k$: A Bayesian Framework for Large Language Model Evaluation
Mesh Splatting for End-to-end Multiview Surface Reconstruction
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Subquadratic Algorithms and Hardness for Attention with Any Temperature
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Concept-based Adversarial Attack: a Probabilistic Perspective
Learning from the Electronic Structure of Molecules across the Periodic Table
Anchored Supervised Fine-Tuning
Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
RedSage: A Cybersecurity Generalist LLM
Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
On Natural Ways to Generate and Their Provable Power
station2radar: query‑conditioned gaussian splatting for precipitation field
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
Prompt and Parameter Co-Optimization for Large Language Models
FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
Consistent Text-to-Image Generation via Scene De-Contextualization
SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
Continuous multinomial logistic regression for neural decoding
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
InfoBridge: Mutual Information estimation via Bridge Matching
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
BoGrape: Bayesian optimization over graphs with shortest-path encoded
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
On the Sample Complexity of GNNs
Training Dynamics Impact Post-Training Quantization Robustness
Temporal superposition and feature geometry of RNNs under memory demands
Composer: A Search Framework for Hybrid Neural Architecture Design
It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
Speculative Actions: A Lossless Framework for Faster AI Agents
Error Feedback for Muon and Friends
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
LLM2Fx-Tools: Tool Calling for Music Post-Production
Tighter Performance Theory of FedExProx
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Bilinear relational structure fixes reversal curse and enables consistent model editing
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation
Transformers are Inherently Succinct
Alignment-Weighted DPO: A principled reasoning approach to improve alignment
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
Automating the Refinement of Reinforcement Learning Specifications
MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
Guidance Watermarking for Diffusion Models
PICABench: How Far are We from Physical Realistic Image Editing?
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
Provable Guarantees for Automated Circuit Discovery in Mechanistic Interpretability
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Realtime Video Frame Interpolation using One-Step Diffusion Sampling
HOTA: Hamiltonian framework for Optimal Transport Advection
Adaptive Mamba Neural Operators
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
Controllable Video Generation with Provable Disentanglement
DeepAFL: Deep Analytic Federated Learning
DepthLM: Metric Depth from Vision Language Models
CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
Fast training of accurate physics-informed neural networks without gradient descent
A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
Sparsity-promoting Fine-tuning for Equivariant Materials Foundation Model
ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
Positional Preservation Embedding for Multimodal Large Language Models
Latent Wasserstein Adversarial Imitation Learning
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Multi-Agent Guided Policy Optimization
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
Markovian Transformers for Informative Language Modeling
LayerSync: Self-aligning Intermediate Layers
Towards Revealing the Effect of Batch Size Scheduling on Pre-training
On the Wasserstein Geodesic Principal Component Analysis of probability measures
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
COMI: Coarse-to-fine Context Compression via Marginal Information Gain
A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
Take Note: Your Molecular Dataset Is Probably Aligned
WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
Jacobian Aligned Random Forests
Flow Matching Policy Gradients
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
Topology and geometry of the learning space of ReLU networks: connectivity and singularities
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
Why is Your Language Model a Poor Implicit Reward Model?
CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
Computational Bottlenecks for Denoising Diffusions
Flow Along the $K$-Amplitude for Generative Modeling
What Matters for Batch Online Reinforcement Learning in Robotics?
Eigen-1: Scientific Reasoning through Adaptive Multi-Agent Refinement and Monitor-based RAG
Improving Semantic Proximity in English-Centric Information Retrieval through Cross-Lingual Alignment
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
Poly-attention: a general scheme for higher-order self-attention
Dynamic Early Exit in Reasoning Models
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
Look-ahead Reasoning with a Learned Model in Imperfect Information Games
Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
Hubble: a Model Suite to Advance the Study of LLM Memorization
LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception
Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Rescue: Retrieval Augmented Secure Code Generation
Universal Value-Function Uncertainties
Trapped by simplicity: When Transformers fail to learn from noisy features
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Matching multiple experts: on the exploitability of multi-agent imitation learning
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with a Temporal-Aware Multimodal Model
Unifying Complexity-Theoretic Perspectives on Provable Explanations
BrowseNet: Knowledge Graph-Based Associative Memory for Contextual Information Retrieval
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
FlashRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Incentive-Aligned LLM Summaries
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
A Formal Controllability Toolkit for Black-Box Generative Models
Point Prompting: Counterfactual Tracking with Video Diffusion Models
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
GneissWeb: Preparing High Quality Data for LLMs at Scale
Latent Speech-Text Transformer
Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
Unified 3D Scene Understanding Through Physical World Modeling
Intrinsic Lorentz Neural Network
FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
Variation-aware Flexible 3D Gaussian Editing
Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
The Serial Scaling Hypothesis
Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
Compositional amortized inference for large-scale hierarchical Bayesian models
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
Universal Properties of Activation Sparsity in Modern Large Language Models
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
Parameter-Efficient Reinforcement Learning using Prefix Optimization
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis
High-dimensional Analysis of Synthetic Data Selection
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaption
Learning Unified Representation of 3D Gaussian Splatting
One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
Long-Context Generalization with Sparse Attention
Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Biologically Plausible Learning via Bidirectional Spike-Based Distillation
Exploratory Causal Inference in SAEnce
Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
Spike-based Digital Brain: a novel fundamental model for brain activity analysis
Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
In-The-Flow Agentic System Optimization for Effective Planning and Tool Use
PICS: Pairwise Image Compositing with Spatial Interactions
OpenThoughts: Data Recipes for Reasoning Models
High-dimensional Mean-Field Games by Particle-based Flow Matching
Multimodal Dataset Distillation via Phased Teacher Models
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Sem-MoE: Semantic-aware Model-Data Collaborative Scheduling for Efficient MoE Inference
Robust Multi-Objective Controlled Decoding of Large Language Models
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
Splat Feature Solver
Data-Centric Lessons To Improve Speech-Language Pretraining
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Learning Massively Multitask World Models for Continuous Control
Alignment-Enhanced Integration of Connectivity and Spectral Sparse in Dynamic Sparse Training of LLM
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
Search Arena: Analyzing Search-Augmented LLMs
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
How to train data-efficient LLMs
Batch Pruning by Activation Stability
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
Measurement Score-Based Diffusion Model
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Triangle Multiplication is All You Need for Biomolecular Structure Representations
A Unified Federated Framework for Trajectory Data Preparation via LLMs
IAGA: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
Discovering and Steering Interpretable Concepts in Large Generative Music Models
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
TIMESLIVER: Symbolic-Linear Decomposition for Explainable Time Series Classification
Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
RankFlow: Property-aware Transport for Protein Optimization
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
A One-shot Framework for Directed Evolution of Antibodies
Neural Collapse in Multi-Task Learning
ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
Multi-Resolution Score-Based Variational Graphical Diffusion for Causal Inference on Latent Systems
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Efficient Test-Time Scaling for Small Vision-Language Models
Play to Generalize: Learning to Reason Through Game Play
Variational Deep Learning via Implicit Regularization
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
TableMaster: A Recipe to Advance Table Understanding with Language Models
PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Spilled Energy in Large Language Models
Knowledge Editing with Subspace-Aware Key-Value Mappings
Reevaluating Policy Gradient Methods for Imperfect-Information Games
Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Much Ado About Noising: Do Flow Models Actually Make Better Control Policies?
Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
ProtoKV: Long-context Knowledges Are Already Well-Organized Before Your Query
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
Sharp asymptotic theory for Q-learning with LD2Z learning rate and its generalization
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
Learning to Lie: Reinforcement Learning Attacks Damage Human-AI Teams and Teams of LLMs
EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
Decoding Inner Speech with an End-to-End Brain-to-Text Neural Interface
From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
Learning to Orchestrate Agents in Natural Language with the Conductor
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
Empowering LLM Tool Invocation with Tool-call Reward Model
Condition Matters in Full-head 3D GANs
Why DPO is a Misspecified Estimator and How to Fix It
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
Rethinking Residual Errors in Compensation-based LLM Quantization
SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
UniF$^2$ace: A $\underline{Uni}$fied $\underline{F}$ine-grained $\underline{Face}$ Understanding and Generation Model
Characterizing and Mitigating Reasoning Drift in Large Language Models
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Omni-IML: Towards Unified Interpretable Image Manipulation Localization
Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
ArtUV: Artist-style UV Unwrapping
PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
Prompt-MII: Meta-Learning Instruction Induction for LLMs
On Universality of Deep Equivariant Networks
HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Matched Data, Better Models: Target Aligned Data Filtering with Sparse Features
Video Scene Segmentation with Genre and Duration Signals
Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
Multihead Mixture of Experts for Classification of Gigapixel Pathology Images
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
Membrane Potential Perturbation Dynamic Is Total Variation
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Risk-Sensitive Agent Compositions
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
Improving Black-Box Generative Attacks via Generator Semantic Consistency
Soft Quality-Diversity Optimization
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Discovering Novel LLM Experts via Task-Capability Coevolution
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Hidden Breakthroughs in Language Model Training
AutoCode: LLMs as Problem Setters for Competitive Programming
Towards Spatial Supersensing in Video
TP-Spikformer: Token Pruned Spiking Transformer
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Model
The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's algorithm
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Efficient Autoregressive Inference for Transformer Probabilistic Models
Information Shapes Koopman Representation
Can Language Models Discover Scaling Laws?
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
Provably Explaining Neural Additive Models
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
Sparse Imagination for Efficient Visual World Model Planning
Convergence of Muon with Newton-Schulz
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
Copy-Paste to Mitigate Large Language Model Hallucinations
WithAnyone: Toward Controllable and ID Consistent Image Generation
Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Locality-Attending Vision Transformer
Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Reinforcement Learning from Dynamic Critic Feedback for Free-Form Generations
Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
Long-range Modeling and Processing of Multimodal Event Sequences
CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
Metric $k$-clustering using only Weak Comparison Oracles
Universal Multi-Domain Translation via Diffusion Routers
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
EXP-Bench: Can AI Conduct AI Research Experiments?
Dens3R: A Foundation Model for 3D Geometry Prediction
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Steering MoE LLMs via Expert (De)Activation
InputDSA: Demixing, then comparing recurrent and externally driven dynamics
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
TEMPFLOW-GRPO: When Timing Matters for GRPO in Flow Models
EgoTwin: Dreaming Body and View in First Person
Reducing Class-Wise Performance Disparity via Margin Regularization
Content-Aware Mamba for Learned Image Compression
One Skill, Many Websites: Learning Generalizable Skills Through Polymorphic Abstraction
SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
MoM: Linear Sequence Modeling with Mixture-of-Memories
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
lmgame-Bench: How Good are LLMs at Playing Games?
Towards True Speech-to-Speech Models Without Text Guidance
SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
A Dense Subset Index for Collective Query Coverage
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
Masked Generative Policy for Robotic Control
DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
DiSRouter: Distributed Self-Routing for LLM Selections
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
Unbalanced Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks
Chessformer: A Unified Architecture for Chess Modeling
Amortising Inference and Meta-Learning Priors in Neural Networks
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
Unified and Efficient Multi-view Clustering from Probabilistic Perspective
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
Sampling-aware Adversarial Attacks Against Large Language Models
Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Implicit Models: Expressive Power Scales with Test-Time Compute
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
LLMs Must Think Thrice to Solve Executable Counterfactuals
PALC: Preference Alignment via Logit Calibration
VLM-Guided Adaptive Negative Prompting for Creative Generation
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
FlowRL: Matching Reward Distributions for LLM Reasoning
FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
Contrastive Predictive Coding Done Right for Mutual Information Estimation
GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
Toward Efficient Exploration by Large Language Model Agents
Demystifying Emergent Exploration in Goal-Conditioned RL
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
Robust Denoising Neural Reranker for Recommender Systems
TandemFoilSet: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
A Study on PAVE Specification for Learnware
Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
Neural Synchrony Between Socially Interacting Language Models
CARD: Towards Conditional Design of Multi-agent Topological Structures
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
Sequences of Logits Reveal the Low Rank Structure of Language Models
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
Unveiling Super Experts in Mixture-of-Experts Large Language Models
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
TTS Can Speak in Any Style with Any Voice
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
$p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Graph homophily booster: Rethinking the role of discrete features on heterophilic graphs
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
Virtual Community: An Open World for Humans, Robots, and Society
Output Supervision Can Obfuscate the Chain of Thought
Probing in the Dark: State Entropy Maximization for POMDPs
Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Conditional Independent Component Analysis For Estimating Causal Structure with Latent Variables
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
Teaching Metric Distance to Discrete Autoregressive Language Models
Compute-Optimal Quantization-Aware Training
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
Learning Molecular Chirality via Chiral Determinant Kernels
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
Variational Inference for Cyclic Learning
Learning Explicit Single-Cell Dynamics Using ODE Representations
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
From Predictors to Samplers via the Training Trajectory
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
Online Decision Making with Generative Action Sets
SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
Random Label Prediction Heads for Studying and Controlling Memorization in Deep Neural Networks
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
In-Context Algebra
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
Differentiable Lifting for Topological Neural Networks
Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
Scaling Group Inference for Diverse and High-Quality Generation
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
A Revisit of Active Sequential Prediction-Powered Mean Estimation
Flatter Tokens are More Valuable for Speculative Draft Model Training
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
Empowering Multi-Robot Cooperation via Sequential World Models
ContextIF: Enhancing Instruction-Following through Context Reward
AgentFold: Long-Horizon Web Agents with Proactive Context Folding
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Physics-informed learning under mixing: How physical knowledge speeds up learning
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
Log Probability Tracking of LLM APIs
AnyUp: Universal Feature Upsampling
StylOS: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
Action-Free Offline-To-Online RL via Discretised State Policies
Scalable Oversight via Partitioned Human Supervision
MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
QVGen: Pushing the Limit of Quantized Video Generative Models
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
Intrinsic training dynamics of deep neural networks
MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Asymptotic analysis of shallow and deep forgetting in replay with neural collapse
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
Reference Guided Skill Discovery
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
Rodrigues Network for Learning Robot Actions
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
Sapiens2
Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
Selective Data Removal for Distributional Machine Unlearning
Query-Specific Causal Graph Pruning Under Tiered Knowledge
HUMOF: Human Motion Forecasting in Interactive Social Scenes
Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
FastVMT: Eliminating Redundancy in Video Motion Transfer
RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback
Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
DRBench: A Realistic Benchmark for Enterprise Deep Research
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
Online Learning and Equilibrium Computation with Ranking Feedback
Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
Large Depth Completion Model from Sparse Observations
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
HARP: Hallucination Detection via Reasoning Subspace Projection
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Debugging Concept Bottleneck Models through Removal and Retraining
CARL: Preserving Causal Structure in Representation Learning
GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
Diffusion Models as Dataset Distillation Priors
PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
Procedural Mistake Detection via Action Effect Modeling
MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
Feed-forward Human Performance Capture via Progressive Canonical Space Updates
QUEST: A robust attention formulation using query-modulated spherical attention
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
Semi-Parametric Contextual Pricing with General Smoothness
Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
Exploring the Design Space of Transition Matching
Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
MotionGPT3: Human Motion as a Second Modality
Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
How to Square Tensor Networks and Circuits Without Squaring Them
CO3: Contrasting Concepts Compose Better
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
WAFT: Warping-Alone Field Transforms for Optical Flow
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
Online Prediction of Stochastic Sequences with High Probability Regret Bounds
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
Efficient Submodular Maximization for Sums of Concave over Modular Functions
Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
SR-Scientist: Scientific Equation Discovery With Agentic AI
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Boosting Medical Visual Understanding From Multi-Granular Language Learning
Null-Space Filtering for Data-free Continual Model Merging: Preserving Transparency, Promoting Fidelity
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
GNN Explanations that do not Explain and How to find Them
$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Towards Physically Executable 3D Gaussian for Embodied Navigation
Accelerating Materials Design via LLM-Guided Evolutionary Search
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
Neural Graduated Assignment for Maximum Common Edge Subgraphs
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model
Let's (not) just put things in Context: Test-time Training for Long-context LLMs
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
Cautious Weight Decay
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Learning Koopman Representations with Controllability Guarantees
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Intrinsic Entropy of Context Length Scaling in LLMs
Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
Grasp Any Region: Prompting MLLM to Understand the Dense World
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Sample-efficient evidence estimation of score based priors for model selection
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Tokenisation over Bounded Alphabets is Hard
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
3D Aware Region Prompted Vision Language Model
Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
InnoGym: Benchmarking the Innovation Potential of AI Agents
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
Inlier-Centric Post-Training Quantization for Object Detection Models
Adaptive Logit Adjustment for Debiasing Multimodal Language Models
Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
SketchingReality: From Freehand Scene Sketches to Photorealistic Images
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Trust The Typical
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
Detective SAM: Adaptive AI-Image Forgery Localization
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
Adaptive Social Learning via Mode Policy Optimization for Language Agents
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
Generative Bayesian Optimization: Generative Models as Acquisition Functions
Direct Doubly Robust Estimation of Conditional Quantile Contrasts
Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models
Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency
Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
Semi-Supervised Preference Optimization with Limited Feedback
There Was Never a Bottleneck in Concept Bottleneck Models
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
Flow-Disentangled Feature Importance
Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
Federated Graph-Level Clustering Network with Dual Knowledge Separation
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
Autonomous Play with Correspondence-Driven Trajectory Warping
Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
FASA: Frequency-Aware Sparse Attention
Composition-Grounded Instruction Synthesis for Visual Reasoning
SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
A Unifying View of Coverage in Linear Off-policy Evaluation
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
Softmax Transformers are Turing-Complete
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
Language Model Planning from an Information Theoretic Perspective
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
Nudging the Boundaries of LLM Reasoning
Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Algorithm Generation via Creative Ideation
The Art of Scaling Reinforcement Learning Compute for LLMs
Dataset Distillation as Pushforward Optimal Quantization
Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
Online Decision-Focused Learning
vAttention: Verified Sparse Attention via Sampling
Preference-based Policy Optimization from Sparse-reward Offline Dataset
Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
Aegis: Automated Error Generation and Identification for Multi-Agent Systems
Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems
Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
Symmetry-Aware Bayesian Optimization via Max Kernels
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
vCache: Verified Semantic Prompt Caching
A Statistical Benchmark for Diffusion Posterior Sampling Algorithms
VideoNSA: Native Sparse Attention Scales Video Understanding
Pitfalls in Evaluating Language Model Forecasters
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
Latent Geometry-Driven Network Automata for Complex Network Dismantling
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
GoR: A Unified and Extensible Generative Framework for Ordinal Regression
Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
Diffusion Bridge Variational Inference for Deep Gaussian Processes
Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
Neural Optimal Transport Meets Multivariate Conformal Prediction
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
Sign-SGD via Parameter-Free Optimization
DeepEyesV2: Toward Agentic Multimodal Model
Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
Differentially Private Domain Discovery
I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Improving LLM-based Global Optimization with Search Space Partitioning
Mixture of Contexts for Long Video Generation
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
RealBench: A Benchmark for Complex Physical Systems with Real-World Data
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
GPS: Directed Acyclic Graph guided Proactive Information Seeking in Large Language Models
CoMem: Compositional Concept-Graph Memory for Continual Vision–Language Learning
Learning to Reason via Mixture-of-Thought for Logical Reasoning
OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
Code Driven Planning with Domain-Adaptive Selector
Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
Codified Finite-state Machines for Role-playing
Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
A Noise is Worth Diffusion Guidance
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
Structure-Aware Graph Hypernetworks for Neural Program Synthesis
ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
SIGMA-GEN: Structure and Identity Guided Multi-Subject Assembly for Image Generation
One-Step Video Restoration via Diffusion Adversarial Post-Training
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
Flow Actor-Critic for Offline Reinforcement Learning
Visual Planning: Let's Think Only with Images
Expert Divergence Learning for MoE-based Language Models
$\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
Efficient algorithms for Incremental Metric Bipartite Matching
Explainable LLM Unlearning through Reasoning
RL for Reasoning by Adaptively Revealing Rationales
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion
Any-Order Any-Subset AutoRegressive Model
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Density Ratio Estimation
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
Partition Generative Modeling: Masked Modeling Without Masks
Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
Splat the Net: Radiance Fields with Splattable Neural Primitives
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
OWLEYE: ZERO-SHOT LEARNER FOR CROSS-DOMAIN GRAPH DATA ANOMALY DETECTION
Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
Text summarization via global structure awareness
Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
Learning residue level protein dynamics with multiscale Gaussians
Action-Guided Attention for Video Action Anticipation
EditAnyShape: Shape-Aware Image Editing via Trajectory-Guided Region Control
Weight Space Representation Learning on Diverse NeRF Architectures
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
Muon Outperforms Adam in Tail-End Associative Memory Learning
Learning is Forgetting; LLM Training As Lossy Compression
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
GenCP: Towards Generative Modeling Paradigm of Coupled physics with Application to Fluid-Structure Interaction
Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Agent Data Protocol
OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
Hierarchical Prototype Learning for Semantic Segmentation
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE
When Flatness Does (Not) Guarantee Adversarial Robustness
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
CERTIFIED VS. EMPIRICAL ADVERSARIAL ROBUSTNESS VIA HYBRID CONVOLUTIONS WITH ATTENTION STOCHASTICITY
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
Continuous Chain of Thought: Parallel Exploration and Reasoning through a Theoretical Lens
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
Stable coresets: Unleashing the power of uniform sampling
PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
Lifelong Learning with Behavior Consolidation for Vehicle Routing
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization
Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Good allocations from bad estimates
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
t-SNE Exaggerates Clusters, Provably
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
$PhyWorldBench$: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
PMDformer: Patch-Mean Decoupling Transformer for Long-term Forecasting
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Expert Heads: Robust Evidence Identification for Large Language Models
Enhancing Complex Symbolic Logical Reasoning of Large Language Models via Sparse Multi-Agent Debate
K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
Model Predictive Adversarial Imitation Learning for Planning from Observation
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Self-Supervised Learning from Structural Invariance
Are we measuring oversmoothing in graph neural networks correctly?
SAM 3: Segment Anything with Concepts
Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
Real-Time Robot Execution with Masked Action Chunking
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
TangoFlux: Text to Audio Generation with CLAP-Ranked Preference Optimization
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
GOLDILOCS: GENERAL OBJECT-LEVEL DETECTION AND LABELING OF CHANGES IN SCENES
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
Controllable diffusion-based generation for multi-channel biological data
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
Towards Sustainable Investment Policies Informed by Opponent Shaping
ResearchRubrics: A Benchmark of Prompts and Rubrics For Deep Research Agents
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
KLAS: Using Similarity to Stitch Neural Networks for an Improved Accuracy-Efficiency Tradeoff
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
Bridging Degradation Discrimination and Generation for Universal Image Restoration
Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
Light Differentiable Logic Gate Networks
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Latent Planning Emerges with Scale
HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Depth Anything 3: Recovering the Visual Space from Any Views
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
A^2TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
Do Large Language Models Know What They Are Capable Of?
AdS-GNN - a Conformally Equivariant Graph Neural Network
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
VOGUE: Unified Understanding, Generation, and Editing for Videos
Bures-Wasserstein Flow Matching for Graph Generation
A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
Rectifying LLM Thought from Lens of Optimization
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
Deep Learning with Learnable Product-Structured Activations
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Detecting Invariant Manifolds in ReLU-Based RNNs
Entropy-preserving reinforcement learning
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
Agentic Reinforced Policy Optimization
Concepts' Information Bottleneck Models
AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
MergeTune: Continued Fine-Tuning of Vision-Language Models
Video-GPT via Next Clip Diffusion
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Can Speech LLMs Think while Listening?
Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
Gistify: Codebase-Level Understanding via Runtime Execution
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
Taming Curvature: Architecture Warm-up for Stable Transformer Training
Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Branched Schrödinger Bridge Matching
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
FALCON: Few-step Accurate Likelihoods for Continuous Flows
Improved Quality, Synchrony, and Preference Alignment for Joint Audio-Video Generation
LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
Training LLMs with LogicReward for Faithful and Rigorous Reasoning
Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
Benchmarking Overton Pluralism in LLMs
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
QuRL: Rubrics As Judge For Open-Ended Question Answering
Routing, Cascades, and User Choice for LLMs
Polynomial Convergence of Riemannian Diffusion Models
Plan then Act: Bi-level CAD Command Sequence Generation
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
Towards Understanding the Shape of Representations in Protein Language Models
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
Tell me Habibi, is it Real or Fake?
FACET: A Fragment-Aware Conformer Ensemble Transformer
MeSH: Memory-as-State-Highways for Recursive Transformers
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
Pulp Motion: Framing-aware multimodal camera and human motion generation
Panda: A pretrained forecast model for chaotic dynamics
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
TimeSeriesExamAgent: Creating TimeSeries Reasoning Benchmarks at Scale
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Disentanglement of Variations with Multimodal Generative Modeling
Avey Bidirectional Architecture
Learning Escorted Protocols For Multistate Free-Energy Estimation
Dynamic Speculative Agent Planning
Generalization of RLVR Using Causal Reasoning as a Testbed
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
Evolving Graph Structured Programs for Circuit Generation with Large Language Models
Graph-based Nearest Neighbors with Dynamic Updates via Random Walk-Based Analysis
Sublinear Time Quantum Algorithm for Attention Approximation
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
Branch and Bound Search for Exact MAP Inference in Credal Networks
SCI-Verifier: Scientific Verifier with Thinking
EVALUATING MEMORY IN LLM AGENTS VIA INCREMENTAL MULTI-TURN INTERACTIONS
ComPhy: Composing Physical Models with end-to-end Alignment
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
Vision Language Models are Biased
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
On the Generalization Capacities of MLLMs for Spatial Intelligence
sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
TD-MoE: Tensor Decomposition for MoE Models
Product-Quantised Image Representation for High-Quality Image Synthesis
What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
Verifying Chain-of-Thought Reasoning via its Computational Graph
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
PointRePar: SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
Edit-Based Flow Matching for Temporal Point Processes
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
Discrete Bayesian Sample Inference for Graph Generation
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
PERSISTENCE SPHERES: BI-CONTINUOUS REPRESENTATIONS OF PERSISTENCE DIAGRAMS
Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
Learning to Reason in Structured In-context Environments with Reinforcement Learning
LeRobot: An Open-Source Library for End-to-End Robot Learning
FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
Boosting Open Set Recognition Performance through Modulated Representation Learning
TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
A Bayesian Nonparametric Framework For Learning Disentangled Representations
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
ActiveCQ: Active Estimation of Causal Quantities
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
Neuron-Level Analysis of Cultural Understanding in Large Language Models
Diffusion Negative Preference Optimization Made Simple
LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
How reinforcement learning after next-token prediction facilitates learning
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
On the Expressive Power of GNNs for Boolean Satisfiability
LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
PRISM: Controllable Diffusion for Compound Image Restoration with Scientific Fidelity
Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
ON THE ROLE OF IMPLICIT REGULARIZATION OF STOCHASTIC GRADIENT DESCENT IN GROUP ROBUSTNESS
MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
Diagnosing Failures in Generalization from Task-Relevant Representational Geometry
Accelerated Parallel Tempering via Neural Transports
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
Unifying Stable Optimization and Reference Regularization in RLHF
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
HEEGNet: Hyperbolic Embeddings for EEG
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
DELTA-Code: How RL Unlocks and Transfers New Programming Algorithms in LLMs
ProReGen: Progressive Residual Generation under Attribute Correlations
Energy-Regularized Sequential Model Editing on Hyperspheres
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
Knowledge Distillation for Large Language Models through Residual Learning
Personalized Collaborative Learning with Affinity-Based Variance Reduction
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
BioBO: Biology-informed Bayesian Optimization for Perturbation Design
A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
Regulating Internal Evidence Flows for Robust Learning Under Spurious Correlations
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
RADAR: Reasoning–Ability and Difficulty-Aware Routing in Language Models
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
ASIDE: Architectural Separation of Instructions and Data in Language Models
Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks
Training-free Counterfactual Explanation for Temporal Graph Model Inference
LLMS ON TRIAL: Evaluating Judicial Fairness For Large Language Models
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
Learning to Reason for Hallucination Span Detection
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
Segment-Level Attribution for Selective Learning of Long Reasoning Traces
Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
Calibrating Verbalized Confidence with Self-Generated Distractors
Learning a Game by Paying the Agents
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Convergence of Regret Matching in Potential Games and Constrained Optimization
Unsupervised Invariant Risk Minimization
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
Decomposition of Concept-Level Rules in Visual Scenes
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
Joint Discriminative-Generative Modeling via Dual Adversarial Training
Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
A General Framework for Black-Box Attacks Under Cost Asymmetry
A Statistical Theory of Overfitting for Imbalanced Classification
TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
Dual Language Models: Balancing sample-efficiency and overfitting resilience
Efficient Credal Prediction through Decalibration
Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
EAMET: ROBUST MASSIVE MODEL EDITING VIA EMBEDDING ALIGNMENT OPTIMIZATION
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
InfoNCE Induces Gaussian Distribution
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Grounded Test-Time Adaptation for LLM Agents
Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
Medical Interpretability and Knowledge Maps of Large Language Models
Study of Training Dynamics for Memory-Constrained Fine-Tuning
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Angle K-Means
Token-level Data Selection for Safe LLM Fine-tuning
Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
Learning-Time Encoding Shapes Unlearning in LLMs
InfGen: Scenario Generation as Next Token Group Prediction
Sharpness-Aware Machine Unlearning
TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
On the Reasoning Abilities of Masked Diffusion Language Models
Robust LLM Unlearning via Post Judgment and Multi-round Thinking
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
Fast Language Generation through Discrete Diffusion Divergence Instruct
Post-training Large Language Models for Diverse High-Quality Responses
Low rank adaptation of chemical foundation models generate effective odorant representations
Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
Cutting the Skip: Training Residual-Free Transformers
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
A New Approach to Controlling Linear Dynamical Systems
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
Delving into Spectral Clustering with Vision-Language Representations
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Breaking Safety Paradox with Feasible Dual Policy Iteration
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
MARS - A Foundational Map Auto-Regressor
PixNerd: Pixel Neural Field Diffusion
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
In-Context Algorithm Emulation in Fixed-Weight Transformers
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
No outlier channels but with outlier blocks
Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
FlexProtein: Joint Sequence and Structure Pretraining for Protein Modeling
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
Counterfactual Explanations on Robust Perceptual Geodesics
Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Contrastive Diffusion Guidance for Spatial Inverse Problems
SparseD: Sparse Attention for Diffusion Language Models
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
ConvT3: Structured State Kernels for Convolutional State Space Models
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Neural Compression of 3D Meshes using Sparse Implicit Representation
IC-Custom: Diverse Image Customization via In-Context Learning
Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
Single-Loop Byzantine-Resilient Federated Bilevel Optimization
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
Task-free Adaptive Meta Black-box Optimization
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
Learning to summarize user information for personalized reinforcement learning from human feedback
Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
Tracing and Reversing Edits in LLMs: A Study on Rank-One Model Edits
Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
Test-Time Iterative Error Correction for Efficient Diffusion Models
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
(U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
Multilevel Control Functional
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Towards a Foundation Model for Crowdsourced Label Aggregation
Scalable In-Context Q-Learning
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
The Coverage Principle: How Pre-Training Enables Post-Training
Fast and Interpretable Protein Substructure Alignment via Optimal Transport
CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
The Forecast After the Forecast: A Post-Processing Shift in Time Series
Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
Think Then Embed: Generative Context Improves Multimodal Embedding
Oracle-efficient Hybrid Learning with Constrained Adversaries
RefineBench: Evaluating Refinement Capability in Language Models
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
Towards Dynamic Interleaving Optimizers
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
Cooperative Sheaf Neural Networks
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
Hot Fuzz: Temperature-Tunable Composition of Diffusion models with Fuzzy Logic
SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
Zero-Sacrifice Lifelong Adversarial Defense for Pre-Trained Encoders
Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
Concept Insertion Success over Time in Diffusion Models through Prompt-Conditioned Interventions
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
Multi-Condition Conformal Selection
Antithetic Noise in Diffusion Models
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
Predictive Differential Training Guided by Training Dynamics
Learnable Sparsity for Vision Generative Models
LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Provably Tracking Equivalent Mechanistic Interpretations Across Neural Networks
Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
Reasoning in Space via Grounding in the World
Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
HDR-4DGS: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
Counterfactual Reasoning for Retrieval-Augmented Generation
DGNet: Learning Spatiotemporal PDEs with Discrete Green Networks
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
S$^2$-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
Hallucination-aware Intermediate Representation Editing in Large Vision-Language Models
Quasi-Equivariant Metanetworks
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Optimizer Choice Matters For The Emergence of Neural Collapse
reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
Tequila: Deadzone-free Ternary Quantization for Large Language Models
Bayesian Neural Networks for Functional ANOVA Model
K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
Missingness Bias Calibration in Feature Attribution Explanations
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
Readout Representation: Redefining Neural Codes by Input Recovery
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
Out-of-Distribution Graph Models Merging
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
The logical expressiveness of topological neural networks
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
SiNGER: A Clearer Voice Distills Vision Transformers Further
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
Remotely Detectable Robot Policy Watermarking
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models
CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
UFO-4D: Unposed Feedforward 4D reconstruction from Two Images
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
CodeGenGuard: A Robust Watermark for Code Generation Models
Influence Dynamics and Stagewise Data Attribution
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
Astra: General Interactive World Model with Autoregressive Denoising
Learning from Historical Activations in Graph Neural Networks
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
CooperTrim: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
Dichotomous Diffusion Policy Optimization
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
A Single Architecture for Representing Invariance Under Any Space Group
Conformal Prediction for Long-Tailed Classification
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Graphon Cross-Validation: Assessing Models on Network Data
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
Conformal Robustness Control: A New Strategy for Robust Decision
VGR: Visual Grounded Reasoning
A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Improving Autoregressive Video Modeling with History Understanding
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
Latent Diffusion Model without Variational Autoencoder
SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
PU-Bench: A Unified Benchmark for Rigorous and Reproducible PU Learning
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Graph Random Features for Scalable Gaussian Processes
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
Soft-Di[M]O: Improved one-step Image Discrete Model
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
Differentially Private Equilibrium Finding in Polymatrix Games
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
SliderQuant: Accurate Post-Training Quantization for LLMs
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
OrthoRF: Exploring Orthogonality in Object-Centric Representations
GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
Uncertainty-driven Embedding Convolution
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Correlations in the Data Lead to Semantically Rich Feature Geometry Under Superposition
Trace Anything: Representing Any Video in 4D via Trajectory Fields
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
Kevin: Multi-Turn RL for Generating CUDA Kernels
Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
Critical attention scaling in long-context transformers
From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
Enhancing Multi-Image Understanding through Delimiter Token Scaling
Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Token-based Audio Inpainting via Discrete Diffusion
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
RL's Razor: Why Online Reinforcement Learning Forgets Less
Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
Discrete Diffusion for Bundle Construction
Unified Registration of Cortical and Subcortical Structures
TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
Spiking Discrepancy Transformer for Point Cloud Analysis
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
Relationship Alignment for View-aware Multi-view Clustering
Planning with an Embodied Learnable Memory
SMixer: Rethinking Efficient-Training and Event-Driven SNNs
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
When Bias Helps Learning: Bridging Initial Prejudice and Trainability
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Social Agents: Collective Intelligence Improves LLM Predictions
Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
Flow Caching for Autoregressive Video Generation
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
Instilling an Active Mind in Avatars via Cognitive Simulation
Language-guided Open-world Video Anomaly Detection under Weak Supervision
CoDA: From Text-to-Image Diffusion Models to Truly Training-Free Dataset Distillation
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
Better Bounds for the Distributed Experts Problem
NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
Causal Interpretation of Neural Network Computations with Contribution Decomposition (CODEC)
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Distributional value gradients for stochastic environments
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
Language Models Use Lookbacks to Track Beliefs
Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Generalizable Heuristic Generation Through LLMs with Meta-Optimization
Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
SIM-CoT: Supervised Implicit Chain-of-Thought
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
Revisiting Node Affinity Prediction In Temporal Graphs
Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
Topological Causal Effects
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
SiMO: Single-Modality-Operable Multimodal Collaborative Perception
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
Long-Text-to-Image Generation via Compositional Prompt Decomposition
Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
Matting Anything 2: Towards Video Matting for Anything
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
CASteer: Cross-Attention Steering for Controllable Concept Erasure
RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
Learning to Interpret Weight Differences in Language Models
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
Relational Feature Caching for Accelerating Diffusion Transformers
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
The Price of Robustness: Stable Classifiers Need Overparameterization
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interaction Potentials
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Flow Autoencoders are Effective Protein Tokenizers
Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Does Weak-to-strong Generalization Happen under Spurious Correlations?
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
LightCtrl: Training-free Controllable Video Relighting
Heterogeneous Front-Door Effects: Debiased Estimation with Quasi-Oracle Guarantees
Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
The Expressive Limits of Diagonal SSMs for State-Tracking
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
R-WoM: Retrieval-augmented World Model For Computer-use Agents
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
SciTS: Scientific Time Series Understanding and Generation with LLMs
Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
Outrageously Large Context Windows via RACE Attention -- A Family of Non-Linear Attention that can be calculated in Strictly Linear-Time
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
From Fields to Random Trees
FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
Scaling Generalist Data-Analytic Agents
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
Trinity: An Evolved LLM Coordinator
CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
WaterDrum: Watermark-based Data-centric Unlearning Metric
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Topological Flow Matching
HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
Streaming Visual Geometry Transformer
Test-time Domain Generalization for Image Super-resolution
LDT: Layer-Decomposition Training Makes Networks More Generalizable
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Generalization of Diffusion Models Arises with a Balanced Representation Space
Disentangled representation learning through unsupervised symmetry group discovery
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
Towards Improved Sentence Representations using Token Graphs
Towards Personalized Deep Research: Benchmarks and Evaluations
Universal Beta Splatting
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
VisCoder2: Building Multi-Language Visualization Coding Agents
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
Combinatorial Rising Bandits
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Strongly Convex Sets in Riemannian Manifolds
High-Probability Bounds for the Last Iterate of Clipped SGD
TRACEDET: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
Multi-state Protein Design with DynamicMPNN
Distributions as Actions: A Unified Framework for Diverse Action Spaces
Count Bridges enable Modeling and Deconvolving Transcriptomics
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
SNAPHARD Contrast Learning
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
Representation Alignment for Diffusion Transformers without External Components
GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
Identity-Free Deferral For Unseen Experts
Textual Equilibrium Propagation for Deep Compound AI Systems
Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
Real-Time Motion-Controllable Autoregressive Video Diffusion
(Token-Level) InfoRMIA: Stronger Membership Inference and Privacy Assessment for LLMs
CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
ReIn: Conversational Error Recovery with Reasoning Inception
Priors in time: Missing inductive biases for language model interpretability
DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Meta-RL Induces Exploration in Language Agents
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
TFHE-Coder: Evaluating LLM Agents for secure Fully Homomorphic Encryption Code Generation
TPDiff: Temporal Pyramid Video Diffusion Model
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
Flow Matching with Semidiscrete Couplings
APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Measuring Uncertainty Calibration
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
ACE-Bench: Benchmarking Agentic Coding in End-to-End Development of Complex Features
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
Prompt-Robust Vision-Language Models via Meta-Finetuning
Visual Prompt-Agnostic Evolution
LCA: Local Classifier Alignment for Continual Learning
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Temporally Detailed Hypergraph Neural ODE for Type 2 Diabetes Progression Modeling
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks
Batch and Sequential Unlearning for Neural Networks
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Precise and Interpretable Editing of Code Knowledge in Large Language Models
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
On the Convergence Direction of Gradient Descent
Revisiting Weight Regularization for Low-Rank Continual Learning
TEDM: Time Series Forecasting with Elucidated Diffusion Models
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
Mini-cluster Guided Long-tailed Deep Clustering
Quotient-Space Diffusion Model
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
EAST: Early Action Prediction Sampling Strategy with Token Masking
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
The Markovian Thinker
DVD-Quant: Data-free Video Diffusion Transformers Quantization
Learning from Synthetic Data Improves Multi-hop Reasoning
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
The Price of Amortized inference in Sparse Autoencoders
Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
DA$^2$: Depth Anything in Any Direction
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
SLM-MUX: Orchestrating Small Language Models for Reasoning
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
Contextual Causal Bayesian Optimisation
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
Welfarist Formulations for Diverse Similarity Search
MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
Relational Graph Transformer
Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
Enhancing Agentic Search via Data Synthesis on Hierarchical Constraint Satisfaction
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Robust Adaptive Multi-Step Predictive Shielding
HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
Diverse Text Decoding via Iterative Reweighting
EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
Multi-Action Self-Improvement For Neural Combinatorial Optimization
HoloPart: Generative 3D Part Amodal Segmentation
On the Impact of the Utility in Semivalue-based Data Valuation
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
Opponent Shaping in LLM Agents
Neural Networks Learn Multi-Index Models Near the Information-Theoretic Limit
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
Multi-agent Coordination via Flow Matching
Circuit Insights: Towards Interpretability Beyond Activations
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
Transformers as a Measure-Theoretic Associative Memory: A Statistical Perspective
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
Membership Privacy Risks of Sharpness Aware Minimization
Temporal Slowness in Central Vision Drives Semantic Object Learning
Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Learning to Grasp Anything By Playing with Random Toys
Dynamic Texture Modeling of 3D Clothed Gaussian Avatars from a Single Video
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
MICLIP: Learning to Interpret Representation in Vision Models
Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
KDP: Simplifying Representation Dynamics in Kernel Space
Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
Deep SPI: Safe Policy Improvement via World Models
From "Sure" to "Sorry": Detecting Jailbreak in Large Vision Language Model via JailNeurons
Feedback-driven recurrent quantum neural network universality
Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
IF-VidCap: Can Video Caption Models Follow Instructions?
Secret-Protected Evolution for Differentially Private Synthetic Text Generation
Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
Latent Denoising Makes Good Visual Tokenizers
Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
DCFold: Efficient Protein Structure Generation with Single Forward Pass
MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
MOLM: Mixture of LoRA Markers
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
Patronus: Interpretable Diffusion Models with Prototypes
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
Enhancing Diffusion-Based Sampling with Molecular Collective Variables
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
Testing Most Influential Sets
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Difference Predictive Coding for Training Spiking Neural Networks
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
Practical estimation of the optimal classification error with soft labels and calibration
GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
Dynamical properties of dense associative memory
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
FOCUS: Efficient Keyframe Selection for Long Video Understanding
Fast-dLLM v2: Efficient Block-Diffusion LLM
Graph Mixing Additive Networks
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
MrRoPE: Mixed-radix Rotary Position Embedding
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
Capability-Based Scaling Laws for LLM-Based Red-Teaming
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor–EM Method
Protection against Source Inference Attacks in Federated Learning
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
Random-projection ensemble dimension reduction
Diversity-Enhanced Reasoning for Subjective Questions
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Geometry-aware 4D Video Generation for Robot Manipulation
Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Emergent Discrete Controller Modules for Symbolic Planning in Transformers
BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
Query-Level Uncertainty in Large Language Models
Dynamic Novel View Synthesis in High Dynamic Range
The Deleuzian Representation Hypothesis
StreamingThinker: Large Language Models Can Think While Reading
Multi-Object System Identification from Videos
DPad: Efficient Diffusion Language Models with Suffix Dropout
Explainable $K$-means Neural Networks for Multi-view Clustering
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Convergent Differential Privacy Analysis for General Federated Learning
CoFact: Conformal Factuality Guarantees for Language Models under Distribution Shift
AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
The State of Reinforcement Finetuning for Transformer-based Generative Agents
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
SimpleFold: Folding Proteins is Simpler than You Think
Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
Understanding Dataset Distillation via Spectral Filtering
Machine Unlearning under Retain–Forget Entanglement
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
Latent Concept Disentanglement in Transformer-based Language Models
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Learning Ordinal Probabilistic Reward from Preferences
Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
Multi-Feature Quantized Self-Attention for Fair Large Language Models
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
Bandits with Single-Peaked Preferences and Limited Resources
Efficient Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
TusoAI: Agentic Optimization for Scientific Methods
LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
When and Where to Reset Matters for Long-Term Test-Time Adaptation
Sublinear Spectral Clustering Oracle with Little Memory
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
Semantic-Aware Diffusion LLM Inference With Adaptive Block Size
DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
VERINA: Benchmarking Verifiable Code Generation
Automated Formalization via Conceptual Retrieval-Augmented LLMs
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits
Reassessing Layer Pruning in LLMs: New Insights and Methods
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
Explainable Mixture Models through Differentiable Rule Learning
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Vid2World: Crafting Video Diffusion Models to Interactive World Models
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
DIFFSPARSE: Accelerating Diffusion Transformers with Learned Token Sparsity
Causal Discovery via Quantile Partial Effect
RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
Target-Aware Video Diffusion Models
ContextNav: Towards Agentic Multimodal In-Context Learning
The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data
Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
WoW!: World Models in a Closed-Loop World
Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
Mathesis: Towards Formal Theorem Proving from Natural Languages
CLUE: Conflict-guided Localization for LLM Unlearning Framework
ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
Safe Exploration via Policy Priors
Geometric-Mean Policy Optimization
Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
DeRaDiff: Denoising Time Realignment of Diffusion Models
Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Learning linear state-space models with sparse system matrices
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
Information Estimation with Discrete Diffusion
Directional Textual Inversion for Personalized Text-to-Image Generation
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
The Pensieve Paradigm: Stateful Language Models with Learned Memory Management
TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Advancing End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
Diverse Dictionary Learning
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
StreamingVLM: Real-Time Understanding for Infinite Video Streams
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Positional Encoding Field
Value Gradient Flow: Behavior-Regularized RL without Regularization
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Towards One-step Causal Video Generation via Adversarial Self-Distillation
Demystifying Supervision Data Generalization in Multimodal LMs
DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
FeDaL: Federated Dataset Learning for General Time Series Foundation Models
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
Figma2Code: Automating Multimodal Design to Code in the Wild
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
Fast Data Mixture Optimization via Gradient Descent
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Train-before-Test Harmonizes Language Model Rankings
FreeViS: Training-free Video Stylization with Inconsistent References
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
T-TAMER: Provably Taming Trade-offs in ML Serving
Near-Optimal Online Deployment and Routing for Streaming LLMs
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
Adaptive Nonlinear Compression for Large Foundation Models
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
Embedding-Based Context-Aware Reranker
Knowledge Fusion of Large Language Models via Modular SkillPacks
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
Data Provenance for Image Auto-Regressive Generation
GenCompositor: Generative Video Compositing with Diffusion Transformer
Natural Identifiers for Privacy and Data Audits in Large Language Models
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
Some Neural Networks Inherently Preserve Subspace Clustering Structure
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
Prior-free Tabular Test-time Adaptation
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
On the Universality and Complexity of GNN for Solving Second-order Cone Programs
DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
TrajTok: What makes for a good trajectory tokenizer in behavior generation?
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
Understanding VLMs Spatial Mental Modeling Capability from Limited Views
QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization
RNE: plug-and-play diffusion inference-time control and energy-based training
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Watermarking Diffusion Language Models
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
Reconstruction Alignment Improves Unified Multimodal Models
Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
dParallel: Learnable Parallel Decoding for dLLMs
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Diffusion Language Model Knows the Answer Before It Decodes
DISCO: Diversifying Sample Condensation for Accelerating Model Evaluation
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
RAP: 3D Rasterization Augmented End-to-End Planning
PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
Dr.LLM: Dynamic Layer Routing in LLMs
Faithfulness Under the Distribution: A New Look at Attribution Evaluation
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Learning to Weight Parameters for Data Attribution
Q&C: When Quantization Meets Cache in Efficient Generation
FACM: Flow-Anchored Consistency Models
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
FlashWorld: High-quality 3D Scene Generation within Seconds
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
Primary-Fine Decoupling for Action Generation in Robotic Imitation
Beyond Spectra: Eigenvector Overlaps in Loss Geometry
Relative Entropy Pathwise Policy Optimization
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
Deconstructing Positional Information: From Attention Logits to Training Biases
Enhancing Hallucination Detection through Noise Injection
UniOD: A Universal Model for Outlier Detection across Diverse Domains
RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Improving 2D Diffusion Models for 3D Medical Imaging with Inter‑Slice Consistent Stochasticity
Geometric Graph Neural Diffusion for Stable Molecular Dynamics
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Seeing Through Words: Controlling Visual Retrieval Quality with Language
SigLIP-HD by Fine-to-Coarse Supervision
Uncovering Robot Vulnerabilities through Semantic Potential Fields
Characterizing the Discrete Geometry of ReLU Networks
TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
Let OOD Feature Exploring Vast Predefined Classifiers
IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
Synthetic Bootstrapped Pretraining
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
Free Point-wise Anomaly Detection via Fold-bifurcation
Globally aware optimization with resurgence
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
Distributionally Robust Optimization via Generative Ambiguity Modeling
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
Self-Aligned Reward: Towards Effective and Efficient Reasoners
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
Joint Optimization for 4D Human-Scene Reconstruction in the Wild
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Relative Value Learning
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
Split Happens (But Your Video Model Can Be Edited)
Capturing Visual Environment Structure Correlates with Control Performance
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Hourglass Persistence for Graphs, Simplices, and Cells
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
One step further with Monte-Carlo sampler to guide diffusion better
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
Distribution-informed Online Conformal Prediction
Mechanism of Task-oriented Information Removal in In-context Learning
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
ViPO: Visual Preference Optimization at Scale
Bandit Learning in Matching Markets Robust to Adversarial Corruptions
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
Queue Length Regret Bounds for Contextual Queueing Bandits
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
Activation Steering with a Feedback Controller
Q-Learning with Adjoint Matching
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Reinforcing Diffusion Models by Direct Group Preference Optimization
Scaling Laws for Diffusion Transformers
Consistency Geodesic Bridge: Image Restoration with Pretrained Diffusion Models
Enforcing Axioms for AI Alignment under Loss-Based Rules
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Sharp Monocular View Synthesis in Less Than a Second
The Geometry and Topology of Circuits: the Manifolds of Modular Addition
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
BBQ: Boosting Quantization Entropy with Bell Box Quantization
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Online time series prediction using feature adjustment
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Hierarchical Multi-Scale Molecular Conformer Generation with Structural Awareness
Weight-Space Linear Recurrent Neural Networks
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Boosting for Predictive Sufficiency
Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
Variance-Dependent Regret Lower Bounds for Contextual Bandits
SmellNet: A Large-scale Dataset for Real-world Smell Recognition
Analyzing and Evaluating Unbiased Language Model Watermark
Superficial Safety Alignment Hypothesis
Your Language Model Secretly Contains Personality Subnetworks
GGBall: Graph Generative Model on Poincaré Ball
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Achieving low-bit Muon through subspace preservation and grid quantization
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
Rate-Distortion Optimized Communication for Collaborative Perception
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Tree-sliced Sobolev IPM
One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
Mixed-Curvature Tree-Sliced Wasserstein Distance
Quantile Advantage Estimation for Entropy-Safe Reasoning
Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
Towards Learned Optimization Free Lunch
Polynomial, trigonometric, and tropical activations
"Noisier" Noise Contrastive Estimation is (Almost) Maximum Likelihood
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
Break the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
Consistent Low-Rank Approximation
On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
Boolean Satisfiability via Imitation Learning
A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
Conjuring Semantic Similarity
Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
Online Alignment as Perceptual Loss
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
GenSR: Symbolic regression based on equation generative space
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
Efficient Turing Machine Simulation with Transformers
AntigenLM: Structure-Aware DNA Language Modeling for Influenza
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
FedMC: Federated Manifold Calibration
FastVGGT: Fast Visual Geometry Transformer
Measuring Bias Amplification in Multi-Agent Systems with Large Language Models
CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
On Entropy Control in LLM-RL Algorithms
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Replicable Reinforcement Learning with Linear Function Approximation
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
Activation Function Design Sustains Plasticity in Continual Learning
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
GAVEL: Towards Rule-Based Safety through Activation Monitoring
Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
A Law of Data Reconstruction for Random Features (And Beyond)
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
General Exploratory Bonus for Optimistic Exploration in RLHF
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Generative Blocks World: Moving Things Around in Pictures
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
TokMem: Tokenized Procedural Memory for Large Language Models
No labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
Convex Efficient Coding
Scalable Offline Model-Based RL with Action Chunks
Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
Training-Free Determination of Network Width via Neural Tangent Kernel
One-Shot Exemplars for Class Grounding in Self-Supervised Learning
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
Token-Importance Guided Direct Preference Optimization
Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
Primal-Dual Policy Optimization for Adversarial Linear CMDPs
Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
Proximal Supervised Fine-Tuning
Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions
DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
Reducing Symmetry Increase in Equivariant Neural Networks
Is In-Context Learning Learning?
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Glance and Focus Reinforcement for Pan-cancer Screening
PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
Genomic Foundationless Models: Pretraining Does Not Promise Performance
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Non-Autoregressive Generation for Agentic Multi-Turn Interaction
RIVER: Real-time Video Interaction Benchmark
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
SPACeR: Self-Play Anchoring with Centralized Reference Models
ASTRAEA: A Token-wise Acceleration Framework for Video Diffusion Transformers
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Towards Efficient, Adaptive, and Unified Reinforcement Mid-Training
RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
RADAR: Learning to Route with Asymmetry-aware Distance Representations
Secure Outlier-Aware Large Language Model Inference
FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
Internal Evaluation of Density-Based Clusterings with Noise
Pareto Variational Autoencoder
Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
BigMac3D: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
From atom to space: A region-based readout function for spatial properties of materials
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
PolyGraphScore: a classifier-based metric for evaluating graph generative models
TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models
LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
How to Lose Inherent Counterfactuality in Reinforcement Learning
Quantization-Aware Diffusion Models For Maximum Likelihood Training
Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
Dual Randomized Smoothing: Beyond Global Noise Variance
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Horizon Imagination: Efficient On-Policy Training in Diffusion World Models
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
Culture in Action: Evaluating Text-to-Image Models through Social Activities
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
GradShield: Alignment Preserving Finetuning
JULI: Jailbreak Large Language Models by Self-Introspection
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
$\textit{MADFormer}$: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Learning a distance measure from the information-estimation geometry of data
Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
SNaX: sparse narrow accelerated mixture of experts
A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
Singleton-Optimized Conformal Prediction
AlphaFlow: Understanding and Improving MeanFlow Models
Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
Bilateral Information-aware Test-time Adaptation for Vision-Language Models
The Spacetime of Diffusion Models: An Information Geometry Perspective
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
Diversity-Aware Online Prompt Assignment to Generative Models
Flatness Guided Test-Time Adaptation for Vision-Language Models
Learning Collective Variables from BioEmu with Time-Lagged Generation
LLM Pretraining with Continuous Concepts
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
DND: Boosting Large Language Models with Dynamic Nested Depth
Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
RigidSSL: Rigidity-based Geometric Pretraining for Protein Generation
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Debiased and Denoised Projection Learning for Incomplete Multi-view Clustering
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
Self-Augmented Visual Contrastive Decoding
Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
Riemannian Variational Flow Matching for Material and Protein Design
Forget Forgetting: Continual Learning in a World of Abundant Memory
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
Untraceable DeepFakes via Traceable Fingerprint Elimination
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
LLM DNA: Tracing Model Evolution via Functional Representations
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Revisiting Confidence Calibration for Misclassification Detection in VLMs
Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
Data-to-Energy Stochastic Dynamics
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
NIMO: a Nonlinear Interpretable MOdel
On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
Single-stream Policy Optimization
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
CRONOS: Continuous time reconstruction for 4D medical longitudinal series
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
LLaVAction: evaluating and training multi-modal large language models for action understanding
Bidirectional Predictive Coding
Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Multi-Agent Debate with Memory Masking
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
Inducing Dyslexia in Vision Language Models
Graph Tokenization for Bridging Graphs and Transformers
Reasoning-Driven Multimodal LLM for Domain Generalization
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
Detecting Data Contamination in LLMs via In-Context Learning
Reconstruct Anything Model: a lightweight foundation model for computational imaging
Noise Stability of Transformer Models
Deep Hierarchical Learning with Nested Subspace Networks
The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Architectures
Self-Speculative Masked Diffusions
Gumbel Distillation for Parallel Text Generation
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
S2GO: Streaming Sparse Gaussian Occupancy
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Generative View Stitching
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Fantastic Pretraining Optimizers and Where to Find Them
Learning to Recall with Transformers Beyond Orthogonal Embeddings
Diversified Multinomial Logit Contextual Bandits
Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Conditioned Initialization for Attention
Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
Deep Think with Confidence
Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
LiveResearchBench: Benchmarking Single- and Multi-Agent Systems for Citation-Grounded Deep Research
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
ExGRPO: Learning to Reason from Prior Successes
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Cyber-Zero: Training Cybersecurity Agents without Runtime
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Multi-Head Low-Rank Attention
Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
Unlearning Evaluation through Subset Statistical Independence
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Rethinking Consistent Multi-Label Classification under Inexact Supervision
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
Bridging ML and algorithms: comparison of hyperbolic embeddings
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
Learning Human Habits with Rule-Guided Active Inference
Uniform Discrete Diffusion with Metric Path for Video Generation
Learning from Label Proportions via Proportional Value Classification
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
PILOT-Bench: Probabilistic Interaction for LLM Operations in Tool-driven Scenarios
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
Evolution of Concepts in Language Model Pre-Training
MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
INTIMA: A Benchmark for Human-AI Companionship Behavior
Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
Fractional-Order Spiking Neural Network
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Revisiting Multimodal Positional Encoding in Vision–Language Models
Captain Cinema: Towards Short Movie Generation
Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
How does the optimizer implicitly bias the model merging loss landscape?
A Study of Posterior Stability in Time-Series Latent Diffusion
Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
Robust Equation Structure learning with Adaptive Refinement
Learning AND–OR Templates for Compositional Representation in Art and Design
MoSA: Mosaic Shared Adaptation of Large Language Models
Einstein Fields: A Neural Perspective To Computational General Relativity
Segment Any Events with Language
Bootstrapping MLLM for Weakly‑Supervised Class‑Agnostic Object Counting
Discovering Diverse Behaviors via Temporal Contrastive Learning
Bridging Input Feature Spaces Towards Graph Foundation Models
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
Inference-time scaling of diffusion models through classical search
Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
Compositional Generalization through Gradient Search in Nonparametric Latent Space
Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
ReDDiT: Rehashing Noise for Discrete Visual Generation
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
Revisiting Long-context Modeling from Context Denoising Perspective
Laplacian Multi-scale Flow Matching for Generative Modeling
Federated ADMM from Bayesian Duality
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
From Observations to Events: Event-Aware World Models for Reinforcement Learning
Q-Learning with Fine-Grained Gap-Dependent Regret
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
On the Thinking-Language Modeling Gap in Large Language Models
MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
Policy Contrastive Decoding for Robotic Foundation Models
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
D-AR: Diffusion via Autoregressive Models
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
Thompson Sampling via Fine-Tuning of LLMs
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
Rethinking Continual Learning with Progressive Neural Collapse
Adaptive Width Neural Networks
Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
Flow-based Conformal Prediction for Multi-dimensional Time Series
GAP: Gradient Adjustment with Phase-guidance for Robust Vision-Proprioception Policies in Robotic Manipulation
GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
Enhanced Continual Learning of Vision-Language Models with Model Fusion
JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
FLOWER: A Flow-Matching Solver for Inverse Problems
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
Soft Tokens, Hard Truths
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
Who Matters Matters: Agent-Specific Conservative Offline MARL
Probabilistic Kernel Function for Fast Angle Testing
CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models towards Adam‑Scale Speed
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Learning Distributions over Permutations and Rankings with Factorized Representations
Cascadia: An Efficient Cascade Serving System for Large Language Models
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Post-Training Quantization for Video Matting
MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Control Tax: The Price of Keeping AI in Check
Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
Disentangling the Factors of Convergence between Brains and Computer Vision Models
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Learning to Reason without External Rewards
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
Riemannian Federated Learning via Averaging Gradient Streams
GNN-as-Judge: Unleashing the Power of LLMs for Graph Few-shot Semi-supervised Learning with GNN Feedback
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
PAC-Bayes bounds for cumulative loss in Continual Learning
Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
Learning in Prophet Inequalities with Noisy Observations
OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Unified Vision–Language Modeling via Concept Space Alignment
Constrained Diffusion for Protein Design with Hard Structural Constraints
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
Understanding the Mechanisms of Fast Hyperparameter Transfer
Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Minimax Optimal Adversarial Reinforcement Learning
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Distributionally Robust Linear Regression with Block Lewis Weights
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Attribution-Guided Decoding
Towards Understanding Valuable Preference Data for Large Language Model Alignment
Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Referring Layer Decomposition
Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
Identifiability Challenges in Sparse Linear Ordinary Differential Equations
Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
Equivariant Splitting: Self-supervised learning from incomplete data
Evaluating SAE interpretability without generating explanations
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Sparse Autoencoders Trained on the Same Data Learn Different Features
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
Toward Enhancing Representation Learning in Federated Multi-Task Settings
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
Transducing Language Models
When Shift Happens - Confounding Is to Blame
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Mamba-3: Improved Sequence Modeling using State Space Principles
Memorizing Long-tail Data Can Help Generalization Through Composition
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
CREPE: Controlling diffusion with REPlica Exchange
Parallel Token Generation for Language Models
NetArena: Dynamically Generated LLM Benchmarks for Network Applications
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
Q-learning with Posterior Sampling
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
Pretraining Scaling Laws for Generative Evaluations of Language Models
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
SciNav: A Principled Agent Framework for Scientific Coding Tasks
Provable Separations between Memorization and Generalization in Diffusion Models
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
FoNE: Precise Single-Token Number Embeddings via Fourier Features
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
LLMs Get Lost In Multi-Turn Conversation
Revela: Dense Retriever Learning via Language Modeling
Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
DistillKac: Few-Step Image Generation via Damped Wave Equations
Estimating Dimensionality of Neural Representations from Finite Samples
gen2seg: Generative Models Enable Generalizable Instance Segmentation
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
Information-based Value Iteration Networks for Decision Making Under Uncertainty
A Unifying Framework for Causal Imitation Learning with Hidden Confounders
True Self-Supervised Novel View Synthesis is Transferable
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
Latent Visual Reasoning
On The Fragility of Benchmark Contamination Detection in Reasoning Models
Type-Compliant Adaptation Cascades
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Sequential Parallel Duality in Prefix Scannable Models
Don't Throw Away Your Pretrained Model
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws
EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
Pretraining with hierarchical memories: separating long-tail and common knowledge
Entropy-Based Block Pruning for Efficient Large Language Models
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
How Reliable is Language Model Micro-Benchmarking?
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
Noise Tolerance of Distributionally Robust Learning
Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
Jailbreak Transferability Emerges from Shared Representations
Distributed Algorithms for Euclidean Clustering
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Pursuing Minimal Sufficiency in Spatial Reasoning
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
jqBench: a benchmark for reading and editing JSON from natural language and/or examples
Hybrid Training for Vision-Language-Action Models
Gradient-Based Program Synthesis with Neurally Interpreted Languages
Adaptive Conformal Guidance for Learning under Uncertainty
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
Transfer Paramatters: Optimal per-Module Hyperparameters Across All Scaling Axes
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
On the Theoretical Limitations of Embedding-Based Retrieval
Learned Meta-Tokens for Language Modeling
Go-Browse: Training Web Agents with Structured Exploration
CONCUR: A Framework for Continual Constrained and Unconstrained Routing
Scalable Chain of Thoughts via Elastic Reasoning
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
Incentives in Federated Learning with Heterogeneous Agents
Dynamic Kernel Graph Sparsifiers
Towards a Sharp Analysis of Learning Offline $f$-Divergence-Regularized Contextual Bandits
Efficient Estimation of Kernel Surrogate Models for Task Attribution
ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Relatron: Automating Relational Machine Learning over Relational Databases
Towards Knowledge‑and‑Data‑Driven Organic Reaction Prediction: RAG‑Enhanced and Reasoning‑Powered Hybrid System with LLMs
Cautious Optimizers: Improving Training with One Line of Code
Flipping the Dialogue: Training and Evaluating User Language Models
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
Grounding Computer Use Agents on Human Demonstrations
Spatial Structure and Selective Text Jointly Facilitate Image Clustering
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining
Riemannian Optimization on Relaxed Indicator Matrix Manifold
Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
The Hidden Lattice Geometry of LLMs
SkillFactory: Self-Distillation for Learning Cognitive Behaviors
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
Block-sample MAC-Bayes generalization bounds
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Active Learning for Decision Trees with Provable Guarantees
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
Dual-Scale World Models for LLM Agents towards Hard-Exploration Problems
How hard is learning to cut? Trade-offs and sample complexity
Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
Motion-Aligned Word Embeddings for Text-to-Motion Generation
BANZ-FS: BANZSL Fingerspelling Dataset
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
Scalable Second-order Riemannian Optimization for $K$-means Clustering
Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
Learning to Play Multi-Follower Bayesian Stackelberg Games
Rethinking JEPA: Compute‑Efficient Video Self-Supervised Learning with Frozen Teachers
Robust Spiking Neural Networks Against Adversarial Attacks
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Diffusion Transformers with Representation Autoencoders
From Evaluation to Defense: Advancing Safety in Video Large Language Models
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Soft Equivariance Regularization for Invariant Self-Supervised Learning
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
Secure Inference for Diffusion Models via Unconditional Scores
Learning to Answer from Correct Demonstrations
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Discrete Adjoint Matching
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
Efficient Agent Training for Computer Use
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
VUDG: A Dataset for Video Understanding Domain Generalization
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
The End of Manual Decoding: Towards Truly End-to-End Language Models
Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
Improving Text-guided CAD Prototyping via Modality-Specific Tokenization
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
Process-Verified Reinforcement Learning for Theorem Proving via Lean
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Learning to Segment for Vehicle Routing Problems
Toward Conservative Planning from Preferences in Offline Reinforcement Learning
Reasoning without Training: Your Base Model is Smarter Than You Think
Bi-Lipschitz Autoencoder With Injectivity Guarantee
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
BAR: Refactor the Basis of Autoregressive Visual Generation
Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Combinatorial Bandit Bayesian Optimization for Tensor Outputs
Variation in Verification: Understanding Verification Dynamics in Large Language Models
Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Language Identification in the Limit with Computational Trace
Understanding Routing Mechanism in Mixture-of-Experts Language Models
Certified Evaluation of Model-Level Explanations for Graph Neural Networks
VADv2: End-to-End Autonomous Driving via Probabilistic Planning
Cartridges: Lightweight and general-purpose long context representations via self-study
Large Language Model Compression with Global Rank and Sparsity Optimization
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Depth Anything with Any Prior
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
Multimodal Classification via Total Correlation Maximization
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
PRISON: Unmasking the Criminal Potential of Large Language Models
Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Compositional Visual Planning via Inference-Time Diffusion Scaling
An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
Multi-Task Low-Rank Model Adaptation
Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
Towards Real-World Routing with Neural Combinatorial Optimization
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
StoryAlign: Evaluating and Training Reward Models for Story Generation
Planned Diffusion
To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Identifiability and recoverability in self-supervised models
A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
When More is Less: Understanding Chain-of-Thought Length in LLMs
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
MAPSS: Manifold-based Assessment of Perceptual Source Separation
Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
Beyond Short Steps in Frank-Wolfe Algorithms
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
Probing Rotary Position Embeddings through Frequency Entropy
Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
Closing the Modality Gap Aligns Group-Wise Semantics
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
Predictive CVaR Q-learning
SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
On the trade-off between expressivity and privacy in graph representation learning
Nonparametric Contextual Online Bilateral Trade
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Logit‑KL Flow Matching: Non‑Autoregressive Text Generation via Sampling‑Hybrid Inference
Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
3D-aware Disentangled Representation for Compositional Reinforcement Learning
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
How Base Frequency Shapes RoPE: An Analytical Study of Frequency-Band Formation
ROC-n-reroll: How verifier imperfection affects test-time scaling
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
SONIC: Spectral Oriented Neural Invariant Convolutions
Toward Complex-Valued Neural Networks for Waveform Generation
Language and Experience: A Computational Model of Social Learning in Complex Tasks
Charts Are Not Images: On the Challenges of Scientific Chart Editing
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Non-Collaborative User Simulators for Tool Agents
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
Dual Distillation for Few-Shot Anomaly Detection
Robust Preference Alignment via Directional Neighborhood Consensus
DrugTrail: Explainable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
Merge before Forget: A Single LoRA Continual Learning via Continual Merging
Taming the Fragility of KV Cache Eviction in LLM Inference
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
Hidden Patterns in Chain-of-Thought Reasoning
Distractor-free Generalizable 3D Gaussian Splatting
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
In-Context Learning of Temporal Point Processes with Foundation Inference Models
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
Short Window Attention Enables Long-Term Memorization
Reconciling Visual Perception and Generation in Diffusion Models
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
RedacBench: Can AI Erase Your Secrets?
Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
Implicit Sensing for Fourier Sparse Boolean Functions
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Agentic Reinforcement Learning with Implicit Step Rewards
Landing with the Score: Riemannian Optimization through Denoising
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
PerfGuard: A Performance-Aware Agent for Visual Content Generation
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Navigating the Latent Space Dynamics of Neural Models
Minimax-Optimal Aggregation for Density Ratio Estimation
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
Quantum machine learning advantages beyond hardness of evaluation
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
Convergence of Actor-Critic gradient flow for entropy regularised MDPs in general spaces
Online Rounding and Learning Augmented Algorithms for Facility Location
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
Discrete Variational Autoencoding via Policy Search
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
Confident and Adaptive Generative Speech Recognition via Conformal Risk Control
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
Dimension-Free Decision Calibration for Nonlinear Loss Functions
SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
A New Initialization to Control Gradients in Sinusoidal Neural Networks
Decomposing Extrapolative Problem Solving: Spatial Transfer and Length Scaling with Map Worlds
APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
Metis: Training LLMs with FP4 Quantization
ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
Cost-Aware Dynamic Tree Construction for Efficient Large Language Model Inference
Rethinking Code Similarity for Automated Algorithm Design with LLMs
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
What happens when generative AI models train recursively on each others' outputs?
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Flow Expansion via Verifier-Constrained Noised State Space Exploration
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
Free Energy Mixer
Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
Revisiting Global Text Conditioning in Diffusion Transformers
Continuous Audio Language Models
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
MoMa: A Simple Modular Learning Framework for Material Property Prediction
On the Shelf Life of Finetuned LLM-Judges: Future Proofing, Backward Compatibility, and Question Generalization
KV-Cache Transform Coding for Compact Storage in LLM Inference
Benchmarking LLM Tool-Use in the Wild
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
LiveClin: A Live Clinical Benchmark without Leakage
Optimizing Canaries for Privacy Auditing with Metagradient Descent
Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
Protein Structure Tokenization via Geometric Byte Pair Encoding
Neural Message-Passing on Attention Graphs for Hallucination Detection
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
H$^3$GNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
EditLens: Quantifying the Extent of AI Editing in Text
Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
Steering and Rectifying Latent representation manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
Spectral Bellman Method: Unifying Representation and Exploration in RL
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Scaling Direct Feedback Learning with Theoretical Guarantees
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Energy-Efficient Random Variate Generation via Compressed Lookup Tables
Learning Robust Intervention Representations with Delta Embeddings
Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
Sharing State Between Prompts and Programs
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
ConfHit: Conformal Generative Design via Nested Testing
Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
Latent Stochastic Interpolants
Verifier-free Test-Time Sampling for Vision Language Action Models
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Prompt Curriculum Learning for Efficient LLM Post-Training
Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
How Catastrophic is Your LLM? Certifying Risk in Conversation
Statistical Guarantees for Offline Domain Randomization
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
LLM Fingerprinting via Semantically Conditioned Watermarks
Training Large Language Models To Reason In Parallel With Global Forking Tokens
On learning linear dynamical systems in context with attention layers
Understanding the Role of Training Data in Test-Time Scaling
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
Stochastic Neural Networks for Causal Inference with Missing Confounders
WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
Low-Pass Filtering Improves Behavioral Alignment of Vision Models
A State-Transition Framework for Efficient LLM Reasoning
Hybrid Reinforcement: when reward is sparse, better to be dense
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL
MATHMO: Automated Mathematical Modeling Through Adaptive Search
Private Rate-Constrained Optimization with Applications to Fair Learning
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Fair Policy Aggregation from Standard Policy Optimization
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
Code World Models for General Game Playing
On the Interpolation Effect of Score Smoothing in Diffusion Models
Learning Facts at Scale with Active Reading
Structural Inference: Interpreting Small Language Models with Susceptibilities
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
Token Distillation: Attention-Aware Input Embeddings for New Tokens
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
VCWorld: A Biological World Model for Virtual Cell Simulation
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
Understanding Transformers for Time Series Forecasting: A Case Study on MOIRAI
Inter-Agent Relative Representations for Multi-Agent Option Discovery
Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
GraphPlanner: Graph-Based Agentic Routing for LLMs
Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
Laplacian Kernelized Bandit
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
GEM: A Gym for Generalist LLMs
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
On the Predictive Power of Representation Dispersion in Language Models
Self-Destructive Language Models
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
LLMs Can Hide Text in Other Text of the Same Length
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
Transfer Learning in Infinite Width Feature Learning Networks
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Skirting Additive Error Lower Bounds for Private Turnstile Streams
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
Decoupled Q-Chunking
Learning to Reason over Continuous Tokens with Reinforcement Learning
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Persona Features Control Emergent Misalignment
Comparing the learning dynamics of in-context learning and fine-tuning in language models
Automata Learning and Identification of the Support of Language Models
Generalization Below the Edge of Stability: The Role of Data Geometry
Prediction with Expert Advice under Local Differential Privacy
Neologism Learning for Controllability and Self-Verbalization
OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
Mode-conditioning unlocks superior test-time compute scaling
In-Context Learning for Pure Exploration
An Information-Theoretical Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
A Recovery Guarantee for Sparse Neural Networks
Trust-Region Adaptive Policy Optimization
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
Two (narrow) heads are better than (an arbitrarily wide) one
All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
Accelerated co-design of robots through morphological pretraining
Learning Shrinks the Hard Tail: Training‑Dependent Inference Scaling in a Solvable Linear Model
Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
Intrinsic Explanation of Random Subspace Method for Enhanced Security Applications
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
Improving LLM Alignment with References
Searching for Privacy Risks in LLM Agents via Simulation
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Making, Not Taking, the Best of N
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence–Structure Co-Design
Neural Posterior Estimation with Latent Basis Expansions
The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
Benefits and Limitations of Communication in Multi-Agent Reasoning
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
MuonBP: Faster Muon via Block-Periodic Orthogonalization
TSLM: Tree-Structured Language Modeling for Divergent Thinking
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Expressive and Invariant Graph Learning via Canonical Tree Cover Neural Networks
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Flowing Through States: Neural ODE Regularization for Reinforcement Learning
OpenEstimate: Evaluating LLMs on Probabilistic Estimation with Real-World Data
The Limits of Inference Scaling Through Resampling
Every Language Model Has a Forgery-Resistant Signature
Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Pre-training under infinite compute
LABEL-FREE MITIGATION OF SPURIOUS CORRELATIONS IN VLMS USING SPARSE AUTOENCODERS
Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
Scaling Behavior of Discrete Diffusion Language Models
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Improving Feasibility via Fast Autoencoder-Based Projections
Weak-to-Strong Generalization with Failure Trajectories
Can we generate portable representations for clinical time series data using LLMs?
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Splat Regression Models
Measuring the Intrinsic Dimension of Earth Representations
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
Value Flows
SWERank: Software Issue Localization with Code Ranking
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
FutureFill: Fast Generation from Convolutional Sequence Models
EXPO: Stable Reinforcement Learning with Expressive Policies
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
Infinite Horizon Markov Economies
Robust Decision-Making with Partially Calibrated Forecasters
Personalized Reasoning: Just-in-time Personalization and Why LLMs Fail at It
Why Less is More (Sometimes): A Theory of Data Curation
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
Offline Preference-Based Value Optimization
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs
Are Deep Speech Denoising Models Robust to Adversarial Noise?
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Difference-Aware Retrieval Policies for Imitation Learning
Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Log-Linear Attention
Strategic Obfuscation of Deceptive Reasoning in Language Models
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
Overtone: Cyclic Patch Modulation for Cleaner, Faster Physics Emulators
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
How Dark Patterns Manipulate Web Agents
Steering Autoregressive Music Generation with Recursive Feature Machines
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Latent Fourier Transform
A Tale of Two Smoothness Notions: Adaptive Optimizers and Non-Euclidean Descent
Evolution and compression in LLMs: on the emergence of human-aligned categorization
Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
Spectral-guided Physical Dynamics Distillation
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Forest-Based Graph Learning for Semi-Supervised Node Classification
A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
Revisiting the Past: Data Unlearning with Model State History
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
HiPO: Self-Hint Policy Optimization for RLVR
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
Learning multimodal dictionary decompositions with group-sparse autoencoders
Bures Generalized Category Discovery
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Weak-to-Strong Diffusion
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
Negative Pre-activations Differentiate Syntax
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Flow Map Learning via Games
Counterfactual Structural Causal Bandits
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
MergOPT: A Merge-Aware Optimizer for Robust Model Merging
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Long Chain-of-Thought Reasoning Across Languages
Scaling up Memory for Robotic Control via Experience Retrieval
LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Strong Correlations Induce Cause Only Predictions in Transformer Training
Diffusion Language Models are Provably Optimal Parallel Samplers
Polychromic Objectives for Reinforcement Learning
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
Efficient Message-Passing Transformer for Error Correcting Codes
TNT: Improving Chunkwise Training for Test-Time Memorization
WebDS: An End-to-End Benchmark for Web-based Data Science
Leveraging Discrete Function Decomposability for Scientific Design
Reliable Evaluation of MRI Motion Correction: Dataset and Insights
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
Massive Editing for Large Language Models Based on Dynamic Weight Generation
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Sampling Complexity of TD and PPO in RKHS
On Coreset for LASSO Regression Problem with Sensitivity Sampling
Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
Task-Agnostic Amortized Multi-Objective Optimization
Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability‑Adaptive Reward Shaping
In Context Semi-Supervised Learning
Steering Language Models with Weight Arithmetic
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Modeling Others' Minds as Code
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
Exchangeability of GNN Representations with Applications to Graph Retrieval
Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for One-/Two-step High-Fidelity Audio Generation
Detection of unknown unknowns in autonomous systems
LightMem: Lightweight and Efficient Memory-Augmented Generation
Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
Programming by Backprop: Learning Behaviour from Symbolic Descriptions
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
Random Controlled Differential Equations
EMFuse: Energy-based Model Fusion for Decision Making
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
CORDS - Continuous Representations of Discrete Structures
Submodular Function Minimization with Dueling Oracle
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
Jet Expansions: Restructuring LLM Computation for Model Inspection
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
Understanding Collaboration Mechanism In VAE Recommender Systems
SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Doubly-Regressing Approach for Subgroup Fairness
A Fair Bayesian Inference through Matched Gibbs Posterior
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
Lossless Vocabulary Reduction for Auto-Regressive Language Models
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
AesCoder: Code Aesthetics with Agentic Reward Feedback
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
Fine-tuning Behavioral Cloning Policies with Preference‑Based Reinforcement Learning
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
The Overthinking Predicament: When Reasoning Hurts Ranking
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
Exploring the Basin-Like Loss Landscape in Large Language Models
When Machine Learning Gets Personal: Evaluating Prediction and Explanation
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
DePO: Demonstration-guided Policy Optimization for Molecular Optimization
Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Online Inventory Optimization in Non-Stationary Environment
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Designing Rules to Pick a Rule: Aggregation by Consistency
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Teaching LLMs to Admit Uncertainty in OCR
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
LLM Unlearning with LLM Beliefs
AEGIS: Adversarial Target–Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
Autoregressive Image Generation with Randomized Parallel Decoding
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
On Discovering Algorithms for Adversarial Imitation Learning
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Pyramid Patchification Flow for Visual Generation
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Reasoning Boosts Opinion Alignment in LLMs
Text-Aware Image Restoration with Diffusion Models
Hierarchical Concept-based Interpretable Models
Mitigating Mismatch within Reference-based Preference Optimization
RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
Reverse-Engineered Reasoning for Open-Ended Generation
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Interactive Agents to Overcome Underspecificity in Software Engineering
Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
Translation Heads: Unveiling Attention's Role in LLM Multilingual Translation
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Learning to Adapt: In-Context Learning Beyond Stationarity
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Reformulation for Pretraining Data Augmentation
Group Critical-token Policy Optimization for Autoregressive Image Generation
Constraint-guided Hardware-aware NAS through Gradient Modification
Financial fraud collusion among generative AI agents in social networks
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Stochastic Self-Organization in Multi-Agent Systems
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Block-wise Adaptive Caching for Accelerating Diffusion Policy
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSL
Low-Latency Neural LiDAR Compression with 2D Context Models
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
Judo: A Juxtaposed Domain-oriented Multimodal Reasoner for Industrial Anomaly QA
Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
Learning under Quantization for High-Dimensional Linear Regression
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
Disentangling Knowledge Representations for Large Language Model Editing
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
Model-Guided Microstimulation Steers Primate Visual Behavior
LS-Merge: Merging Language Models in Latent Space
SCUBA: Salesforce Computer Use Benchmark
WALT: Web Agents that Learn Tools
GTA1: GUI Test-time Scaling Agent
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
Fine-tuning Done Right in Model Editing
Transitive RL: Value Learning via Divide and Conquer
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
CoDA: Agentic Systems for Collaborative Data Visualization
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
Multimodal Policy Internalization for Conversational Agents
On Code-Induced Reasoning in LLMs
Bi-Criteria Metric Distortion
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
Weight Decay may matter more than µP for Learning Rate Transfer in Practice
DSSA: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Scaling Agent Learning via Experience Synthesis
Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Feature compression is the root cause of adversarial fragility in neural networks
Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
End-to-End Probabilistic Framework for Learning with Hard Constraints
Incomplete Data, Complete Dynamics: A Diffusion Approach
Faster Vision Transformers with Adaptive Patches
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
End-to-end Listen, Look, Speak and Act
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
Towards a Universally Transferable Acceleration Method for Density Functional Theory
Uncover Underlying Correspondence for Robust Multi-view Clustering
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
Fine-Grained Activation Steering: Steering Less, Achieving More
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Causality ≠ Invariance: Function vs Concept Vectors in LLMs
Predicting LLM Reasoning Performance with Small Proxy Model
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
Uncertainty Estimation via Hyperspherical Confidence Mapping
Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
Faster Parameter-Free Regret Matching Algorithms
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Random Spiking Neural Networks are Stable and Spectrally Simple
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
Aurelius: Relation Aware Text-to-Audio Generation At Scale
CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
Learning Ising Models under Hard Constraints using One Sample
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
Shift-and-Sum Quantization for Visual Autoregressive Models
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
EvA: Evolutionary Attacks on Graphs
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
Multiple Token Divergence: Measuring and Steering In-Context Computation Density
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
PerFit: Exploring Personalization Shifts in Representation Space of LLMs
Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
The Value of Information in Human-AI Decision-making
CP-Agent: Context‑Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
Diversity-Incentivized Exploration for Versatile Reasoning
Nef-Net+: Adapting Electrocardio Panorama in the wild
Scaling with Collapse: Efficient and Predictable Training of LLM Families
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
UniCA: Unified Covariate Adaptation for Time Series Foundation Model
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
Fairness via Independence: A General Regularization Framework for Machine Learning
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment