Downloads 2026
Number of events: 5398
- $\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
- $AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
- $\boldsymbol{\partial^\infty}$-Grid: Differentiable Grid Representations for Fast and Accurate Solutions to Differential Equations
- $\ell_1$ Latent Distance based Continuous-time Graph Representation
- $\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
- $\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
- $\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space
- $PhyWorldBench$: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
- $\pi^3$: Permutation-Equivariant Visual Geometry Learning
- $p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
- $\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
- $\textit{MADFormer}$: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
- 1st ICLR Workshop on Time Series in the Age of Large Models
- 2nd Workshop on World Models: Understanding, Modelling and Scaling
- 3D-aware Disentangled Representation for Compositional Reinforcement Learning
- 3D Aware Region Prompted Vision Language Model
- 3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
- 3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
- 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion Models
- 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
- 3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
- 3rd Workshop on Navigating and Addressing Data Problems For Foundation Models (DATA-FM)
- 4th ICLR Workshop on Machine Learning for Remote Sensing
- A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
- A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
- A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
- A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
- A^2TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
- A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
- A Bayesian Nonparametric Framework For Learning Disentangled Representations
- A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
- ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
- AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
- A Benchmark for Deep Information Synthesis
- A Biologically Plausible Dense Associative Memory with Exponential Capacity
- A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
- A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
- A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
- AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
- Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
- AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
- ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
- Accelerated co-design of robots through morphological pretraining
- Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
- Accelerated Parallel Tempering via Neural Transports
- Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
- Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
- Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
- Accelerating Inference for Multilayer Neural Networks with Quantum Computers
- Accelerating Materials Design via LLM-Guided Evolutionary Search
- Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
- ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
- ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
- ACE-Bench: Benchmarking Agentic Coding in End-to-End Development of Complex Features
- AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
- AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
- Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
- Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data
- Achieving low-bit Muon through subspace preservation and grid quantization
- A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
- A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
- A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
- ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
- AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
- Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
- Action Chunking and Data Augmentation Yield Exponential Improvements for Imitation Learning in Continuous Spaces
- Action-Free Offline-To-Online RL via Discretised State Policies
- Action-Guided Attention for Video Action Anticipation
- Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
- Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
- Activation Function Design Sustains Plasticity in Continual Learning
- ActivationReasoning: Logical Reasoning in Latent Activation Spaces
- Activation Steering for LLM Alignment via a Unified ODE-Based Framework
- Activation Steering with a Feedback Controller
- ActiveCQ: Active Estimation of Causal Quantities
- ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
- Active Learning for Decision Trees with Provable Guarantees
- Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
- AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
- Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
- AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
- Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
- Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
- Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
- Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
- Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
- Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
- Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
- Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
- Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring.
- Adaptive Conformal Guidance for Learning under Uncertainty
- Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
- Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
- Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
- Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
- Adaptive Gaussian Expansion for On-the-fly Category Discovery
- Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
- Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
- Adaptive Logit Adjustment for Debiasing Multimodal Language Models
- Adaptive Mamba Neural Operators
- Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
- Adaptive Mixture of Disentangled Experts for Dynamic Graphs under Distribution Shifts
- Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
- Adaptive Nonlinear Compression for Large Foundation Models
- Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
- Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
- Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
- Adaptive Social Learning via Mode Policy Optimization for Language Agents
- Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
- Adaptive Thinking: Large Language Models Know When to Think in Latent Space
- Adaptive Width Neural Networks
- AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
- AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
- AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
- AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
- Addressing divergent representations from causal interventions on neural networks
- Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
- A Dense Subset Index for Collective Query Coverage
- ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
- A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
- Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
- ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
- AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
- AdS-GNN - a Conformally Equivariant Graph Neural Network
- Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
- Advancing Complex Video Object Segmentation via Progressive Concept Construction
- Advancing End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
- Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
- Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
- Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
- AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
- Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
- Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
- Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
- AEGIS: Adversarial Target–Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
- Aegis: Automated Error Generation and Identification for Multi-Agent Systems
- AesCoder: Code Aesthetics with Agentic Reward Feedback
- AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
- A Fair Bayesian Inference through Matched Gibbs Posterior
- A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
- AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
- A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
- A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
- A Formal Controllability Toolkit for Black-Box Generative Models
- A foundation model with multi-variate parallel attention to generate neuronal activity
- A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
- AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
- A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
- A General Framework for Black-Box Attacks Under Cost Asymmetry
- A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
- A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
- A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
- Agent Data Protocol
- AgentFold: Long-Horizon Web Agents with Proactive Context Folding
- AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
- Agentic AI in the Wild: From Hallucinations to Reliable Autonomy
- Agentic Collaboration as an Information Bottleneck Problem
- Agentic Context Engineering: Learning Comprehensive Contexts for Self-Improving Language Models
- Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
- Agentic Reinforced Policy Optimization
- Agentic Reinforcement Learning with Implicit Step Rewards
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
- AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
- AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
- Agents in the Wild: Safety, Security, and Beyond
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
- Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
- Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
- A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
- A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
- A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
- A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
- A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
- AI4MAT-ICLR-2026: ICLR 2026 Workshop on AI for Accelerated Materials Design
- AI for Mechanism Design and Strategic Decision Making (AIMS)
- AI for Peace
- AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
- AI&PDE: ICLR 2026 Workshop on AI and Partial Differential Equations
- A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
- AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
- A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence–Structure Co-Design
- A Law of Data Reconstruction for Random Features (And Beyond)
- Algorithm Generation via Creative Ideation
- Algorithmic Fairness Across Alignment Procedures and Agentic Systems
- Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
- Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
- Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
- AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
- Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
- Aligning Deep Implicit Preferences by Learning to Reason Defensively
- Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
- Alignment-Enhanced Integration of Connectivity and Spectral Sparse in Dynamic Sparse Training of LLM
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- Alignment-Weighted DPO: A principled reasoning approach to improve alignment
- Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
- Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
- AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
- Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
- Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
- Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
- All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
- All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
- All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
- All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
- All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
- ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
- Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
- AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
- AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
- AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
- AlphaFlow: Understanding and Improving MeanFlow Models
- AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
- AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
- Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
- AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
- A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
- AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
- AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
- Amortising Inference and Meta-Learning Priors in Neural Networks
- AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
- An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
- Analysis of approximate linear programming solution to Markov decision problem with log barrier function
- Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
- Analyzing and Evaluating Unbiased Language Model Watermark
- Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
- Anatomy-aware Representation Learning for Medical Ultrasound
- Anchored Supervised Fine-Tuning
- Anchor Frame Bridging for Coherent First-Last Frame Video Generation
- A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
- An efficient, provably optimal, practical algorithm for the 0-1 loss linear classification problem
- An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
- An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
- An Ensemble Framework for Unbiased Language Model Watermarking
- AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
- A New Approach to Controlling Linear Dynamical Systems
- A New Initialization to Control Gradients in Sinusoidal Neural Networks
- A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
- Angle K-Means
- Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
- Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
- Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
- An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
- An Information-Theoretical Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
- An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
- Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
- A Noise is Worth Diffusion Guidance
- An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
- An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
- An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
- Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
- AntigenLM: Structure-Aware DNA Language Modeling for Influenza
- Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
- Antithetic Noise in Diffusion Models
- AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
- Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
- Any-Order Any-Subset AutoRegressive Model
- Any-Order Flexible Length Masked Diffusion
- Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
- Any-Subgroup Equivariant Networks via Symmetry Breaking
- Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
- AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
- AnyUp: Universal Feature Upsampling
- A One-shot Framework for Directed Evolution of Antibodies
- APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
- A Physics-Inspired Optimizer: Velocity Regularized Adam
- AP-OOD: Attention Pooling for Out-of-Distribution Detection
- APPLE: Toward General Active Perception via Reinforcement Learning
- A Primer on SO(3) Action Representations in Deep Reinforcement Learning
- A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
- A Problem-Oriented Perspective and Anchor Verification for Code Optimization
- APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
- AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
- AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
- Arbitrary Generative Video Interpolation
- Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
- Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
- Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
- A Recovery Guarantee for Sparse Neural Networks
- Are Deep Speech Denoising Models Robust to Adversarial Noise?
- Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
- Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
- A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
- Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
- A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
- Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
- ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
- A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
- A Revisit of Active Sequential Prediction-Powered Mean Estimation
- A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
- Are we measuring oversmoothing in graph neural networks correctly?
- ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
- Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
- A Rich Knowledge Space for Scalable Deepfake Detection
- ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
- ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
- ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
- ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
- ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
- ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
- ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
- Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
- ArtUV: Artist-style UV Unwrapping
- ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
- A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
- A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
- A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
- A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
- A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
- ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
- ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
- A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
- ASIDE: Architectural Separation of Instructions and Data in Language Models
- A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
- A Single Architecture for Representing Invariance Under Any Space Group
- ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
- A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
- Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
- ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
- AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
- AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
- AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
- A State-Transition Framework for Efficient LLM Reasoning
- A Statistical Benchmark for Diffusion Posterior Sampling Algorithms
- A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
- A Statistical Theory of Overfitting for Imbalanced Classification
- A Step to Decouple Optimization in 3DGS
- ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
- A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
- ASTRAEA: A Token-wise Acceleration Framework for Video Diffusion Transformers
- Astra: General Interactive World Model with Autoregressive Denoising
- A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
- A Study of Posterior Stability in Time-Series Latent Diffusion
- A Study on PAVE Specification for Learnware
- Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
- Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
- Asymptotic analysis of shallow and deep forgetting in replay with neural collapse
- AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
- Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
- Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
- Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
- A Tale of Two Smoothness Notions: Adaptive Optimizers and Non-Euclidean Descent
- A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
- AtC: Aggregate-then-Calibrate for Human-centered Assessment
- ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
- ATGen: Adversarial Reinforcement Learning for Test Case Generation
- A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
- ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
- ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
- AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
- ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
- Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
- ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
- A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
- A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
- Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
- Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
- Attention Is All You Need for KV Cache in Diffusion LLMs
- Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
- Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
- Attention Smoothing Is All You Need For Unlearning
- Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
- Attribution-Guided Decoding
- AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
- ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
- AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
- A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
- AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
- AudioX: A Unified Framework for Anything-to-Audio Generation
- Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
- Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
- AUHead: Realistic Emotional Talking Head Generation via Action Units Control
- A Unification of Discrete, Gaussian, and Simplicial Diffusion
- A Unified Federated Framework for Trajectory Data Preparation via LLMs
- A Unifying Framework for Causal Imitation Learning with Hidden Confounders
- A Unifying View of Coverage in Linear Off-policy Evaluation
- A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws
- Aurelius: Relation Aware Text-to-Audio Generation At Scale
- Aurora: Towards Universal Generative Multimodal Time Series Forecasting
- AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
- AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
- AutoCode: LLMs as Problem Setters for Competitive Programming
- AutoDA-Timeseries: Automated Data Augmentation for Time Series
- AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
- AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
- Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
- AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
- AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
- AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
- AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
- AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
- Automata Learning and Identification of the Support of Language Models
- Automated Formalization via Conceptual Retrieval-Augmented LLMs
- Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
- Automated Stateful Specialization for Adaptive Agent Systems
- Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
- Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
- Automatic Image-Level Morphological Trait Annotation for Organismal Images
- Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
- Automating the Refinement of Reinforcement Learning Specifications
- AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
- Autonomous Play with Correspondence-Driven Trajectory Warping
- AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
- AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
- Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
- Autoregressive Image Generation with Randomized Parallel Decoding
- Autoregressive Visual Decoding from EEG Signals
- Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
- AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
- AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
- Avey Bidirectional Architecture
- AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
- Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
- AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
- Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
- BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
- Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
- BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
- Bandit Learning in Matching Markets Robust to Adversarial Corruptions
- Bandits with Single-Peaked Preferences and Limited Resources
- BANZ-FS: BANZSL Fingerspelling Dataset
- BAR: Refactor the Basis of Autoregressive Visual Generation
- BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
- Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
- BaseReward: A Strong Baseline for Multimodal Reward Model
- Batch and Sequential Unlearning for Neural Networks
- Batch Pruning by Activation Stability
- Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
- Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
- Bayesian Ensemble for Sequential Decision-Making
- Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
- Bayesian Influence Functions for Hessian-Free Data Attribution
- Bayesian Neural Networks for Functional ANOVA Model
- Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
- Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
- Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
- Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
- BBQ: Boosting Quantization Entropy with Bell Box Quantization
- Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
- BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
- Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
- Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
- Behavior Learning
- Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
- Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
- Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
- Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
- Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
- Benchmarking LLM Tool-Use in the Wild
- Benchmarking Multi-Agent Reinforcement Learning in Power Grid Operations
- Benchmarking Open-ended Segmentation
- Benchmarking Overton Pluralism in LLMs
- Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
- Benefits and Limitations of Communication in Multi-Agent Reasoning
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
- BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
- Best-of-Infinity: Asymptotic Performance of Test-Time Compute
- Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
- Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
- Best-of-three-worlds Analysis for Dueling Bandits with Borda Winner
- Better Bounds for the Distributed Experts Problem
- Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
- Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
- Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning
- Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
- Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
- Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
- BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
- Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
- Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
- Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
- Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
- Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
- Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
- Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
- Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
- Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis
- Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
- Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization
- Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
- Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
- Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
- Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
- Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
- Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
- Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
- Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
- Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
- Beyond Match Maximization and Fairness: Retention-Objectified Two-Sided Matching
- Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
- Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
- Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
- Beyond Outliers: A Study of Optimizers Under Quantization
- Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
- Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
- Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
- Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
- Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
- Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
- Beyond Short Steps in Frank-Wolfe Algorithms
- Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
- Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
- Beyond Softmax and Entropy: $f$-Regularized Policy Gradients with Coupled Parametrizations
- Beyond Spectra: Eigenvector Overlaps in Loss Geometry
- Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
- Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
- Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
- Beyond Student: An Asymmetric Network for Neural Network Inheritance
- Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
- Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
- Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
- Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
- Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
- Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
- Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
- Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
- BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
- BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
- BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
- BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
- BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
- Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
- Bi-Criteria Metric Distortion
- BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
- Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
- Bidirectional Predictive Coding
- BigMac3D: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
- Bilateral Information-aware Test-time Adaptation for Vision-Language Models
- Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
- Bilinear relational structure fixes reversal curse and enables consistent model editing
- Bi-Lipschitz Autoencoder With Injectivity Guarantee
- Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
- BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
- Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
- BioBO: Biology-informed Bayesian Optimization for Perturbation Design
- BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
- Biologically Plausible Learning via Bidirectional Spike-Based Distillation
- BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
- BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
- BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
- Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
- BIRD: Behavior Induction via Representation-structure Distillation
- BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
- Bird's-eye-view Informed Reasoning Driver
- Black-Box Privacy Attacks on Shared Representations in Multitask Learning
- BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
- Block Recurrent Dynamics in Vision Transformers
- Block-sample PAC-Bayes generalization bounds
- Block-wise Adaptive Caching for Accelerating Diffusion Policy
- BoGrape: Bayesian optimization over graphs with shortest-path encoded
- BOLT: Decision-Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
- Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
- Boolean Satisfiability via Imitation Learning
- Boomerang Distillation Enables Zero-Shot Model Size Interpolation
- Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
- Boosting Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
- Boosting for Predictive Sufficiency
- Boosting Medical Visual Understanding From Multi-Granular Language Learning
- Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
- Boosting Open Set Recognition Performance through Modulated Representation Learning
- Bootstrapping MLLM for Weakly-Supervised Class-Agnostic Object Counting
- BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
- BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
- BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
- Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
- Bound by semanticity: universal laws governing the generalization-identification tradeoff
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
- Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
- Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
- Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
- Branch and Bound Search for Exact MAP Inference in Credal Networks
- Branched Schrödinger Bridge Matching
- BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
- Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
- Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
- Breaking Barriers: Do Reinforcement Fine-tuning Gains Transfer To Unseen Domains?
- Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
- Breaking Safety Paradox with Feasible Dual Policy Iteration
- Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
- Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
- Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
- Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
- Break the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
- BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
- BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
- Bridging Degradation Discrimination and Generation for Universal Image Restoration
- Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
- Bridging Explainability and Embeddings: BEE Aware of Spuriousness
- Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
- Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
- Bridging Input Feature Spaces Towards Graph Foundation Models
- Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
- Bridging ML and algorithms: comparison of hyperbolic embeddings
- Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
- Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
- Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
- Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
- Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
- Bridging the Gap Between Promise and Performance for FP4 Quantization
- Bridging the performance-gap between target-free and target-based reinforcement learning
- Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
- BrowseNet: Knowledge Graph-Based Associative Memory for Contextual Information Retrieval
- BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
- Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
- Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
- Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
- Building spatial world models from sparse transitional episodic memories
- Bures Generalized Category Discovery
- Bures-Wasserstein Flow Matching for Graph Generation
- BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
- ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
- Byzantine-Robust Federated Learning with Learnable Aggregation Weights
- Cache-to-Cache: Direct Semantic Communication Between Large Language Models
- Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
- Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
- cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
- CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
- Calibrated Information Bottleneck for Trusted Multi-modal Clustering
- Calibrating Verbalized Confidence with Self-Generated Distractors
- CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
- Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
- Can Language Models Discover Scaling Laws?
- Can Large Language Models Match the Conclusions of Systematic Reviews?
- Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
- Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
- Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
- Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
- Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
- Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
- Can Speech LLMs Think while Listening?
- Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
- Can Vision-Language Models Answer Face to Face Questions in the Real-World?
- Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective.
- Can we generate portable representations for clinical time series data using LLMs?
- Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
- Capability-Based Scaling Laws for LLM-Based Red-Teaming
- Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
- CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
- CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
- Captain Cinema: Towards Short Movie Generation
- Capturing Visual Environment Structure Correlates with Control Performance
- CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
- CARD: Towards Conditional Design of Multi-agent Topological Structures
- CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
- CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
- CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
- CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
- CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
- CARL: Preserving Causal Structure in Representation Learning
- CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
- Carré du champ flow matching: better quality-generalisation tradeoff in generative models
- Cartridges: Lightweight and general-purpose long context representations via self-study
- Cascadia: An Efficient Cascade Serving System for Large Language Models
- CASteer: Cross-Attention Steering for Controllable Concept Erasure
- Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
- CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
- Catch, Adapt, and Operate: Monitoring ML Models Under Drift
- Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
- Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
- CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
- CaTS: Calibrated Test-Time Scaling for Efficient LLM Inference
- CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
- Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
- Causal Discovery via Quantile Partial Effect
- Causal Interpretation of Neural Network Computations with Contribution Decomposition (CODEC)
- Causality ≠ Invariance: Function vs Concept Vectors in LLMs
- Causally Robust Preference Learning with Reasons
- Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
- Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
- Cautious Optimizers: Improving Training with One Line of Code
- Cautious Weight Decay
- CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
- CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
- CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
- CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
- CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
- CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
- Certified Evaluation of Model-Level Explanations for Graph Neural Networks
- Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity
- Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
- C-Evolve: Consensus-based Evolution for Prompt Groups
- CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
- CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detections
- ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
- ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
- Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
- CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
- Change Point Localization and Inference in Dynamic Multilayer Networks
- Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
- Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
- Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
- Characterizing and Mitigating Reasoning Drift in Large Language Models
- Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
- Characterizing Deep Research: A Benchmark and Formal Definition
- Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
- Characterizing Pattern Matching and Its Limits on Compositional Task Structures
- Characterizing the Discrete Geometry of ReLU Networks
- Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
- ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
- Charts Are Not Images: On the Challenges of Scientific Chart Editing
- Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
- ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
- CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
- ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
- Chessformer: A Unified Architecture for Chess Modeling
- Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
- ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
- Choices Speak Louder than Questions
- CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
- ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
- ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
- Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
- CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
- CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
- CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
- Circuit Insights: Towards Interpretability Beyond Activations
- CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
- CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
- Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
- CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
- CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
- CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
- CLARC: C/C++ Benchmark for Robust Code Search
- ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
- CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
- CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
- CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
- CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
- CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
- CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
- Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
- CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
- CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
- Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
- Closing the Gap Between Text and Speech Understanding in LLMs
- Closing the Modality Gap Aligns Group-Wise Semantics
- Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
- CLUE: Conflict-guided Localization for LLM Unlearning Framework
- Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
- CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
- CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
- CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
- CO3: Contrasting Concepts Compose Better
- CoAct-1: Computer-using Multi-agent System with Coding Actions
- Coarse-to-Fine Learning of Dynamic Causal Structures
- CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
- CoDA: Agentic Systems for Collaborative Data Visualization
- CoDA: From Text-to-Image Diffusion Models to Truly Training-Free Dataset Distillation
- Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
- CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model
- Code Driven Planning with Domain-Adaptive Selector
- CodeGenGuard: A Robust Watermark for Code Generation Models
- CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
- CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
- Code World Models for General Game Playing
- Codified Finite-state Machines for Role-playing
- CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
- CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
- CoFact: Conformal Factuality Guarantees for Language Models under Distribution Shift
- CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
- CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
- CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
- CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
- CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
- COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
- CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
- CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
- Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
- COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
- CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
- Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
- Combinatorial Bandit Bayesian Optimization for Tensor Outputs
- Combinatorial Rising Bandits
- CoMem: Compositional Concept-Graph Memory for Continual Vision–Language Learning
- ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
- COMI: Coarse-to-fine Context Compression via Marginal Information Gain
- CoMind: Towards Community-Driven Agents for Machine Learning Engineering
- Command-V: Training-Free Representation Finetuning Transfer
- Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
- Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
- COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
- Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
- Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
- Comparing the learning dynamics of in-context learning and fine-tuning in language models
- CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
- COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
- ComPhy: Composing Physical Models with end-to-end Alignment
- Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
- Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevance Assessment for IR Benchmarks
- Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
- Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
- CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
- CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
- Composable Sparse Subnetworks via Maximum-Entropy Principle
- Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
- Composer: A Search Framework for Hybrid Neural Architecture Design
- Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
- Composite Optimization with Error Feedback: the Dual Averaging Approach
- Compositional amortized inference for large-scale hierarchical Bayesian models
- Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
- Compositional Diffusion with Guided search for Long-Horizon Planning
- Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
- Compositional Generalization through Gradient Search in Nonparametric Latent Space
- Compositional Neuro-Symbolic Concepts in Neural Activities
- Compositional Visual Planning via Inference-Time Diffusion Scaling
- Composition-Grounded Instruction Synthesis for Visual Reasoning
- Composition of Memory Experts for Diffusion World Models
- Computational Bottlenecks for Denoising Diffusions
- Compute-Optimal Quantization-Aware Training
- Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
- Computing Equilibrium beyond Unilateral Deviation
- CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
- Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
- Concept-based Adversarial Attack: a Probabilistic Perspective
- Concept Insertion Success over Time in Diffusion Models through Prompt-Conditioned Interventions
- Concepts' Information Bottleneck Models
- Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
- CONCUR: A Framework for Continual Constrained and Unconstrained Routing
- Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
- Conditional Independent Component Analysis For Estimating Causal Structure with Latent Variables
- Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
- Conditioned Initialization for Attention
- Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
- Condition Matters in Full-head 3D GANs
- ConfHit: Conformal Generative Design via Nested Testing
- Confident and Adaptive Generative Speech Recognition via Conformal Risk Control
- Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
- Conformalized Decision Risk Assessment
- Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
- Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
- Conformal Prediction for Long-Tailed Classification
- Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
- Conformal Robustness Control: A New Strategy for Robust Decision
- Conjuring Semantic Similarity
- ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
- CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
- ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
- Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
- Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
- Consistency Geodesic Bridge: Image Restoration with Pretrained Diffusion Models
- Consistent Low-Rank Approximation
- Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
- Consistent Text-to-Image Generation via Scene De-Contextualization
- Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
- Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
- Constantly Improving Image Models Need Constantly Improving Benchmarks
- Constitutional Classifiers++: Production-Grade Defenses against Universal Jailbreaks
- Constrained Decoding of Diffusion LLMs with Context-Free Grammars
- Constrained Diffusion for Protein Design with Hard Structural Constraints
- Constraint-guided Hardware-aware NAS through Gradient Modification
- Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear Programming
- Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
- Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
- Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
- Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations
- Content-Aware Mamba for Learned Image Compression
- Context and Diversity Matter: The Emergence of In-Context Learning in World Models
- ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
- ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
- ContextIF: Enhancing Instruction-Following through Context Reward
- Context Learning for Multi-Agent Discussion
- ContextNav: Towards Agentic Multimodal In-Context Learning
- Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
- ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
- Context Tokens are Anchors: Understanding the Repetition Curse in Diffusion MLLMs from an Information Flow Perspective
- Contextual and Seasonal LSTMs for Time Series Anomaly Detection
- Contextual Causal Bayesian Optimisation
- Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
- Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
- Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
- Continuous Audio Language Models
- Continuous Chain of Thought: Parallel Exploration and Reasoning through a Theoretical Lens
- Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
- Continuous multinomial logistic regression for neural decoding
- Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
- Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
- Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
- Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
- Contrastive Diffusion Guidance for Spatial Inverse Problems
- Contrastive Predictive Coding Done Right for Mutual Information Estimation
- Controllable diffusion-based generation for multi-channel biological data
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
- Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
- Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
- Controllable Sequence Editing for Biological and Clinical Trajectories
- Controllable Video Generation with Provable Disentanglement
- Controlling Repetition in Protein Language Models
- Control Tax: The Price of Keeping AI in Check
- Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
- Convergence Analysis of Tsetlin Machines for Basic Boolean Operators under Noise-Free and Noisy Training Conditions
- Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
- Convergence of Actor-Critic gradient flow for entropy regularised MDPs in general spaces
- Convergence of Muon with Newton-Schulz
- Convergence of Regret Matching in Potential Games and Constrained Optimization
- Convergent Differential Privacy Analysis for General Federated Learning
- Convex Dominance in Deep Learning: A Scaling Law of Loss and Learning Rate
- Convex Efficient Coding
- ConvRec-R1: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
- ConvT3: Structured State Kernels for Convolutional State Space Models
- Co-occurring Associated REtained concepts in Diffusion Unlearning
- Cooperative Sheaf Neural Networks
- CooperTrim: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
- CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
- Copy-Paste to Mitigate Large Language Model Hallucinations
- CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
- CORDS - Continuous Representations of Discrete Structures
- CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
- Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
- Corner Gradient Descent
- Correlated Policy Optimization in Multi-Agent Subteams
- Correlations in the Data Lead to Semantically Rich Feature Geometry Under Superposition
- Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
- CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
- COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
- COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
- COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
- Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
- Cost-Aware Dynamic Tree Construction for Efficient Large Language Model Inference
- Cost-of-Pass: An Economic Framework for Evaluating Language Models
- CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
- CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
- CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
- CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
- Count Bridges enable Modeling and Deconvolving Transcriptomics
- Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
- Counterfactual Explanations on Robust Perceptual Geodesics
- Counterfactual LLM-based Framework for Measuring Rhetorical Style
- Counterfactual Reasoning for Retrieval-Augmented Generation
- Counterfactual Structural Causal Bandits
- Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
- Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
- CP-Agent: Context‑Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
- CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
- CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
- CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
- Credit-Budgeted ICPC-Style Coding: When LLM Agents Must Pay for Every Decision
- CREPE: Controlling diffusion with REPlica Exchange
- Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
- Critical attention scaling in long-context transformers
- Critical Confabulation: Can LLMs Hallucinate for Social Good?
- Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
- Critique-RL: Training Critiquing Language Models Through Two-Stage RL for Improved Discrimination and Constructive Feedback
- CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
- CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
- CRONOS: Continuous time reconstruction for 4D medical longitudinal series
- Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
- Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
- Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
- Cross-Embodied Co-Design for Dexterous Hands
- Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
- Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
- CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
- Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
- Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
- CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
- CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
- CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
- CSRv2: Unlocking Ultra-Sparse Embeddings
- CTBench: Cryptocurrency Time Series Generation Benchmark
- CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
- CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
- Ctrl-World: A Controllable Generative World Model for Robot Manipulation
- CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
- Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
- Culture in Action: Evaluating Text-to-Image Models through Social Activities
- Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Cultural Awareness
- CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
- Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
- CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
- Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
- Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
- Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
- Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
- Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
- Cutting the Skip: Training Residual-Free Transformers
- C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
- CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
- Cyber-Zero: Training Cybersecurity Agents without Runtime
- CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
- CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
- d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
- D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
- DA$^2$: Depth Anything in Any Direction
- DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
- DADA: Dual Averaging with Distance Adaptation
- DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
- DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
- DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
- Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
- D-AR: Diffusion via Autoregressive Models
- DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
- Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
- DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
- Data Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
- Data-Centric Lessons To Improve Speech-Language Pretraining
- Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
- DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
- Data Provenance for Image Auto-Regressive Generation
- Data Selection for LLM Alignment Using Fine-Grained Preferences
- Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
- Dataset Distillation as Pushforward Optimal Quantization
- Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
- Data-to-Energy Stochastic Dynamics
- DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
- DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
- DCFold: Efficient Protein Structure Generation with Single Forward Pass
- DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
- DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
- Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
- Debiased and Denoised Projection Learning for Incomplete Multi-view Clustering
- Debugging Concept Bottleneck Models through Removal and Retraining
- DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
- Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
- Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
- Decision Aggregation under Quantal Response
- Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
- Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
- Declarative Audio Editing with Audio Language Model
- DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
- Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSL
- Decoding Inner Speech with an End-to-End Brain-to-Text Neural Interface
- Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
- DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
- Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
- Decomposing Extrapolative Problem Solving: Spatial Transfer and Length Scaling with Map Worlds
- Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
- Decomposition of Concept-Level Rules in Visual Scenes
- Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
- Deconstructing Positional Information: From Attention Logits to Training Biases
- Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
- Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
- Decoupled Q-Chunking
- Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
- Decoupling Positional and Symbolic Attention in Transformers
- Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
- Decoupling the Class Label and the Target Concept in Machine Unlearning
- DeepAFL: Deep Analytic Federated Learning
- DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
- DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
- DeepEyesV2: Toward Agentic Multimodal Model
- Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
- DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
- Deep Generative Model in Machine Learning: Theory, Principle and Efficacy (2nd Workshop)
- Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
- Deep Hierarchical Learning with Nested Subspace Networks
- Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
- Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
- Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
- Deep Learning for Subspace Regression
- Deep Learning with Learnable Product-Structured Activations
- DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
- DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
- DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
- DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
- DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
- Deep SPI: Safe Policy Improvement via World Models
- Deep Think with Confidence
- DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
- DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
- Defending against Backdoor Attacks via Module Switching
- Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
- Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
- Delay Flow Matching
- DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
- DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
- DELTA-Code: How RL Unlocks and Transfers New Programming Algorithms in LLMs
- Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
- Delving into Spectral Clustering with Vision-Language Representations
- DeMo: Decoupled Momentum Optimization
- DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
- Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
- Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
- Demystifying Emergent Exploration in Goal-Conditioned RL
- Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
- Demystifying Supervision Data Generalization in Multimodal LMs
- DeNOTS: Stable Deep Neural ODEs for Time Series
- Dens3R: A Foundation Model for 3D Geometry Prediction
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
- Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
- Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
- DePO: Demonstration-guided Policy Optimization for Molecular Optimization
- Depth Anything 3: Recovering the Visual Space from Any Views
- Depth Anything with Any Prior
- DepthLM: Metric Depth from Vision Language Models
- DeRaDiff: Denoising Time Realignment of Diffusion Models
- Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
- DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
- Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
- Designing Rules to Pick a Rule: Aggregation by Consistency
- Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
- DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
- Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
- Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
- Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
- Detecting Data Contamination in LLMs via In-Context Learning
- Detecting Invariant Manifolds in ReLU-Based RNNs
- Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
- Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
- Detection of unknown unknowns in autonomous systems
- Detective SAM: Adaptive AI-Image Forgery Localization
- Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
- Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
- DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
- DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
- DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
- DGNet: Learning Spatiotemporal PDEs with Discrete Green Networks
- DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
- DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
- Diagnosing and Improving Diffusion Models by Estimating Optimal Loss Value
- Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
- Diagnosing Failures in Generalization from Task-Relevant Representational Geometry
- DiCache: Let Diffusion Model Determine Its Own Cache
- Dichotomous Diffusion Policy Optimization
- DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
- Difference-Aware Retrieval Policies for Imitation Learning
- Difference Predictive Coding for Training Spiking Neural Networks
- Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
- Differentiable Lifting for Topological Neural Networks
- Differentiable Model Predictive Control on the GPU
- Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
- Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
- Differentially Private Domain Discovery
- Differentially Private Equilibrium Finding in Polymatrix Games
- Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
- Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
- Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
- DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
- DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
- DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
- DIFFSPARSE: Accelerating Diffusion Transformers with Learned Token Sparsity
- DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
- DiffuDETR: Rethinking Detection Transformers with Diffusion Process
- DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
- Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
- Diffusion Alignment as Variational Expectation-Maximization
- Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
- Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
- DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
- Diffusion Bridge Variational Inference for Deep Gaussian Processes
- Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
- Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
- Diffusion Language Model Knows the Answer Before It Decodes
- Diffusion Language Models are Provably Optimal Parallel Samplers
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
- Diffusion Models as Dataset Distillation Priors
- Diffusion Negative Preference Optimization Made Simple
- DiffusionNFT: Online Diffusion Reinforcement with Forward Process
- Diffusion Transformers with Representation Autoencoders
- DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
- DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
- Dimension-Free Decision Calibration for Nonlinear Loss Functions
- Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
- DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
- Direct Doubly Robust Estimation of Conditional Quantile Contrasts
- Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
- Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
- Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
- Directional Textual Inversion for Personalized Text-to-Image Generation
- Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
- Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
- DirMoE: Dirichlet-Routed Mixture of Experts
- Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
- Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
- DISCO: Diversifying Sample Condensation for Accelerating Model Evaluation
- Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
- Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
- Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
- Discovering and Steering Interpretable Concepts in Large Generative Music Models
- Discovering Diverse Behaviors via Temporal Contrastive Learning
- Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
- Discovering Hierarchical Software Engineering Agents via Bandit Optimization
- Discovering Novel LLM Experts via Task-Capability Coevolution
- DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
- Discrete Adjoint Matching
- Discrete Bayesian Sample Inference for Graph Generation
- Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
- Discrete Diffusion for Bundle Construction
- Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
- Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
- Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
- Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
- Discrete Variational Autoencoding via Policy Search
- Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
- Disentangled Representation Learning for Parametric Partial Differential Equations
- Disentangled representation learning through unsupervised symmetry group discovery
- Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
- Disentanglement of Variations with Multimodal Generative Modeling
- Disentangling Knowledge Representations for Large Language Model Editing
- Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
- Disentangling the Factors of Convergence between Brains and Computer Vision Models
- Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
- DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
- DiSRouter: Distributed Self-Routing for LLM Selections
- Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
- Dissecting Representation Misalignment in Contrastive Learning via Influence Function
- DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
- DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
- Distillation of Large Language Models via Concrete Score Matching
- Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
- Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
- Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
- Distilling to Hybrid Attention Models via KL-Guided Layer Selection
- DistillKac: Few-Step Image Generation via Damped Wave Equations
- Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency
- DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
- Distractor-free Generalizable 3D Gaussian Splatting
- Distributed Algorithms for Euclidean Clustering
- Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
- Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
- Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
- Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
- Distributionally Robust Linear Regression with Block Lewis Weights
- Distributionally Robust Optimization via Generative Ambiguity Modeling
- Distributional value gradients for stochastic environments
- Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
- Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
- Distribution-informed Online Conformal Prediction
- Distributions as Actions: A Unified Framework for Diverse Action Spaces
- DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
- DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition
- DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
- Divergence-Free Neural Networks with Application to Image Denoising
- Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
- Diverse Dictionary Learning
- DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
- Diverse Text Decoding via Iterative Reweighting
- Diverse Text-to-Image Generation via Contrastive Noise Optimization
- Diversified Multinomial Logit Contextual Bandits
- Diversity-Aware Online Prompt Assignment to Generative Models
- Diversity-Enhanced Reasoning for Subjective Questions
- Diversity-Incentivized Exploration for Versatile Reasoning
- Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
- Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
- Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
- DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
- DMAP: A Distribution Map for Text
- DND: Boosting Large Language Models with Dynamic Nested Depth
- DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
- Do 3D Large Language Models Really Understand 3D Spatial Relationships?
- Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
- Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
- Does FLUX Already Know How to Perform Physically Plausible Image Composition?
- Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
- Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
- Does Weak-to-strong Generalization Happen under Spurious Correlations?
- DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
- Do Large Language Models Know What They Are Capable Of?
- Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
- Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
- Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
- Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
- Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
- Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Density Ratio Estimation
- Don't Just Fine-tune the Agent, Tune the Environment
- Don’t Pass@$k$: A Bayesian Framework for Large Language Model Evaluation
- Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
- Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
- Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
- Don't Throw Away Your Pretrained Model
- DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
- Doubly-Regressing Approach for Subgroup Fairness
- Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
- DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
- Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
- Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
- Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity
- Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
- Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
- DPad: Efficient Diffusion Language Models with Suffix Dropout
- dParallel: Learnable Parallel Decoding for dLLMs
- DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
- DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
- Draft-based Approximate Inference for LLMs
- DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
- Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
- DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
- Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
- DRBench: A Realistic Benchmark for Enterprise Deep Research
- DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
- DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
- DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
- DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non‑Discriminativity in Generalist Graph Anomaly Detection
- DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
- DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
- DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
- DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
- DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
- DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
- DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
- DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
- DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
- Dr.LLM: Dynamic Layer Routing in LLMs
- Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
- DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
- D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
- DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
- DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
- Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
- DrugTrail: Explainable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
- DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
- DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
- DSSA: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
- DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
- DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
- Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
- Dual Distillation for Few-Shot Anomaly Detection
- Dual Goal Representations
- Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
- Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
- Dual Language Models: Balancing sample-efficiency and overfitting resilience
- DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
- Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
- Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
- Dual-Path Condition Alignment for Diffusion Transformers
- Dual Perspectives on Non-Contrastive Self-Supervised Learning
- Dual Randomized Smoothing: Beyond Global Noise Variance
- Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
- Dual-Scale World Models for LLM Agents towards Hard-Exploration Problems
- Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
- DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
- DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
- DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks
- DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
- Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
- DVD-Quant: Data-free Video Diffusion Transformers Quantization
- DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
- DynaGuard: A Dynamic Guardian Model With User-Defined Policies
- Dynamical properties of dense associative memory
- Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
- Dynamic Classifier-Free Diffusion Guidance via Online Feedback
- Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
- Dynamic Early Exit in Reasoning Models
- DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
- Dynamic Kernel Graph Sparsifiers
- Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
- Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
- Dynamic Novel View Synthesis in High Dynamic Range
- Dynamic Reflections: Probing Video Representations with Text Alignment
- Dynamic Speculative Agent Planning
- Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
- Dynamic Texture Modeling of 3D Clothed Gaussian Avatars from a Single Video
- Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
- Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
- E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
- e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
- EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
- EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
- Early Signs of Steganographic Capabilities in Frontier LLMs
- Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
- EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
- Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
- EAST: Early Action Prediction Sampling Strategy with Token Masking
- EasyCreator: Empowering 4D Creation through Video Inpainting
- EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
- Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
- EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
- EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
- EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
- ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
- Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
- EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
- EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
- EditAnyShape: Shape-Aware Image Editing via Trajectory-Guided Region Control
- Edit-Based Flow Matching for Temporal Point Processes
- EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
- EditLens: Quantifying the Extent of AI Editing in Text
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
- EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
- EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
- EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
- Efficient Adversarial Attacks on High-dimensional Offline Bandits
- Efficient Agent Training for Computer Use
- Efficient algorithms for Incremental Metric Bipartite Matching
- Efficient and Sharp Off-Policy Learning under Unobserved Confounding
- Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
- Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
- Efficient Autoregressive Inference for Transformer Probabilistic Models
- Efficient Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
- Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
- Efficient Credal Prediction through Decalibration
- Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
- Efficient Differentiable Contact Model with Long-range Influence
- Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
- Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
- Efficient Estimation of Kernel Surrogate Models for Task Attribution
- Efficient Learning on Large Graphs using a Densifying Regularity Lemma
- Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
- Efficient Message-Passing Transformer for Error Correcting Codes
- Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
- Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
- Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
- Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
- Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
- Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
- Efficient Reasoning with Balanced Thinking
- Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
- Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
- Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
- Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
- Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
- Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
- Efficient Submodular Maximization for Sums of Concave over Modular Functions
- Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
- Efficient Test-Time Scaling for Small Vision-Language Models
- Efficient Turing Machine Simulation with Transformers
- Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
- EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
- Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
- EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
- EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
- EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
- Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
- EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
- EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
- EgoTwin: Dreaming Body and View in First Person
- EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
- Eigen-1: Scientific Reasoning through Adaptive Multi-Agent Refinement and Monitor-based RAG
- EigenBench: A Comparative Behavioral Measure of Value Alignment
- EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
- Einstein Fields: A Neural Perspective To Computational General Relativity
- Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
- ELEPHANT: Measuring and understanding social sycophancy in LLMs
- Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
- Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
- Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
- Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
- ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
- ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
- ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
- Embedding-Based Context-Aware Reranker
- Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
- Embodied Navigation Foundation Model
- Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
- Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
- EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
- Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
- Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
- Emergent Coordination in Multi-Agent Language Models
- Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
- Emergent Discrete Controller Modules for Symbolic Planning in Transformers
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
- Emergent Misalignment is Easy, Narrow Misalignment is Hard
- EMFuse: Energy-based Model Fusion for Decision Making
- EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
- EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
- Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
- EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
- Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
- Empowering LLM Tool Invocation with Tool-call Reward Model
- Empowering Multi-Robot Cooperation via Sequential World Models
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration
- Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
- Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
- Enabling True Global Perception in State Space Models for Visual Tasks
- Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
- Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
- End-to-end Listen, Look, Speak and Act
- End-to-End Probabilistic Framework for Learning with Hard Constraints
- Energy-Based Transformers are Scalable Learners and Thinkers
- Energy-Efficient Random Variate Generation via Compressed Lookup Tables
- Energy-Regularized Sequential Model Editing on Hyperspheres
- Enforcing Axioms for AI Alignment under Loss-Based Rules
- Enhanced Continual Learning of Vision-Language Models with Model Fusion
- Enhanced Generative Model Evaluation with Clipped Density and Coverage
- Enhancing Agentic Search via Data Synthesis on Hierarchical Constraint Satisfaction
- Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
- Enhancing Complex Symbolic Logical Reasoning of Large Language Models via Sparse Multi-Agent Debate
- Enhancing Diffusion-Based Sampling with Molecular Collective Variables
- Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
- Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
- Enhancing Hallucination Detection through Noise Injection
- Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
- Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
- Enhancing Language Model Reasoning with Structured Multi-Level Modeling
- Enhancing Learning with Noisy Labels via Rockafellian Relaxation
- Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
- Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
- Enhancing Multi-Image Understanding through Delimiter Token Scaling
- Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
- Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
- Enhancing Shortcut Models with Cumulative Self-Consistency Loss for One-Step Diffusion
- Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
- Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
- Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
- Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
- Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
- Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
- Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
- Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
- Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
- Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
- Entropy-Based Block Pruning for Efficient Large Language Models
- Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
- EntropyLong: Effective Long-Context Training via Predictive Uncertainty
- Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
- Entropy-preserving reinforcement learning
- Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
- EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
- Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
- EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
- Equilibrium Language Models
- Equivariant Splitting: Self-supervised learning from incomplete data
- Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
- Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
- Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
- Error Feedback for Muon and Friends
- Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
- ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
- Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
- Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
- Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
- Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
- ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
- Estimating Dimensionality of Neural Representations from Finite Samples
- Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
- Estimating Worst-Case Frontier Risks of Open-Weight LLMs
- ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
- EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
- EvA: Evolutionary Attacks on Graphs
- Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
- Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
- Evaluating Data Influence in Meta Learning
- Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
- Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
- Evaluating Language Models' Evaluations of Games
- EVALUATING MEMORY IN LLM AGENTS VIA INCREMENTAL MULTI-TURN INTERACTIONS
- Evaluating SAE interpretability without generating explanations
- Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
- EventFlash: Towards Efficient MLLMs for Event-Based Vision
- Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
- EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
- Every Language Model Has a Forgery-Resistant Signature
- Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
- Evidence for Limited Metacognition in LLMs
- EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
- Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
- EvolProver: Advancing Automated theorem proving by Evolving Formalized Problems via Symmetry and Difficulty
- Evolution and compression in LLMs: on the emergence of human-aligned categorization
- Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
- Evolution of Concepts in Language Model Pre-Training
- Evolving Graph Structured Programs for Circuit Generation with Large Language Models
- EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
- Exchangeability of GNN Representations with Applications to Graph Retrieval
- ExGRPO: Learning to Reason from Prior Successes
- Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
- ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
- Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
- Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
- EXP-Bench: Can AI Conduct AI Research Experiments?
- Experience-based Knowledge Correction for Robust Planning in Minecraft
- Expert Divergence Learning for MoE-based Language Models
- Expert Heads: Robust Evidence Identification for Large Language Models
- Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
- ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
- Expert Merging in Sparse Mixture of Experts with Nash Bargaining
- Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
- ExpGuard: LLM Content Moderation in Specialized Domains
- Explainable $ K $-means Neural Networks for Multi-view Clustering
- Explainable LLM Unlearning through Reasoning
- Explainable Mixture Models through Differentiable Rule Learning
- Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
- Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
- Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
- Exploiting Low-Dimensional Manifold of Features for Few-shot Whole Slide Image Classification
- Exploration vs. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
- Exploratory Causal Inference in SAEnce
- Exploratory Diffusion Model for Unsupervised Reinforcement Learning
- Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
- Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
- Exploring Cross-Modal Flows for Few-Shot Learning
- Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
- Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
- Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
- Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
- Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
- Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
- Exploring State-Space Models for Data-Specific Neural Representations
- Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
- Exploring the Basin-Like Loss Landscape in Large Language Models
- Exploring the Design Space of Transition Matching
- Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
- Exploring the Potential of Encoder-free Architectures in 3D LMMs
- ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
- Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
- Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
- Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
- Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
- EXPO: Stable Reinforcement Learning with Expressive Policies
- Expressive and Invariant Graph Learning via Canonical Tree Cover Neural Networks
- Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
- Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
- ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
- Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
- Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
- Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
- FACET: A Fragment-Aware Conformer Ensemble Transformer
- FACM: Flow-Anchored Consistency Models
- FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
- FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
- Factuality Matters: When Image Generation and Editing Meet Structured Visuals
- Fair Classification by Direct Intervention on Operating Characteristics
- Fair Conformal Classification via Learning Representation-Based Groups
- Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
- Fair Graph Machine Learning under Adversarial Missingness Processes
- Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
- Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
- Fairness via Independence: A General Regularization Framework for Machine Learning
- Fair Policy Aggregation from Standard Policy Optimization
- FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
- Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
- Faithfulness Under the Distribution: A New Look at Attribution Evaluation
- FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning
- Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
- FALCON: Few-step Accurate Likelihoods for Continuous Flows
- FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
- FAME: $\underline{F}$ormal $\underline{A}$bstract $\underline{M}$inimal $\underline{E}$xplanation for neural networks
- Fantastic Pretraining Optimizers and Where to Find Them
- Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
- FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
- FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
- FARTrack: Fast Autoregressive Visual Tracking with High Performance
- FASA: FREQUENCY-AWARE SPARSE ATTENTION
- FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
- Fast and Interpretable Protein Substructure Alignment via Optimal Transport
- Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
- FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
- Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
- Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
- Fast Data Mixture Optimization via Gradient Descent
- FAST‑DIPS: Adjoint‑Free Analytic Steps and Hard‑Constrained Likelihood Correction for Diffusion‑Prior Inverse Problems
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
- Fast-dLLM v2: Efficient Block-Diffusion LLM
- Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
- Faster Parameter-Free Regret Matching Algorithms
- FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
- Faster Vision Transformers with Adaptive Patches
- Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
- Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
- FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
- Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
- FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
- FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
- Fast Language Generation through Discrete Diffusion Divergence Instruct
- Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
- Fast training of accurate physics-informed neural networks without gradient descent
- FastVGGT: Fast Visual Geometry Transformer
- FastVMT: Eliminating Redundancy in Video Motion Transfer
- FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
- Feature compression is the root cause of adversarial fragility in neural networks
- Feature segregation by signed weights in artificial vision systems and biological models
- Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
- FeDaL: Federated Dataset Learning for General Time Series Foundation Models
- FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
- Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
- Federated ADMM from Bayesian Duality
- Federated Graph-Level Clustering Network with Dual Knowledge Separation
- Federated Learning of Quantile Inference under Local Differential Privacy
- Federated Learning with Profile Mapping under Distribution Shifts and Drifts
- FedMC: Federated Manifold Calibration
- FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
- FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
- Feedback-driven recurrent quantum neural network universality
- Feed-forward Human Performance Capture via Progressive Canonical Space Updates
- FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
- FETAL-GAUGE: A BENCHMARK FOR ASSESSING VISION-LANGUAGE MODELS IN FETAL ULTRASOUND
- Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
- Fewer Weights, More Problems: A Practical Attack on LLM Pruning
- FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
- FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
- FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
- Figma2Code: Automating Multimodal Design to Code in the Wild
- FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
- Financial fraud collusion among generative AI agents in social networks
- f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
- Fine-Grained Activation Steering: Steering Less, Achieving More
- Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
- Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
- Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
- FineNib: A Query Synthesizer For Static Analysis of Security Vulnerabilities
- Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
- Fine-tuning Behavioral Cloning Policies with Preference‑Based Reinforcement Learning
- Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
- Fine-tuning Done Right in Model Editing
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
- Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
- FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
- Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
- Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
- FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
- FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
- First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
- Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
- Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
- FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
- FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
- Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
- FlashRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
- Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
- FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
- FlashWorld: High-quality 3D Scene Generation within Seconds
- Flatness Guided Test-Time Adaptation for Vision-Language Models
- Flatter Tokens are More Valuable for Speculative Draft Model Training
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
- FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
- FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
- FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
- FlexProtein: Joint Sequence and Structure Pretraining for Protein Modeling
- Flipping the Dialogue: Training and Evaluating User Language Models
- FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
- Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
- floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
- FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
- Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for One-/Two-step High-Fidelity Audio Generation
- Flow Actor-Critic for Offline Reinforcement Learning
- FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
- FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
- Flow Along the $K$-Amplitude for Generative Modeling
- Flow Autoencoders are Effective Protein Tokenizers
- Flow-based Conformal Prediction for Multi-dimensional Time Series
- Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
- FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
- Flow Caching for Autoregressive Video Generation
- FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
- FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
- Flow-Disentangled Feature Importance
- FLOWER: A Flow-Matching Solver for Inverse Problems
- Flow Expansion via Verifier-Constrained Noised State Space Exploration
- FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
- Flowing Through States: Neural ODE Regularization for Reinforcement Learning
- Flow Map Learning via Games
- Flow Matching Policy Gradients
- Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
- Flow Matching with Semidiscrete Couplings
- FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
- Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
- FlowRL: Matching Reward Distributions for LLM Reasoning
- FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
- Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
- FlowSymm: Physics-Aware, Symmetry-Preserving Graph Attention for Network Flow Completion
- Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
- FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
- Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
- FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
- FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
- FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
- FOCUS: Efficient Keyframe Selection for Long Video Understanding
- Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
- FoNE: Precise Single-Token Number Embeddings via Fourier Features
- Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
- Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
- Forest-Based Graph Learning for Semi-Supervised Node Classification
- ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
- Forge: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
- Forget Forgetting: Continual Learning in a World of Abundant Memory
- Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
- Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
- FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
- Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
- Fostering Video Reasoning via Next-Event Prediction
- Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
- Foundation Models for Causal Inference via Prior-Data Fitted Networks
- Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
- FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
- Fractional-Order Spiking Neural Network
- Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
- FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
- Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Model
- FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
- Frayed RoPE and Long Inputs: A Geometric Perspective
- FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
- FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
- Free Energy Mixer
- FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
- Free Point-wise Anomaly Detection via Fold-bifurcation
- FreeViS: Training-free Video Stylization with Inconsistent References
- FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
- Frequency-aware Dynamic Gaussian Splatting
- Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
- Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
- Fresh in memory: Training-order recency is linearly encoded in language model activations
- FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
- From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
- From Assistant to Independent Developer — Are GPTs Ready for Software Development?
- From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
- From atom to space: A region-based readout function for spatial properties of materials
- From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
- From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining
- From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
- From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
- From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
- From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
- From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
- From Embedding to Control: Representations for Stochastic Multi-Object Systems
- From Evaluation to Defense: Advancing Safety in Video Large Language Models
- From Fields to Random Trees
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
- From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
- From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
- From Human Cognition to AI Reasoning: Models, Methods, and Applications
- From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
- From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
- From Markov to Laplace: How Mamba In-Context Learns Markov Chains
- From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
- From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
- From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
- From Natural Alignment to Conditional Controllability in Multimodal Dialogue
- From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
- From Observations to Events: Event-Aware World Models for Reinforcement Learning
- From Parameters to Behaviors: Unsupervised Compression of the Policy Space
- From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
- From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
- From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
- From Predictors to Samplers via the Training Trajectory
- From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
- From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
- From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
- From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
- From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
- From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
- From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
- From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
- From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
- From "Sure" to "Sorry": Detecting Jailbreak in Large Vision Language Model via JailNeurons
- From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
- From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
- From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
- From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
- From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
- FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
- Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
- FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
- Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
- Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
- FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
- Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
- FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
- FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
- FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
- FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
- FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
- FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
- Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
- FullPart: Generating each 3D Part at Full Resolution
- Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
- Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
- Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
- FutureFill: Fast Generation from Convolutional Sequence Models
- FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
- FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models towards Adam‑Scale Speed
- G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
- Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
- GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
- GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
- GAP: Gradient Adjustment with Phase-guidance for Robust Vision-Proprioception Policies in Robotic Manipulation
- GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
- GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
- GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
- GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
- GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
- GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
- Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
- Gauge-invariant representation holonomy
- Gaussian certified unlearning in high dimensions: A hypothesis testing approach
- GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
- GAVEL: Towards Rule-Based Safety through Activation Monitoring
- GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
- GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
- Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
- GEM: A Gym for Generalist LLMs
- gen2seg: Generative Models Enable Generalizable Instance Segmentation
- GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
- GenCompositor: Generative Video Compositing with Diffusion Transformer
- GenCP: Towards Generative Modeling Paradigm of Coupled physics with Application to Fluid-Structure Interaction
- Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
- GenDR: Lighten Generative Detail Restoration
- General Exploratory Bonus for Optimistic Exploration in RLHF
- Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
- Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
- Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
- Generalizable Heuristic Generation Through LLMs with Meta-Optimization
- Generalization Below the Edge of Stability: The Role of Data Geometry
- Generalization of Diffusion Models Arises with a Balanced Representation Space
- Generalization of RLVR Using Causal Reasoning as a Testbed
- Generalized Parallel Scaling with Interdependent Generations
- Generalized Spherical Neural Operators: Green’s Function Formulation
- Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
- General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
- Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
- Generating Directed Graphs with Dual Attention and Asymmetric Encoding
- Generating metamers of human scene understanding
- Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
- Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
- Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
- Generative AI in Genomics (Gen^2): Barriers and Frontiers
- Generative Bayesian Optimization: Generative Models as Acquisition Functions
- Generative Blocks World: Moving Things Around in Pictures
- Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
- Generative Human Geometry Distribution
- Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
- Generative Universal Verifier as Multimodal Meta-Reasoner
- Generative Value Conflicts Reveal LLM Priorities
- Generative View Stitching
- Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
- Genomic Foundationless Models: Pretraining Does Not Promise Performance
- GenSR: Symbolic regression based on equation generative space
- GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
- GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
- GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
- GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
- Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
- Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
- Geometric Graph Neural Diffusion for Stable Molecular Dynamics
- Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
- Geometric-Mean Policy Optimization
- Geometry-aware 4D Video Generation for Robot Manipulation
- Geometry-aware Policy Imitation
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
- Geometry-grounded Representation Learning and Generative Modeling
- GEOMETRY OF UNCERTAINTY: LEARNING METRIC SPACES FOR MULTIMODAL STATE ESTIMATION IN RL
- GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
- GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
- GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
- GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
- Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
- Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
- GGBall: Graph Generative Model on Poincaré Ball
- GhostEI-Bench: Are Mobile Agents Resilient to Environmental Injection in Dynamic On-Device Environments?
- GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
- Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
- GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
- GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
- Gistify: Codebase-Level Understanding via Runtime Execution
- GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
- Glance and Focus Reinforcement for Pan-cancer Screening
- Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
- GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
- Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
- Globally aware optimization with resurgence
- Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
- Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
- GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
- GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
- gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
- G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
- GmNet: Revisiting Gating Mechanisms From A Frequency View
- GneissWeb: Preparing High Quality Data for LLMs at Scale
- GNN-as-Judge: Unleashing the Power of LLMs for Graph Few-shot Semi-supervised Learning with GNN Feedback
- GNN Explanations that do not Explain and How to find Them
- Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
- GoalRank: Group-Relative Optimization for a Large Ranking Model
- Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
- Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
- Go-Browse: Training Web Agents with Structured Exploration
- Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
- Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
- GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
- GOLDILOCS: GENERAL OBJECT-LEVEL DETECTION AND LABELING OF CHANGES IN SCENES
- Good allocations from bad estimates
- GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
- GoR: A Unified and Extensible Generative Framework for Ordinal Regression
- GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
- GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
- GPS: Directed Acyclic Graph guided Proactive Information Seeking in Large Language Models
- GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
- GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
- GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
- GRACE: Generative Representation Learning via Contrastive Policy Optimization
- GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
- Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
- Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
- Gradient-Based Program Synthesis with Neurally Interpreted Languages
- Gradient Descent Dynamics of Rank-One Matrix Denoising
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
- Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
- Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
- Gradient-Normalized Smoothness for Optimization with Approximate Hessians
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
- GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
- GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
- GradShield: Alignment Preserving Finetuning
- GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
- GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
- Graph-based Nearest Neighbors with Dynamic Updates via Random Walk-Based Analysis
- Graph Diffusion Transformers are In-Context Molecular Designers
- Graph homophily booster: Rethinking the role of discrete features on heterophilic graphs
- Graph Mixing Additive Networks
- Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
- GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
- Graphon Cross-Validation: Assessing Models on Network Data
- GraphPlanner: Graph-Based Agentic Routing for LLMs
- Graph Random Features for Scalable Gaussian Processes
- Graph Representational Learning: When Does More Expressivity Hurt Generalization?
- GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
- Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
- Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
- Graph Tokenization for Bridging Graphs and Transformers
- GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization
- Grasp Any Region: Prompting MLLM to Understand the Dense World
- G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
- Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
- GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
- Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
- GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
- Grounded Test-Time Adaptation for LLM Agents
- Grounding and Enhancing Informativeness and Utility in Dataset Distillation
- Grounding Computer Use Agents on Human Demonstrations
- Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
- Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
- Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
- Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
- Group Critical-token Policy Optimization for Autoregressive Image Generation
- Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
- Group-Normalized Implicit Value Optimization for Language Models
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
- Group Representational Position Embedding
- Group Verification-based Policy Optimization for Interactive Coding Agents
- GTA1: GUI Test-time Scaling Agent
- GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
- GTool: Graph Enhanced Tool Planning with Large Language Model
- GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
- GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
- Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
- GuardAlign: Robust Safety Alignment in Multimodal Large Language Models
- Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
- Guidance Watermarking for Diffusion Models
- GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
- Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
- Guided Policy Optimization under Partial Observability
- Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
- GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
- Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
- GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
- Guiding Mixture-of-Experts with Temporal Multimodal Interactions
- GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
- GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
- Gumbel Distillation for Parallel Text Generation
- H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
- H$^3$GNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
- H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
- HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
- Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
- Hallucination-aware Intermediate Representation Editing in Large Vision-Language Models
- Hallucination Begins Where Saliency Drops
- Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
- HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
- HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
- HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
- HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
- Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
- HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
- Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
- Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
- Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
- HARP: Hallucination Detection via Reasoning Subspace Projection
- Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
- HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
- HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
- HDR-4DGS: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
- HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
- Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
- HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
- Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
- HEEGNet: Hyperbolic Embeddings for EEG
- HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
- Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
- Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration
- Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
- Heterogeneous Agent Q-weighted Policy Optimization
- Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
- Heterogeneous Front-Door Effects: Debiased Estimation with Quasi-Oracle Guarantees
- HeurekaBench: A Benchmarking Framework for AI Co-scientist
- HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
- Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
- HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
- HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
- HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
- Hidden Breakthroughs in Language Model Training
- HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
- Hidden Patterns in Chain-of-Thought Reasoning
- HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K
- Hierarchical Concept-based Interpretable Models
- Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
- Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
- Hierarchical Multi-Scale Molecular Conformer Generation with Structural Awareness
- Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
- Hierarchical Prototype Learning for Semantic Segmentation
- Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
- Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
- Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
- Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
- HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
- HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
- High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
- High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
- High-dimensional Analysis of Synthetic Data Selection
- High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
- High-dimensional Mean-Field Games by Particle-based Flow Matching
- Highly Efficient and Effective LLMs with Multi-Boolean Architectures
- High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
- High-Probability Bounds for the Last Iterate of Clipped SGD
- HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
- Hilbert-Guided Sparse Local Attention
- Hilbert: Recursively Building Formal Proofs with Informal Reasoning
- HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
- Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
- HiPO: Self-Hint Policy Optimization for RLVR
- Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
- HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
- HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
- Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
- HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
- HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
- HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
- HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
- h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
- HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
- Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
- Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
- HoloPart: Generative 3D Part Amodal Segmentation
- Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
- Horizon Imagination: Efficient On-Policy Training in Diffusion World Models
- Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
- HOTA: Hamiltonian framework for Optimal Transport Advection
- Hot Fuzz: Temperature-Tunable Composition of Diffusion models with Fuzzy Logic
- Hot PATE: Private Aggregation of Distributions for Diverse Tasks
- Hourglass Persistence for Graphs, Simplices, and Cells
- Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
- How Base Frequency Shapes RoPE: An Analytical Study of Frequency-Band Formation
- How Catastrophic is Your LLM? Certifying Risk in Conversation
- How Dark Patterns Manipulate Web Agents
- How does the optimizer implicitly bias the model merging loss landscape?
- How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
- How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
- How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
- How Far Can Unsupervised RLVR Scale LLM Training?
- How hard is learning to cut? Trade-offs and sample complexity
- How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
- How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
- How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
- How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
- How reinforcement learning after next-token prediction facilitates learning
- How Reliable is Language Model Micro-Benchmarking?
- How Stable is the Next Token? A Geometric View of LLM Prediction Stability
- How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
- How to Lose Inherent Counterfactuality in Reinforcement Learning
- How to Square Tensor Networks and Circuits Without Squaring Them
- How to train data-efficient LLMs
- How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
- How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
- HSG-12M: A Large-Scale Dataset of Spatial Multigraphs from the Energy Spectra of non-Hermitian Crystals
- HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
- HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
- Hubble: a Model Suite to Advance the Study of LLM Memorization
- Human3R: Everyone Everywhere All at Once
- Human-AI Curation Synergy: Scaling Preference Data Curation via Human-Guided AI Feedback
- Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
- Human-LLM Collaborative Feature Engineering for Tabular Data
- Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
- Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
- Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
- HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
- Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
- HUMOF: Human Motion Forecasting in Interactive Social Scenes
- Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
- HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
- Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
- Hybrid Reinforcement: when reward is sparse, better to be dense
- Hybrid Training for Vision-Language-Action Models
- Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
- HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
- Hyperbolic Aware Minimization: Implicit Bias for Sparsity
- Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
- Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
- Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
- I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
- IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
- IAGA: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
- I Can't Believe It's Not Better: Where Large Language Models need to improve
- ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
- IC-Custom: Diverse Image Customization via In-Context Learning
- ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
- IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
- Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
- ICLR 2026 Workshop on AI with Recursive Self-Improvement
- ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)
- ICLR 2026 Workshop on Multimodal Intelligence: Next Token Prediction and Beyond
- ICPO: Provable and Practical In-Context Policy Optimization for Test-Time Scaling
- ICYM2I: The illusion of multimodal informativeness under missingness
- IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
- Identifiability and recoverability in self-supervised models
- Identifiability Challenges in Sparse Linear Ordinary Differential Equations
- Identifying and Evaluating Inactive Heads in Pretrained LLMs
- Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
- Identity-Free Deferral For Unseen Experts
- IDER: Idempotent Experience Replay for Reliable Continual Learning
- I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
- iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
- IF-VidCap: Can Video Caption Models Follow Instructions?
- IGC-Net for conditional average potential outcome estimation over time
- IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
- IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
- Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
- ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
- Image Inpainting with Preference Alignment
- Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
- ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
- Image Quality Assessment for Embodied AI
- ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
- Imagine How To Change: Explicit Procedure Modeling for Change Captioning
- Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
- Imitation Learning as Return Distribution Matching
- Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
- Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
- Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
- Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
- Implicit Inversion turns CLIP into a Decoder
- Implicit Models: Expressive Power Scales with Test-Time Compute
- Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
- Implicit Sensing for Fourier Sparse Boolean Functions
- Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
- ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
- Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
- Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
- Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
- Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
- Improved Quality, Synchrony, and Preference Alignment for Joint Audio-Video Generation
- Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity
- Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
- Improving Attributed Long-form Question Answering with Intent Awareness
- Improving Autoregressive Video Modeling with History Understanding
- Improving Black-Box Generative Attacks via Generator Semantic Consistency
- Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
- Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
- Improving Code Localization with Repository Memory
- Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
- Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
- Improving Extreme Wind Prediction with Frequency-Informed Learning
- Improving Feasibility via Fast Autoencoder-Based Projections
- Improving Human-AI Coordination through Online Adversarial Training and Generative Models
- Improving LLM Alignment with References
- Improving LLM-based Global Optimization with Search Space Partitioning
- Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
- Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
- Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
- Improving Semantic Proximity in English-Centric Information Retrieval through Cross-Lingual Alignment
- Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
- Improving Text-guided CAD Prototyping via Modality-Specific Tokenization
- IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
- In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
- Incentive-Aligned LLM Summaries
- Incentives in Federated Learning with Heterogeneous Agents
- Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
- Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
- InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
- Incomplete Data, Complete Dynamics: A Diffusion Approach
- Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
- Inconsistency Biases in Dynamic Data Pruning
- In-Context Algebra
- In-Context Algorithm Emulation in Fixed-Weight Transformers
- In-Context Compositional Q-Learning for Offline Reinforcement Learning
- In-Context Learning for Pure Exploration
- In-Context Learning of Temporal Point Processes with Foundation Inference Models
- In Context Semi-Supervised Learning
- In-Context Watermarks for Large Language Models
- Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
- IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
- Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
- IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
- Inducing Dyslexia in Vision Language Models
- Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
- InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
- Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
- Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
- Inference-time scaling of diffusion models through classical search
- Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
- Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
- Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
- InfGen: Scenario Generation as Next Token Group Prediction
- Infinite Horizon Markov Economies
- Influence Dynamics and Stagewise Data Attribution
- Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
- Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
- InfoBridge: Mutual Information estimation via Bridge Matching
- InfoDet: A Dataset for Infographic Element Detection
- InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
- InfoNCE Induces Gaussian Distribution
- Information-based Value Iteration Networks for Decision Making Under Uncertainty
- Information Estimation with Discrete Diffusion
- Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
- Information Shapes Koopman Representation
- InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
- InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
- In Good GRACES: Principled Teacher Selection for Knowledge Distillation
- Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
- Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
- Inlier-Centric Post-Training Quantization for Object Detection Models
- InnoGym: Benchmarking the Innovation Potential of AI Agents
- InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
- Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
- INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
- Inpainting-Guided Policy Optimization for Diffusion Large Language Models
- In-Place Test-Time Training
- InputDSA: Demixing, then comparing recurrent and externally driven dynamics
- InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
- Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
- Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
- INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
- Instilling an Active Mind in Avatars via Cognitive Simulation
- Integrating Generative and Experimental Platforms for Biomolecular Design
- Intention-Conditioned Flow Occupancy Models
- InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
- Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
- Interaction Field Matching: Overcoming Limitations of Electrostatic Models
- Interactive Agents to Overcome Underspecificity in Software Engineering
- Interactive Learning of Single-Index Models via Stochastic Gradient Descent
- Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
- Inter-Agent Relative Representations for Multi-Agent Option Discovery
- Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
- Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
- Interleaving Reasoning for Better Text-to-Image Generation
- Internal Evaluation of Density-Based Clusterings with Noise
- InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
- InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
- Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
- Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
- Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
- In-The-Flow Agentic System Optimization for Effective Planning and Tool Use
- INTIMA: A Benchmark for Human-AI Companionship Behavior
- Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
- Intrinsic Entropy of Context Length Scaling in LLMs
- Intrinsic Explanation of Random Subspace Method for Enhanced Security Applications
- Intrinsic Lorentz Neural Network
- Intrinsic training dynamics of deep neural networks
- Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
- Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
- Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
- Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
- Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
- Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
- IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
- Is Finer Better? The Limits of Microscaling Formats in Large Language Models
- Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
- Is In-Context Learning Learning?
- Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
- Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
- Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
- Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
- Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
- Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
- Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
- Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
- IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
- It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
- It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
- IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
- IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
- IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
- IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
- Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
- Jacobian Aligned Random Forests
- Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
- Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
- JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
- Jailbreak Transferability Emerges from Shared Representations
- JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
- JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
- JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
- JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
- JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
- Jet Expansions: Restructuring LLM Computation for Model Inspection
- Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
- Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
- Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
- JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
- JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
- Joint Discriminative-Generative Modeling via Dual Adversarial Training
- Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
- Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
- Joint Optimization for 4D Human-Scene Reconstruction in the Wild
- Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
- Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
- Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
- jqBench: a benchmark for reading and editing JSON from natural language and/or examples
- Judo: A Juxtaposed Domain-oriented Multimodal Reasoner for Industrial Anomaly QA
- JULI: Jailbreak Large Language Models by Self-Introspection
- Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
- K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
- Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
- KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
- KANO: Kolmogorov-Arnold Neural Operator
- KaVa: Latent Reasoning via Compressed KV-Cache Distillation
- KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
- KDP: Simplifying Representation Dynamics in Kernel Space
- KeepLoRA: Continual Learning with Residual Gradient Adaptation
- Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
- KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
- Kevin: Multi-Turn RL for Generating CUDA Kernels
- KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
- Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
- KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
- KLAS: Using Similarity to Stitch Neural Networks for an Improved Accuracy-Efficiency Tradeoff
- KL-Regularized Reinforcement Learning is Designed to Mode Collapse
- KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
- Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
- Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
- Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern
- Knowledge Distillation for Large Language Models through Residual Learning
- Knowledge Editing with Subspace-Aware Key-Value Mappings
- Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
- Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
- Knowledge Fusion of Large Language Models via Modular SkillPacks
- Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
- KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
- KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
- Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
- Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
- K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
- KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
- K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
- KV-Cache Transform Coding for Compact Storage in LLM Inference
- KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
- Label-Free Mitigation of Spurious Correlations in VLMs Using Sparse Autoencoders
- Label Smoothing Improves Machine Unlearning
- LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
- LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
- LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
- Landing with the Score: Riemannian Optimization through Denoising
- Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
- LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
- Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
- Language and Experience: A Computational Model of Social Learning in Complex Tasks
- Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
- Language-guided Open-world Video Anomaly Detection under Weak Supervision
- Language Identification in the Limit with Computational Trace
- Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
- Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
- Language Model Planning from an Information Theoretic Perspective
- Language Models are Injective and Hence Invertible
- Language Models Use Lookbacks to Track Beliefs
- LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
- Laplacian Kernelized Bandit
- Laplacian Multi-scale Flow Matching for Generative Modeling
- La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
- Large Depth Completion Model from Sparse Observations
- Large Language Model Compression with Global Rank and Sparsity Optimization
- Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
- Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
- LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
- Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
- Latent Concept Disentanglement in Transformer-based Language Models
- Latent Denoising Makes Good Visual Tokenizers
- Latent Diffusion Model without Variational Autoencoder
- Latent Fourier Transform
- Latent Geometry-Driven Network Automata for Complex Network Dismantling
- Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
- Latent & Implicit Thinking – Going Beyond CoT Reasoning
- Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
- Latent Planning Emerges with Scale
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language
- Latent Speech-Text Transformer
- Latent Stochastic Interpolants
- Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts
- Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
- Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
- Latent Visual Reasoning
- Latent Wasserstein Adversarial Imitation Learning
- Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
- Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
- LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
- LaVCa: LLM-assisted Visual Cortex Captioning
- Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
- LayerSync: Self-aligning Intermediate Layers
- Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
- LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
- LCA: Local Classifier Alignment for Continual Learning
- LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
- LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
- LDT: Layer-Decomposition Training Makes Networks More Generalizable
- Lean Finder: Semantic Search for Mathlib That Understands User Intents
- LeanForPhysics: Comprehensive Reasoning Framework for University-level Physics in Lean4
- LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
- Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
- Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
- Learnable Sparsity for Vision Generative Models
- LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
- Learned Meta-Tokens for Language Modeling
- Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
- Learning a distance measure from the information-estimation geometry of data
- Learning Admissible Heuristics for A*: Theory and Practice
- Learning a Game by Paying the Agents
- Learning AND–OR Templates for Compositional Representation in Art and Design
- Learning an Image Editing Model without Image Editing Pairs
- Learning-Augmented Moment Estimation on Time-Decay Models
- Learning Boltzmann Generators via Constrained Mass Transport
- Learning Brain Representation with Hierarchical Visual Embeddings
- Learning Collective Variables from BioEmu with Time-Lagged Generation
- Learning Concept Bottleneck Models from Mechanistic Explanations
- Learning Correlated Reward Models: Statistical Barriers and Opportunities
- Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
- Learning Distributions over Permutations and Rankings with Factorized Representations
- Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
- Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
- Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
- Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
- Learning Efficient and Interpretable Multi-Agent Communication
- Learning Escorted Protocols For Multistate Free-Energy Estimation
- Learning Explicit Single-Cell Dynamics Using ODE Representations
- Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
- Learning Facts at Scale with Active Reading
- Learning Flexible Forward Trajectories for Masked Molecular Diffusion
- Learning for Highly Faithful Explainability
- Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
- Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
- Learning from Historical Activations in Graph Neural Networks
- Learning from Label Proportions via Proportional Value Classification
- Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
- Learning from Synthetic Data Improves Multi-hop Reasoning
- Learning from the Electronic Structure of Molecules across the Periodic Table
- Learning From the Past with Cascading Eligibility Traces
- Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
- Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
- Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
- Learning Human Habits with Rule-Guided Active Inference
- Learning in Prophet Inequalities with Noisy Observations
- Learning is Forgetting; LLM Training As Lossy Compression
- Learning Ising Models under Hard Constraints using One Sample
- Learning Koopman Representations with Controllability Guarantees
- Learning linear state-space models with sparse system matrices
- Learning Massively Multitask World Models for Continuous Control
- Learning Meaningful Representations of Life (LMRL) Workshop @ ICLR 2026
- Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor–EM Method
- Learning Molecular Chirality via Chiral Determinant Kernels
- Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
- Learning multimodal dictionary decompositions with group-sparse autoencoders
- Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
- Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
- Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
- Learning Ordinal Probabilistic Reward from Preferences
- Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
- Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
- Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
- Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
- Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
- Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
- Learning residue level protein dynamics with multiscale Gaussians
- Learning Retrieval Models with Sparse Autoencoders
- Learning Robust Intervention Representations with Delta Embeddings
- Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
- Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
- Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
- Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
- Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
- Learning-Time Encoding Shapes Unlearning in LLMs
- Learning to Adapt: In-Context Learning Beyond Stationarity
- Learning to Answer from Correct Demonstrations
- Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
- Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
- Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
- Learning to Generate Unit Test via Adversarial Reinforcement Learning
- Learning to Grasp Anything By Playing with Random Toys
- Learning to Interpret Weight Differences in Language Models
- Learning to Lie: Reinforcement Learning Attacks Damage Human-AI Teams and Teams of LLMs
- Learning to Orchestrate Agents in Natural Language with the Conductor
- Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
- Learning to Play Multi-Follower Bayesian Stackelberg Games
- Learning to Reason as Action Abstractions with Scalable Mid-Training RL
- Learning to Reason Efficiently with Discounted Reinforcement Learning
- Learning to Reason for Hallucination Span Detection
- Learning to Reason in Structured In-context Environments with Reinforcement Learning
- Learning to Reason over Continuous Tokens with Reinforcement Learning
- Learning to Reason via Mixture-of-Thought for Logical Reasoning
- Learning to Reason without External Rewards
- Learning to Recall with Transformers Beyond Orthogonal Embeddings
- Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
- Learning to Segment for Vehicle Routing Problems
- Learning to Solve Orienteering Problem with Time Windows and Variable Profits
- Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
- Learning to summarize user information for personalized reinforcement learning from human feedback
- Learning to Weight Parameters for Data Attribution
- Learning under Quantization for High-Dimensional Linear Regression
- Learning Unified Representation of 3D Gaussian Splatting
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
- Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
- Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
- Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
- LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
- Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
- LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
- Learn to Guide Your Diffusion Model
- Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
- LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
- LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
- LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
- LeRobot: An Open-Source Library for End-to-End Robot Learning
- Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
- Less is more: Clustered Cross-Covariance Control for Offline RL
- LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
- Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
- Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
- Let OOD Feature Exploring Vast Predefined Classifiers
- Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
- Let's (not) just put things in Context: Test-time Training for Long-context LLMs
- Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
- Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
- Leveraging Discrete Function Decomposability for Scientific Design
- Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
- Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
- LEXam: Benchmarking Legal Reasoning on 340 Law Exams
- LFQA-E: Carefully Benchmarking Long-form QA Evaluation
- Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
- Lifelong Agents: Learning, Aligning, Evolving
- Lifelong Embodied Navigation Learning
- Lifelong Learning with Behavior Consolidation for Vehicle Routing
- LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
- LightCtrl: Training-free Controllable Video Relighting
- Light Differentiable Logic Gate Networks
- LightMem: Lightweight and Efficient Memory-Augmented Generation
- Light of Normals: Unified Feature Representation for Universal Photometric Stereo
- LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
- Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
- Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
- Light-X: Generative 4D Video Rendering with Camera and Illumination Control
- Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
- LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
- LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
- LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
- LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
- LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
- Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
- LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
- LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
- Lipschitz Bandits with Stochastic Delayed Feedback
- LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
- LiTo: Surface Light Field Tokenization
- LiveClin: A Live Clinical Benchmark without Leakage
- LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
- LiveResearchBench: Benchmarking Single- and Multi-Agent Systems for Citation-Grounded Deep Research
- LiveWeb-IE: A Benchmark For Online Web Information Extraction
- LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
- LLaVAction: evaluating and training multi-modal large language models for action understanding
- LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
- LLM2Fx-Tools: Tool Calling for Music Post-Production
- LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
- LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
- LLM DNA: Tracing Model Evolution via Functional Representations
- LLM Fingerprinting via Semantically Conditioned Watermarks
- LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
- LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
- LLM Pretraining with Continuous Concepts
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
- LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
- LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
- LLMs Can Hide Text in Other Text of the Same Length
- LLMs Get Lost In Multi-Turn Conversation
- LLMs Must Think Thrice to Solve Executable Counterfactuals
- LLMs on Trial: Evaluating Judicial Fairness for Large Language Models
- LLMs Process Lists With General Filter Heads
- LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
- LLM Unlearning with LLM Beliefs
- LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
- lmgame-Bench: How Good are LLMs at Playing Games?
- Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
- Local Entropy Search over Descent Sequences for Bayesian Optimization
- Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
- Locality-Attending Vision Transformer
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
- Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
- Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
- Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
- Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
- Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
- Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
- LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
- LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
- LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
- Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
- LogiConBench: Benchmarking Logical Consistencies of LLMs
- LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
- LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
- Logit-KL Flow Matching: Non-Autoregressive Text Generation via Sampling-Hybrid Inference
- Log-Linear Attention
- Log Probability Tracking of LLM APIs
- Long Chain-of-Thought Reasoning Across Languages
- Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
- Long-Context Generalization with Sparse Attention
- Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
- LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
- LongLive: Real-time Interactive Long Video Generation
- Long-range Modeling and Processing of Multimodal Event Sequences
- LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
- Long-tailed Test-Time Adaptation for Vision-Language Models
- Long-Text-to-Image Generation via Compositional Prompt Decomposition
- LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
- LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
- Look-ahead Reasoning with a Learned Model in Imperfect Information Games
- Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
- Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
- Lookup multivariate Kolmogorov-Arnold Networks
- LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
- LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
- Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
- LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
- LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
- LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
- LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
- LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
- Lossless Vocabulary Reduction for Auto-Regressive Language Models
- Lossy Common Information in a Learnable Gray-Wyner Network
- Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
- Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
- LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
- Low-Latency Neural LiDAR Compression with 2D Context Models
- Low-Pass Filtering Improves Behavioral Alignment of Vision Models
- Low-pass Personalized Subgraph Federated Recommendation
- Low rank adaptation of chemical foundation models generate effective odorant representations
- Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
- Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
- LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
- LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
- LS-Merge: Merging Language Models in Latent Space
- LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
- LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
- LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
- Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
- LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
- LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
- LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
- M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
- M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
- M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
- M4PQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
- MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
- Machine Learning for Genomics Explorations (MLGenX)
- Machine Unlearning under Retain–Forget Entanglement
- MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
- MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
- MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
- Making, Not Taking, the Best of N
- Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
- Mamba-3: Improved Sequence Modeling using State Space Principles
- MambaSL: Exploring Single-Layer Mamba for Time Series Classification
- MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
- ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
- Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
- ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
- Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
- Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
- Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
- Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
- Mapping Overlaps in Benchmarks through Perplexity in the Wild
- Mapping Post-Training Forgetting in Language Models at Scale
- MAPSS: Manifold-based Assessment of Perceptual Source Separation
- Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
- MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
- Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
- Markovian Transformers for Informative Language Modeling
- MARS - A Foundational Map Auto-Regressor
- MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
- MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
- MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
- MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
- MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
- MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
- MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
- MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
- Masked Generative Policy for Robotic Control
- Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps
- MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
- Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
- Massive Editing for Large Language Models Based on Dynamic Weight Generation
- Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
- MASS: MoErging through Adaptive Subspace Selection
- Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
- Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
- MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
- Matched Data, Better Models: Target Aligned Data Filtering with Sparse Features
- Matching multiple experts: on the exploitability of multi-agent imitation learning
- Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
- MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
- Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
- Mathesis: Towards Formal Theorem Proving from Natural Languages
- MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
- MATHMO: Automated Mathematical Modeling Through Adaptive Search
- MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
- MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interaction Potentials
- MATRIX: Mask Track Alignment for Interaction-aware Video Generation
- Matting Anything 2: Towards Video Matting for Anything
- MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
- Maximizing Asynchronicity in Event-based Neural Networks
- Maximizing Incremental Information Entropy for Contrastive Learning
- MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
- MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
- mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
- MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
- MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
- MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
- MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
- MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
- Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
- Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
- Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
- Measurement Score-Based Diffusion Model
- Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
- Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
- Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
- Measuring Bias Amplification in Multi-Agent Systems with Large Language Models
- Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
- Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
- Measuring the Intrinsic Dimension of Earth Representations
- Measuring Uncertainty Calibration
- Mechanism of Task-oriented Information Removal in In-context Learning
- Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
- Mechanistic Independence: A Principle for Identifiable Disentangled Representations
- MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
- MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
- MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
- MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
- Medical Interpretability and Knowledge Maps of Large Language Models
- Medical thinking with multiple images
- MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
- MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
- MEGS$^2$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
- MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
- Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
- Membership Inference Attacks Against Fine-tuned Diffusion Language Models
- Membership Privacy Risks of Sharpness Aware Minimization
- Membrane Potential Perturbation Dynamic Is Total Variation
- Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
- Memorization Through the Lens of Sample Gradients
- Memorizing Long-tail Data Can Help Generalization Through Composition
- Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
- Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
- Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
- Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
- MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
- MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
- Merge before Forget: A Single LoRA Continual Learning via Continual Merging
- MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
- MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
- MergeTune: Continued Fine-Tuning of Vision-Language Models
- MergOPT: A Merge-Aware Optimizer for Robust Model Merging
- MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
- MeSH: Memory-as-State-Highways for Recursive Transformers
- Mesh Splatting for End-to-end Multiview Surface Reconstruction
- Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
- MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
- MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
- Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
- Meta-RL Induces Exploration in Language Agents
- Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
- Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
- MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
- Metis: Training LLMs with FP4 Quantization
- Metric $k$-clustering using only Weak Comparison Oracles
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
- MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
- MICLIP: Learning to Interpret Representation in Vision Models
- Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
- Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
- MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
- MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
- Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
- MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
- MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
- MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
- MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
- MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
- MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
- MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
- Mini-cluster Guided Long-tailed Deep Clustering
- Minimax Optimal Adversarial Reinforcement Learning
- Minimax-Optimal Aggregation for Density Ratio Estimation
- Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
- Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
- MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
- Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
- MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
- Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
- Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
- Missingness Bias Calibration in Feature Attribution Explanations
- MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
- Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
- Mitigating Mismatch within Reference-based Preference Optimization
- Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
- Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
- Mitigating Privacy Risk via Forget Set-Free Unlearning
- Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
- Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
- Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
- Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
- Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
- Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
- Mixed-Curvature Tree-Sliced Wasserstein Distance
- Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
- Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
- MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
- Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
- Mixture of Contexts for Long Video Generation
- Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
- Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
- Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
- MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
- MLP Memory: A Retriever-Pretrained Memory for Large Language Models
- MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
- MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
- MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
- MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
- MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
- MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
- MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
- MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
- MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
- MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
- MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
- MnemoDyn: Learning Resting State Dynamics from $40$K fMRI sequences
- MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
- MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
- MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
- MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
- Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
- MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
- MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
- MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
- MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
- MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
- Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
- Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
- Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
- Modality-free Graph In-context Alignment
- Mode-conditioning unlocks superior test-time compute scaling
- Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
- Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
- Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
- Model-Guided Microstimulation Steers Primate Visual Behavior
- Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
- Modeling Others' Minds as Code
- Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
- Model Predictive Adversarial Imitation Learning for Planning from Observation
- MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
- MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
- MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
- MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
- MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
- MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
- MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
- MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
- MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
- MOLM: Mixture of LoRA Markers
- MoMa: A Simple Modular Learning Framework for Material Property Prediction
- MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
- MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
- MoM: Linear Sequence Modeling with Mixture-of-Memories
- MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
- Monitoring Decomposition Attacks with Lightweight Sequential Monitors
- Monocular Normal Estimation via Shading Sequence Estimation
- Monotone Near-Zero-Sum Games
- MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
- MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
- Mordal: Automated Pretrained Model Selection for Vision Language Models
- MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
- More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
- More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
- MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
- MoSA: Mosaic Shared Adaptation of Large Language Models
- MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
- MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
- Motion-Aligned Word Embeddings for Text-to-Motion Generation
- MotionGPT3: Human Motion as a Second Modality
- Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
- Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
- MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
- MotionStream: Real-Time Video Generation with Interactive Motion Controls
- MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
- Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
- mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
- MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
- MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
- MrRoPE: Mixed-radix Rotary Position Embedding
- MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
- MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
- MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
- Much Ado About Noising: Do Flow Models Actually Make Better Control Policies?
- Multi-Action Self-Improvement For Neural Combinatorial Optimization
- Multi-agent Coordination via Flow Matching
- Multi-Agent Debate with Memory Masking
- Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
- Multi-Agent Guided Policy Optimization
- Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
- Multi-Condition Conformal Selection
- Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models
- Multi-Feature Quantized Self-Attention for Fair Large Language Models
- Multifidelity Simulation-based Inference for Computationally Expensive Simulators
- Multi-Head Low-Rank Attention
- Multihead Mixture of Experts for Classification of Gigapixel Pathology Images
- Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
- Multilevel Control Functional
- Multilingual Routing in Mixture-of-Experts
- Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
- Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
- MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
- Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
- Multimodal Classification via Total Correlation Maximization
- Multimodal Dataset Distillation Made Simple by Prototype-guided Data Synthesis
- Multimodal Dataset Distillation via Phased Teacher Models
- Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
- Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality
- Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
- Multimodal Policy Internalization for Conversational Agents
- Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
- Multi-objective Large Language Model Alignment with Hierarchical Experts
- Multi-Object System Identification from Videos
- Multiplayer Nash Preference Optimization
- Multiple-Prediction-Powered Inference
- Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
- Multiple Token Divergence: Measuring and Steering In-Context Computation Density
- Multiplicative Diffusion Models: Beyond Gaussian Latents
- Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
- Multi-Resolution Score-Based Variational Graphical Diffusion for Causal Inference on Latent Systems
- Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
- Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
- Multi-state Protein Design with DynamicMPNN
- Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
- Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
- Multi-Task Low-Rank Model Adaptation
- Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
- Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
- Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
- MuonBP: Faster Muon via Block-Periodic Orthogonalization
- Muon Outperforms Adam in Tail-End Associative Memory Learning
- Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
- MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
- Music Flamingo: Scaling Music Understanding in Audio Language Models
- MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
- MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
- MVR: Multi-view Video Reward Shaping for Reinforcement Learning
- NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
- NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
- Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
- NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
- Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences
- Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
- NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
- Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
- Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
- Natural Identifiers for Privacy and Data Audits in Large Language Models
- Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
- Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
- Navigating the Latent Space Dynamics of Neural Models
- NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
- NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
- Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
- Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
- Near-Optimal Online Deployment and Routing for Streaming LLMs
- Near Optimal Robust Federated Learning Against Data Poisoning Attack
- Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
- Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
- Nef-Net+: Adapting Electrocardio Panorama in the wild
- Negative Pre-activations Differentiate Syntax
- Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
- NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
- Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
- Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
- Neodragon: Mobile Video Generation Using Diffusion Transformer
- Neologism Learning for Controllability and Self-Verbalization
- Neon: Negative Extrapolation From Self-Training Improves Image Generation
- NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
- NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
- NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
- NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
- Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
- NetArena: Dynamically Generated LLM Benchmarks for Network Applications
- NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
- Neural Collapse in Multi-Task Learning
- Neural Compression of 3D Meshes using Sparse Implicit Representation
- Neural Dynamics Self-Attention for Spiking Transformers
- Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
- Neural Graduated Assignment for Maximum Common Edge Subgraphs
- Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
- Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
- Neural Message-Passing on Attention Graphs for Hallucination Detection
- Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
- Neural Networks Learn Multi-Index Models Near the Information-Theoretic Limit
- Neural Optimal Transport Meets Multivariate Conformal Prediction
- NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
- Neural Posterior Estimation with Latent Basis Expansions
- Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
- Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
- Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
- Neural Synchrony Between Socially Interacting Language Models
- Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
- Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
- Neuron-Level Analysis of Cultural Understanding in Large Language Models
- Never Saddle: Reparameterized Steepest Descent as Mirror Flow
- New Frontiers in Associative Memories
- New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
- NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
- NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
- Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
- NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
- NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
- NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
- Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
- Next Visual Granularity Generation
- Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
- NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
- NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
- NIMO: a Nonlinear Interpretable MOdel
- NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
- NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
- No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
- Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
- Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
- NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
- Noise Stability of Transformer Models
- Noise Tolerance of Distributionally Robust Learning
- "Noisier" Noise Contrastive Estimation is (Almost) Maximum Likelihood
- Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
- Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
- No labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
- Non-Asymptotic Analysis of Efficiency in Conformalized Regression
- Non-Asymptotic Analysis of (Sticky) Track-and-Stop
- Non-Autoregressive Generation for Agentic Multi-Turn Interaction
- Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
- Non-Collaborative User Simulators for Tool Agents
- Nonparametric Contextual Online Bilateral Trade
- Nonparametric Teaching of Attention Learners
- No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
- No outlier channels but with outlier blocks
- No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
- No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
- Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
- Not All Bits Are Equal: How Model Scale Changes Memory-Optimal Reasoning
- Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
- Not All Documents Are What You Need for Extracting Instruction Tuning Data
- Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
- Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
- NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
- NRGPT: An Energy-based Alternative for GPT
- Nudging the Boundaries of LLM Reasoning
- Null-Space Filtering for Data-free Continual Model Merging: Preserving Transparency, Promoting Fidelity
- Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
- NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
- Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
- Obfuscated Activations Bypass LLM Latent-Space Defenses
- Object-Centric Refinement for Enhanced Zero-Shot Segmentation
- Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
- Object Fidelity Diffusion for Remote Sensing Image Generation
- Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
- OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
- OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
- Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
- OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
- Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
- OD$^3$: Optimization-free Dataset Distillation for Object Detection
- ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
- ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
- ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
- Offline Preference-Based Value Optimization
- Offline Reinforcement Learning with Adaptive Feature Fusion
- Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
- Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
- OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
- Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
- OFMU: Optimization-Driven Framework for Machine Unlearning
- OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
- Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
- OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
- OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
- OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
- OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
- Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
- Omni-IML: Towards Unified Interpretable Image Manipulation Localization
- OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
- OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
- OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
- Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
- OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
- OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
- OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
- Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
- Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
- Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
- On Code-Induced Reasoning in LLMs
- On Coreset for LASSO Regression Problem with Sensitivity Sampling
- On Discovering Algorithms for Adversarial Imitation Learning
- On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
- One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
- One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
- One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
- One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
- : One LLM Token for Explicit Graph Structural Understanding
- One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
- On Entropy Control in LLM-RL Algorithms
- One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
- One protein is all you need
- One-Shot Exemplars for Class Grounding in Self-Supervised Learning
- One Skill, Many Websites: Learning Generalizable Skills Through Polymorphic Abstraction
- One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
- One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
- One step further with Monte-Carlo sampler to guide diffusion better
- One-Step Video Restoration via Diffusion Adversarial Post-Training
- OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
- On Fairness of Task Arithmetic: The Role of Task Vectors
- On learning linear dynamical systems in context with attention layers
- Online Alignment as Perceptual Loss
- Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
- Online Conformal Prediction with Adversarial Feedback via Regret Minimization
- Online Decision-Focused Learning
- Online Decision Making with Generative Action Sets
- Online Inventory Optimization in Non-Stationary Environment
- Online Learning and Equilibrium Computation with Ranking Feedback
- Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
- Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
- Online Prediction of Stochastic Sequences with High Probability Regret Bounds
- Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
- Online Rounding and Learning Augmented Algorithms for Facility Location
- Online time series prediction using feature adjustment
- Only Brains Align with Brains: Cross-Region Patterns Expose Limits of Normative Models
- On Measuring Influence in Avoiding Undesired Future
- On Natural Ways to Generate and Their Provable Power
- On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
- On Predictability of Reinforcement Learning Dynamics for Large Language Models
- On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
- On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
- On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
- On the Alignment Between Supervised and Self-Supervised Contrastive Learning
- On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
- On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
- On the Computational Limits of AI4S-RL: A Unified $\varepsilon$-$N$ Analysis
- On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
- On the Convergence Direction of Gradient Descent
- On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- On the Design of One-step Diffusion via Shortcutting Flow Paths
- On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
- On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
- On the Expressiveness of State Space Models via Temporal Logics
- On The Expressive Power of GNN Derivatives
- On the Expressive Power of GNNs for Boolean Satisfiability
- On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
- On The Fragility of Benchmark Contamination Detection in Reasoning Models
- On the Generalization Capacities of MLLMs for Spatial Intelligence
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
- On the identifiability of causal graphs with multiple environments
- On the Impact of the Utility in Semivalue-based Data Valuation
- On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
- On the Interaction of Compressibility and Adversarial Robustness
- On the Interpolation Effect of Score Smoothing in Diffusion Models
- On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
- On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
- On the Predictive Power of Representation Dispersion in Language Models
- On the Reasoning Abilities of Masked Diffusion Language Models
- On the Role of Implicit Regularization of Stochastic Gradient Descent in Group Robustness
- On the Sample Complexity of GNNs
- On the Shelf Life of Finetuned LLM-Judges: Future Proofing, Backward Compatibility, and Question Generalization
- On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
- On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
- On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
- On the Theoretical Limitations of Embedding-Based Retrieval
- On the Thinking-Language Modeling Gap in Large Language Models
- On the trade-off between expressivity and privacy in graph representation learning
- On the Universality and Complexity of GNN for Solving Second-order Cone Programs
- On the Wasserstein Geodesic Principal Component Analysis of probability measures
- On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
- On Universality of Deep Equivariant Networks
- OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
- OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
- OpenEstimate: Evaluating LLMs on Probabilistic Estimation with Real-World Data
- OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation
- OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
- Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
- OpenThoughts: Data Recipes for Reasoning Models
- Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
- Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
- Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
- OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
- Opponent Shaping in LLM Agents
- OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
- Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
- Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
- Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
- Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
- Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
- Optimal transport unlocks end-to-end learning for single-molecule localization
- Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
- Optimistic Task Inference for Behavior Foundation Models
- Optimizer Choice Matters For The Emergence of Neural Collapse
- Optimizing Agent Planning for Security and Autonomy
- Optimizing Canaries for Privacy Auditing with Metagradient Descent
- Optimizing Data Augmentation through Bayesian Model Selection
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
- OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
- OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
- Oracle-efficient Hybrid Learning with Constrained Adversaries
- Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
- Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
- ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
- OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
- OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
- ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
- OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
- OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
- OrthoRF: Exploring Orthogonality in Object-Centric Representations
- OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
- OSCAR: Online Soft Compression for RAG
- OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
- OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
- Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
- Out-of-Distribution Graph Models Merging
- Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
- Out of the Shadows: Exploring a Latent Space for Neural Network Verification
- Output Supervision Can Obfuscate the Chain of Thought
- Outrageously Large Context Windows via RACE Attention – A Family of Non-Linear Attention that can be calculated in Strictly Linear-Time
- Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
- Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
- Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
- Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
- Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
- Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
- Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
- Overtone: Cyclic Patch Modulation for Cleaner, Faster Physics Emulators
- OVID: Open-Vocabulary Intrusion Detection
- OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
- OWLEYE: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection
- OWL : Geometry-Aware Spatial Reasoning for Audio Large Language Models
- OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
- P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
- P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
- P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
- PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
- PAC-Bayes bounds for cumulative loss in Continual Learning
- PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
- PACE: Pretrained Audio Continual Learning
- PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception
- PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
- Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
- PALC: Preference Alignment via Logit Calibration
- Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
- PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
- Panda: A pretrained forecast model for chaotic dynamics
- Panoptic Pairwise Distortion Graph
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
- Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
- ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
- Parallel Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
- Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
- Parallel Token Generation for Language Models
- Parameter-Efficient Reinforcement Learning using Prefix Optimization
- Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
- Parameterized Hardness of Zonotope Containment and Neural Network Verification
- Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
- ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
- PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
- Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
- Pareto Variational Autoencoder
- ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
- Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
- Partition Generative Modeling: Masked Modeling Without Masks
- Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
- PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
- PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
- PAS: Estimating the target Accuracy before domain adaptation
- PAT3D: Physics-Augmented Text-to-3D Scene Generation
- Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
- PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
- Patching Gaps In LLM Reasoning With Interventional Training
- PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
- Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
- PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
- Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
- Patronus: Interpretable Diffusion Models with Prototypes
- Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
- Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
- PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
- PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
- PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
- PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
- Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
- PEAR: Phase Entropy Aware Reward for Efficient Reasoning
- Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
- Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
- Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
- PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
- PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
- Perception-Aware Policy Optimization for Multimodal Reasoning
- Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
- PerfGuard: A Performance-Aware Agent for Visual Content Generation
- PerFit: Exploring Personalization Shifts in Representation Space of LLMs
- PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
- Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
- Persistence Spheres: Bi-Continuous Representations of Persistence Diagrams
- PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
- Persona Features Control Emergent Misalignment
- Personalized Collaborative Learning with Affinity-Based Variance Reduction
- Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
- Personalized Reasoning: Just-in-time Personalization and Why LLMs Fail at It
- PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
- PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
- Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
- Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
- PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
- PetaGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
- PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
- pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
- P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
- PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
- Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
- PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
- PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
- Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
- PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
- PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Generation
- Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
- Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
- Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
- Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
- Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
- Physics-informed learning under mixing: How physical knowledge speeds up learning
- Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
- Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
- PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
- PICABench: How Far are We from Physical Realistic Image Editing?
- PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
- Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Vision–Language Continual Learning
- PICS: Pairwise Image Compositing with Spatial Interactions
- pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
- PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
- PILOT-Bench: Probabilistic Interaction for LLM Operations in Tool-driven Scenarios
- Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
- PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
- PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
- Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
- Pitfalls in Evaluating Language Model Forecasters
- Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
- PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
- Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
- Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
- Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
- PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
- PixNerd: Pixel Neural Field Diffusion
- PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits
- Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
- Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
- PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
- Planned Diffusion
- Planner Aware Path Learning in Diffusion Language Models Training
- Planning with an Embodied Learnable Memory
- Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
- Plan then Act: Bi-level CAD Command Sequence Generation
- PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
- Play to Generalize: Learning to Reason Through Game Play
- PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
- Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
- Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
- Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
- PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
- PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
- PMDformer: Patch-Mean Decoupling Transformer for Long-term Forecasting
- PMI: Flow-Based Inversion Correction via Proximal Operator
- PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
- POEMetric: The Last Stanza of Humanity
- PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
- Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
- Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
- Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
- Point Prompting: Counterfactual Tracking with Video Diffusion Models
- PointRePar : SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
- Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
- Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
- PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
- Policy Contrastive Decoding for Robotic Foundation Models
- PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
- Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
- Policy Newton Algorithm in Reproducing Kernel Hilbert Space
- PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
- Poly-attention: a general scheme for higher-order self-attention
- Polychromic Objectives for Reinforcement Learning
- PolyGraphScore: a classifier-based metric for evaluating graph generative models
- Polynomial Convergence of Riemannian Diffusion Models
- Polynomial, trigonometric, and tropical activations
- PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
- PonderLM: Pretraining Language Models to Ponder in Continuous Space
- Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
- Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
- PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
- PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
- Positional Encoding Field
- Positional Preservation Embedding for Multimodal Large Language Models
- Post-AGI Science and Society Workshop
- PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
- PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
- Post-hoc Probabilistic Vision-Language Models
- Post-training Large Language Models for Diverse High-Quality Responses
- Post-Training Quantization for Video Matting
- Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
- PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
- Practical estimation of the optimal classification error with soft labels and calibration
- Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
- Precise and Interpretable Editing of Code Knowledge in Large Language Models
- PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
- Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
- Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
- Predicting LLM Output Length via Entropy-Guided Representations
- Predicting LLM Reasoning Performance with Small Proxy Model
- Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
- Prediction with Expert Advice under Local Differential Privacy
- Predictive CVaR Q-learning
- Predictive Differential Training Guided by Training Dynamics
- PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
- Preference-based Policy Optimization from Sparse-reward Offline Dataset
- Preference Leakage: A Contamination Problem in LLM-as-a-judge
- PreferThinker: Reasoning-based Personalized Image Preference Assessment
- PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
- Premise Selection for a Lean Hammer
- Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
- Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
- Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
- Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
- Pre-training Limited Memory Language Models with Internal and External Knowledge
- Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
- Pretraining Scaling Laws for Generative Evaluations of Language Models
- Pre-training under infinite compute
- Pretraining with hierarchical memories: separating long-tail and common knowledge
- Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
- Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
- Pretrain Value, Not Reward: Decoupled Value Policy Optimization
- Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
- Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
- Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
- Primal-Dual Policy Optimization for Adversarial Linear CMDPs
- Primary-Fine Decoupling for Action Generation in Robotic Imitation
- Principled Design for Trustworthy AI: Interpretability, Robustness, and Safety Across Modalities
- Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
- Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
- Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
- Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
- Prior-free Tabular Test-time Adaptation
- PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
- Priors in time: Missing inductive biases for language model interpretability
- PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
- PRISM: Controllable Diffusion for Compound Image Restoration with Scientific Fidelity
- PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
- PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
- PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
- PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
- PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
- PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
- PRISON: Unmasking the Criminal Potential of Large Language Models
- Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
- Privacy-Protected Causal Survival Analysis Under Distribution Shift
- Private Rate-Constrained Optimization with Applications to Fair Learning
- Probabilistic Kernel Function for Fast Angle Testing
- Probability Distributions Computed by Hard-Attention Transformers
- Probing in the Dark: State Entropy Maximization for POMDPs
- Probing Rotary Position Embeddings through Frequency Entropy
- Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
- Procedural Mistake Detection via Action Effect Modeling
- Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
- Process-Verified Reinforcement Learning for Theorem Proving via Lean
- Product of Experts for Visual Generation
- Product-Quantised Image Representation for High-Quality Image Synthesis
- ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
- Programming by Backprop: Learning Behaviour from Symbolic Descriptions
- Programming with Pixels: Can Computer-Use Agents do Software Engineering?
- Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
- Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
- Projected Coupled Diffusion for Test-Time Constrained Joint Generation
- PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
- Prompt and Parameter Co-Optimization for Large Language Models
- Prompt Curriculum Learning for Efficient LLM Post-Training
- PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
- Prompt-MII: Meta-Learning Instruction Induction for LLMs
- Prompt-Robust Vision-Language Models via Meta-Finetuning
- ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
- ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
- ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
- Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
- PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
- ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
- Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
- Proper Velocity Neural Networks
- ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
- ProReGen: Progressive Residual Generation under Attribute Correlations
- ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
- Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
- ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
- PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
- PROTDYN: A FOUNDATION PROTEIN LANGUAGE MODEL FOR THERMODYNAMICS AND DYNAMICS GENERATION
- Protection against Source Inference Attacks in Federated Learning
- ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
- Protein Structure Tokenization via Geometric Byte Pair Encoding
- ProtoKV: Long-context Knowledge Is Already Well-Organized Before Your Query
- ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
- Provable Guarantees for Automated Circuit Discovery in Mechanistic Interpretability
- Provable Separations between Memorization and Generalization in Diffusion Models
- Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
- Provably Explaining Neural Additive Models
- Provably Tracking Equivalent Mechanistic Interpretations Across Neural Networks
- Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
- Proximal Diffusion Neural Sampler
- Proximal Supervised Fine-Tuning
- ProxyAttn: Guided Sparse Attention via Representative Heads
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
- Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
- Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
- Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
- Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
- PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
- Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
- PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models
- PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
- PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
- PU-BENCH: A UNIFIED BENCHMARK FOR RIGOROUS AND REPRODUCIBLE PU LEARNING
- Pulp Motion: Framing-aware multimodal camera and human motion generation
- Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
- Purrception: Variational Flow Matching for Vector-Quantized Image Generation
- Pursuing Minimal Sufficiency in Spatial Reasoning
- Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
- Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
- Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
- PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
- Pyramid Patchification Flow for Visual Generation
- PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
- pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
- Q&C: When Quantization Meets Cache in Efficient Generation
- QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
- QKV Projections Require a Fraction of Their Memory
- Q-Learning with Adjoint Matching
- Q-Learning with Fine-Grained Gap-Dependent Regret
- Q-learning with Posterior Sampling
- QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
- QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
- Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
- Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
- QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
- Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
- Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
- QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
- Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
- Quantile Advantage Estimation for Entropy-Safe Reasoning
- Quantitative Bounds for Length Generalization in Transformers
- Quantization-Aware Diffusion Models For Maximum Likelihood Training
- Quantized Gradient Projection for Memory-Efficient Continual Learning
- Quantized Visual Geometry Grounded Transformer
- QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
- Quantum machine learning advantages beyond hardness of evaluation
- Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
- Quasi-Equivariant Metanetworks
- Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
- Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
- Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
- Query-Level Uncertainty in Large Language Models
- Query-Specific Causal Graph Pruning Under Tiered Knowledge
- QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
- QUEST: A robust attention formulation using query-modulated spherical attention
- Queue Length Regret Bounds for Contextual Queueing Bandits
- QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
- Quotient-Space Diffusion Model
- QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
- QuRL: Rubrics As Judge For Open-Ended Question Answering
- QVGen: Pushing the Limit of Quantized Video Generative Models
- QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
- R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
- R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
- R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
- R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
- R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
- RADAR: Learning to Route with Asymmetry-aware Distance Representations
- RADAR: Reasoning–Ability and Difficulty-Aware Routing in Language Models
- Radiometrically Consistent Gaussian Surfels for Inverse Rendering
- RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
- RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
- Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
- RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
- RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
- Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
- Random Controlled Differential Equations
- Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
- Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
- Random Label Prediction Heads for Studying and Controlling Memorization in Deep Neural Networks
- Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
- Random-projection ensemble dimension reduction
- Random Spiking Neural Networks are Stable and Spectrally Simple
- RankFlow: Property-aware Transport for Protein Optimization
- RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
- RAP: 3D Rasterization Augmented End-to-End Planning
- RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
- Rapid Training of Hamiltonian Graph Networks Using Random Features
- RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
- RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
- RATE-DISTORTION OPTIMIZED COMMUNICATION FOR COLLABORATIVE PERCEPTION
- Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
- RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
- RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
- RayI2P: Learning Rays for Image-to-Point Cloud Registration
- RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model
- RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
- ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
- ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
- Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
- Readout Representation: Redefining Neural Codes by Input Recovery
- Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
- RealBench: A Benchmark for Complex Physical Systems with Real-World Data
- ReALM-GEN: Real-World Constrained and Preference-Aligned Flow- and Diffusion-based Generative Models
- REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
- Real-Time Motion-Controllable Autoregressive Video Diffusion
- Real-Time Reasoning Agents in Evolving Environments
- Real-Time Robot Execution with Masked Action Chunking
- Realtime Video Frame Interpolation using One-Step Diffusion Sampling
- REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
- REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
- reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
- Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
- Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
- Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- Reasoning Boosts Opinion Alignment in LLMs
- Reasoning-Driven Multimodal LLM for Domain Generalization
- Reasoning in Space via Grounding in the World
- Reasoning Language Model Inference Serving Unveiled: An Empirical Study
- Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
- Reasoning on Time-Series for Financial Technical Analysis
- Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
- Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
- Reasoning without Training: Your Base Model is Smarter Than You Think
- Reassessing Layer Pruning in LLMs: New Insights and Methods
- ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
- RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
- RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
- Reconciling Visual Perception and Generation in Diffusion Models
- RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
- Reconstruct Anything Model: A Lightweight Foundation Model for Computational Imaging
- Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
- Reconstruction Alignment Improves Unified Multimodal Models
- ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
- Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
- Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
- Rectifying LLM Thought from Lens of Optimization
- Recurrent Action Transformer with Memory
- RedacBench: Can AI Erase Your Secrets?
- RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
- ReDDiT: Rehashing Noise for Discrete Visual Generation
- Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
- RedSage: A Cybersecurity Generalist LLM
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
- Reducing Class-Wise Performance Disparity via Margin Regularization
- Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
- Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
- Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
- Reducing Symmetry Increase in Equivariant Neural Networks
- Reevaluating Policy Gradient Methods for Imperfect-Information Games
- Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
- RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
- Reference Guided Skill Discovery
- Referring Layer Decomposition
- RefineBench: Evaluating Refinement Capability in Language Models
- Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
- Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
- RefineStat: Efficient Exploration for Probabilistic Program Synthesis
- Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
- ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
- Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
- ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
- ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
- Reformulation for Pretraining Data Augmentation
- RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
- RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
- RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
- Regret-Guided Search Control for Efficient Learning in AlphaZero
- Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
- Regulating Internal Evidence Flows for Robust Learning Under Spurious Correlations
- REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
- ReIn: Conversational Error Recovery with Reasoning Inception
- Reinforced Latent Reasoning for LLM-based Recommendation
- Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
- Reinforcement Learning for Machine Learning Engineering Agents
- Reinforcement Learning from Dynamic Critic Feedback for Free-Form Generations
- Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
- Reinforcement Unlearning via Group Relative Policy Optimization
- Reinforcing Diffusion Models by Direct Group Preference Optimization
- Reinforcing General Reasoning Without Verifiers
- Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
- ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
- Relational Feature Caching for Accelerating Diffusion Transformers
- Relational Graph Transformer
- Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
- Relationship Alignment for View-aware Multi-view Clustering
- Relative Entropy Pathwise Policy Optimization
- Relative Value Learning
- Relatron: Automating Relational Machine Learning over Relational Databases
- RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
- ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
- Reliability-Adjusted Prioritized Experience Replay
- Reliable Evaluation of MRI Motion Correction: Dataset and Insights
- Reliable Fine-Grained Evaluation of Natural Language Math Proofs
- Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
- Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
- Reliable Weak-to-Strong Monitoring of LLM Agents
- Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
- REMem: Reasoning with Episodic Memory in Language Agent
- Remotely Detectable Robot Policy Watermarking
- Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
- RepIt: Steering Language Models with Concept-Specific Refusal Vectors
- Replicable Reinforcement Learning with Linear Function Approximation
- Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
- Representational Alignment (Re$^4$-Align)
- Representation Alignment for Diffusion Transformers without External Components
- Representation-Based Exploration for Language Models: From Test-Time to Post-Training
- Representing local protein environments with machine learning force fields
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
- RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
- Repurposing Foundation Model for Generalizable Medical Time Series Classification
- Repurposing Synthetic Data for Fine-grained Search Agent Supervision
- RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
- RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
- ResCP: Reservoir Conformal Prediction for Time Series Forecasting
- Rescue: Retrieval Augmented Secure Code Generation
- ResearchRubrics: A Benchmark of Prompts and Rubrics For Deep Research Agents
- RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
- Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
- Residual Feature Integration is Sufficient to Prevent Negative Transfer
- Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
- Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
- ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
- ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
- RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
- RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
- ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
- Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
- ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
- ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
- Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
- Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
- Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
- Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
- Rethinking Causal Mask Attention for Vision-Language Inference
- Rethinking Code Similarity for Automated Algorithm Design with LLMs
- Rethinking Consistent Multi-Label Classification under Inexact Supervision
- Rethinking Continual Learning with Progressive Neural Collapse
- Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
- Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
- Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
- Rethinking JEPA: Compute‑Efficient Video Self-Supervised Learning with Frozen Teachers
- Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
- Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
- Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
- Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
- Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
- Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
- Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
- Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
- Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
- Rethinking Residual Errors in Compensation-based LLM Quantization
- Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
- Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
- Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
- ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
- Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
- Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
- Retrospective Sparse Attention for Efficient Long-Context Generation
- Reusing Pre-Training Data at Test Time is a Compute Multiplier
- ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
- Revela: Dense Retriever Learning via Language Modeling
- Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
- Reverse Distillation: Disentangling and Scaling Protein Language Model Representations
- Reverse-Engineered Reasoning for Open-Ended Generation
- Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
- Revisiting [CLS] and Patch Token Interaction in Vision Transformers
- Revisiting Confidence Calibration for Misclassification Detection in VLMs
- Revisiting Global Text Conditioning in Diffusion Transformers
- Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
- Revisiting Long-context Modeling from Context Denoising Perspective
- Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
- Revisiting Multimodal Positional Encoding in Vision–Language Models
- Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
- Revisiting Parameter Server in LLM Post-Training
- Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
- Revisiting the Past: Data Unlearning with Model State History
- Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
- Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
- Revisiting Weight Regularization for Low-Rank Continual Learning
- Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
- Revisiting Node Affinity Prediction In Temporal Graphs
- Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
- Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
- RewardEval: Advancing Reward Model Evaluation
- Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
- Reward Is Enough: LLMs Are In-Context Reinforcement Learners
- RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
- Reward Model Routing in Alignment
- Reward Models Inherit Value Biases from Pretraining
- ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
- Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
- RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
- RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
- RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
- RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
- R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
- Riemannian Federated Learning via Averaging Gradient Streams
- Riemannian High-Order Pooling for Brain Foundation Models
- Riemannian Optimization on Relaxed Indicator Matrix Manifold
- Riemannian Variational Flow Matching for Material and Protein Design
- Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
- Riesz Neural Operator for Solving Partial Differential Equations
- RigidSSL: Rigidity-based Geometric Pretraining for Protein Generation
- RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
- Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
- Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
- RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
- Risk-Sensitive Agent Compositions
- Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
- RIVER: Real-time Video Interaction Benchmark
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
- RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
- RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
- RL for Reasoning by Adaptively Revealing Rationales
- RL makes MLLMs see better than SFT
- RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
- RLP: Reinforcement as a Pretraining Objective
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
- RL's Razor: Why Online Reinforcement Learning Forgets Less
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
- RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
- RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
- RM-R1: Reward Modeling as Reasoning
- RNE: plug-and-play diffusion inference-time control and energy-based training
- RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
- RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context
- RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
- RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
- Robust Adaptive Multi-Step Predictive Shielding
- Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
- Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
- Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
- Robust Decision-Making with Partially Calibrated Forecasters
- Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
- Robust Denoising Neural Reranker for Recommender Systems
- Robust Equation Structure learning with Adaptive Refinement
- Robust Federated Inference
- Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
- Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
- Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
- Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
- Robust LLM Unlearning via Post Judgment and Multi-round Thinking
- Robust Multi-Objective Controlled Decoding of Large Language Models
- Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
- Robustness in the Face of Partial Identifiability in Reward Learning
- Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
- Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
- Robust Preference Alignment via Directional Neighborhood Consensus
- Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
- Robust Reward Modeling via Causal Rubrics
- Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
- Robust Spiking Neural Networks Against Adversarial Attacks
- RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
- Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
- Robust Training of Neural Networks at Arbitrary Precision and Sparsity
- ROC-n-reroll: How verifier imperfection affects test-time scaling
- Rodrigues Network for Learning Robot Actions
- ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
- Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
- RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
- ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
- Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
- RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
- Routing, Cascades, and User Choice for LLMs
- Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
- Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
- Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
- ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
- RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
- RPM: Reasoning-Level Personalization for Black-Box Large Language Models
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
- RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
- R-WoM: Retrieval-augmented World Model For Computer-use Agents
- R-Zero: Self-Evolving Reasoning LLM from Zero Data
- S$^2$-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
- S2GO: Streaming Sparse Gaussian Occupancy
- S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
- S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
- SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
- SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
- Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Architectures
- Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
- SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
- SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
- SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
- Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
- SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
- SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
- Safe Exploration via Policy Priors
- SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
- Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
- SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
- SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
- SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
- Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
- SAFETY-GUIDED FLOW (SGF): A UNIFIED FRAMEWORK FOR NEGATIVE GUIDANCE IN SAFE GENERATION
- Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
- Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
- Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
- SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
- SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
- SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
- SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
- Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
- SAM 3: Segment Anything with Concepts
- Same Content, Different Representations: A Controlled Study for Table QA
- Sample Complexity and Representation Ability of Test-time Scaling Paradigms
- Sample-efficient and Scalable Exploration in Continuous-Time RL
- Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
- Sample-efficient evidence estimation of score based priors for model selection
- Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
- Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
- Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
- Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
- Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
- Sampling-aware Adversarial Attacks Against Large Language Models
- Sampling Complexity of TD and PPO in RKHS
- SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
- SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
- Sapiens2
- SAQ: Stabilizer-Aware Quantum Error Correction Decoder
- SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
- SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
- Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
- SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
- SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
- SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
- Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
- Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
- Scalable Chain of Thoughts via Elastic Reasoning
- Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
- Scalable In-Context Q-Learning
- Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
- Scalable Offline Model-Based RL with Action Chunks
- Scalable Oversight via Partitioned Human Supervision
- Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
- Scalable Second-order Riemannian Optimization for $K$-means Clustering
- Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
- Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
- ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
- ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
- Scale-wise Distillation of Diffusion Models
- Scaling Agent Learning via Experience Synthesis
- Scaling Agents via Continual Pre-training
- Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
- Scaling Attention via Feature Sparsity
- Scaling Bayesian Experimental Design to High-Dimensions with Information-Guided Diffusion
- Scaling Behavior of Discrete Diffusion Language Models
- ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
- Scaling Direct Feedback Learning with Theoretical Guarantees
- Scaling Generalist Data-Analytic Agents
- Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
- Scaling Group Inference for Diverse and High-Quality Generation
- Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
- Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
- Scaling Laws and Symmetry, Evidence from Neural Force Fields
- Scaling Laws for Diffusion Transformers
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
- Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
- Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
- Scaling Linear Attention with Sparse State Expansion
- Scaling Multi-Task Bayesian Optimization with Large Language Models
- Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
- Scaling Sequence-to-Sequence Generative Neural Rendering
- Scaling Speech Tokenizers with Diffusion Autoencoders
- Scaling Synthetic Task Generation for Agents via Exploration
- Scaling up Memory for Robotic Control via Experience Retrieval
- Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
- Scaling with Collapse: Efficient and Predictable Training of LLM Families
- SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
- scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
- SceneCOT: Eliciting Chain-of-Thought Reasoning in 3D Scenes
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
- SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
- Scheduling Your LLM Reinforcement Learning with Reasoning Trees
- Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
- Scientific Methods for Understanding Deep Learning (Sci4DL)
- SciNav: A Principled Agent Framework for Scientific Coding Tasks
- SciTS: Scientific Time Series Understanding and Generation with LLMs
- SCI-Verifier: Scientific Verifier with Thinking
- SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
- Score-Based Density Estimation from Pairwise Comparisons
- Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
- Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
- SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
- SCRAPL: Scattering Transform with Random Paths for Machine Learning
- SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
- SCUBA: Salesforce Computer Use Benchmark
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
- Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
- SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
- SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
- Search Arena: Analyzing Search-Augmented LLMs
- Searching for Privacy Risks in LLM Agents via Simulation
- Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
- SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
- Secret-Protected Evolution for Differentially Private Synthetic Text Generation
- Secure Inference for Diffusion Models via Unconditional Scores
- Secure Outlier-Aware Large Language Model Inference
- SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
- SeeDNorm: Self-Rescaled Dynamic Normalization
- SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
- SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
- SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
- Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
- Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
- Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
- Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
- Seeing Through Words: Controlling Visual Retrieval Quality with Language
- Seeing What’s Not There: Negation Understanding Needs More Than Training
- Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
- Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
- Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
- Segment Any Events with Language
- Segment-Level Attribution for Selective Learning of Long Reasoning Traces
- Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
- Selective Data Removal for Distributional Machine Unlearning
- Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
- Selective Rotary Position Embedding
- Self-Aligned Reward: Towards Effective and Efficient Reasoners
- Self-Augmented Visual Contrastive Decoding
- Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
- Self-Destructive Language Models
- Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
- Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
- Self-Guided Low Light Object Detection Framework
- SELF-HARMONY: LEARNING TO HARMONIZE SELF-SUPERVISION AND SELF-PLAY IN TEST-TIME REINFORCEMENT LEARNING
- Self-Improving Loops for Visual Robotic Planning
- Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
- Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
- Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
- Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
- Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
- SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
- Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
- Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
- Self-Speculative Masked Diffusions
- Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
- Self-Supervised Learning from Structural Invariance
- SelvaBox: A high‑resolution dataset for tropical tree crown detection
- Semantic-Aware Diffusion LLM Inference With Adaptive Block Size
- Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
- Semantic-Enhanced Time-Series Forecasting via Large Language Models
- Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
- Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
- Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
- SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
- SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
- Semi-Parametric Contextual Pricing with General Smoothness
- Semi-Supervised Preference Optimization with Limited Feedback
- Sem-MoE: Semantic-aware Model-Data Collaborative Scheduling for Efficient MoE Inference
- SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
- Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
- Sequences of Logits Reveal the Low Rank Structure of Language Models
- Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
- Sequential Parallel Duality in Prefix Scannable Models
- Seq vs Seq: An Open Suite of Paired Encoders and Decoders
- SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
- SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
- SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
- SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
- SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
- SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
- Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization
- Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
- SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
- SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
- SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
- SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
- ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
- SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
- Sharing State Between Prompts and Programs
- Sharp asymptotic theory for Q-learning with \texttt{LD2Z} learning rate and its generalization
- Sharp Monocular View Synthesis in Less Than a Second
- Sharpness-Aware Machine Unlearning
- Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
- Sheaves Reloaded: A Direction Awakening
- SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
- ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
- SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
- Shift-and-Sum Quantization for Visual Autoregressive Models
- Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
- ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
- Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
- Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
- Short Window Attention Enables Long-Term Memorization
- Should We Still Pretrain Encoders with Masked Language Modeling?
- Shrinking Proteins with Diffusion
- Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
- Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
- SigLIP-HD by Fine-to-Coarse Supervision
- SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
- SIGMA-GEN: STRUCTURE AND IDENTITY GUIDED MULTI-SUBJECT ASSEMBLY FOR IMAGE GENERATION
- SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
- Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
- Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
- Sign-SGD via Parameter-Free Optimization
- Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
- Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
- Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
- SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
- SIM-CoT: Supervised Implicit Chain-of-Thought
- Similarity-aware Non-Convex Federated Optimization
- SiMO: Single-Modality-Operable Multimodal Collaborative Perception
- SimpleFold: Folding Proteins is Simpler than You Think
- SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
- SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
- SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
- Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
- SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
- Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions
- Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
- SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
- SiNGER: A Clearer Voice Distills Vision Transformers Further
- Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
- Single-Loop Byzantine-Resilient Federated Bilevel Optimization
- Single-stream Policy Optimization
- Singleton-Optimized Conformal Prediction
- SinkTrack: Attention Sink based Context Anchoring for Large Language Models
- SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
- SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
- SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
- SketchingReality: From Freehand Scene Sketches to Photorealistic Images
- SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
- SkillFactory: Self-Distillation for Learning Cognitive Behaviors
- Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
- Skirting Additive Error Lower Bounds for Private Turnstile Streams
- SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
- sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
- Slicing Wasserstein over Wasserstein via Functional Optimal Transport
- SliderQuant: Accurate Post-Training Quantization for LLMs
- SLM-MUX: Orchestrating Small Language Models for Reasoning
- SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
- Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
- Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
- SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
- SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
- Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability-Adaptive Reward Shaping
- SmellNet: A Large-scale Dataset for Real-world Smell Recognition
- SMixer: Rethinking Efficient-Training and Event-Driven SNNs
- Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
- Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
- SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
- SNAPHARD CONTRAST LEARNING
- SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
- SNaX: sparse narrow accelerated mixture of experts
- Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
- Social Agents: Collective Intelligence Improves LLM Predictions
- SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
- SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
- SoFlow: Solution Flow Models for One-Step Generative Modeling
- SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
- Soft-Di[M]O: Improved one-step Image Discrete Model
- Soft Equivariance Regularization for Invariant Self-Supervised Learning
- Soft-Masked Diffusion Language Models
- Softmax is not Enough (for Adaptive Conformal Classification)
- Softmax Transformers are Turing-Complete
- Soft Quality-Diversity Optimization
- Soft Tokens, Hard Truths
- SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
- Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
- Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
- Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
- Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
- Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
- Some Neural Networks Inherently Preserve Subspace Clustering Structure
- SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
- SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
- SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
- SONIC: Spectral Oriented Neural Invariant Convolutions
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
- SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
- Source-Guided Flow Matching
- SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
- SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
- SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
- SPACeR: Self-Play Anchoring with Centralized Reference Models
- SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
- Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
- Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
- Sparse Attention Adaptation for Long Reasoning
- Sparse Autoencoders Trained on the Same Data Learn Different Features
- Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
- Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
- SparseD: Sparse Attention for Diffusion Language Models
- SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
- Sparse Imagination for Efficient Visual World Model Planning
- Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
- Sparsity-promoting Fine-tuning for Equivariant Materials Foundation Model
- SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
- SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
- Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
- Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
- Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
- SpatialHand: Generative Object Manipulation from 3D Perspective
- SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
- Spatially Guided Training for Vision-Language-Action Model
- Spatially Informed Autoencoders for Interpretable Visual Representation Learning
- Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
- Spatial Structure and Selective Text Jointly Facilitate Image Clustering
- SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
- SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
- SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
- Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
- Special Unitary Parameterized Estimators of Rotation
- SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
- Spectral Attention Steering for Prompt Highlighting
- Spectral Bellman Method: Unifying Representation and Exploration in RL
- SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
- Spectral-guided Physical Dynamics Distillation
- SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
- Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
- Speculative Actions: A Lossless Framework for Faster AI Agents
- Speculative Speculative Decoding
- SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
- SpeechOp: Inference-Time Task Composition for Generative Speech Processing
- Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
- Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
- SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
- SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models
- SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
- Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
- SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
- Spike-based Digital Brain: a novel fundamental model for brain activity analysis
- SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
- SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
- SPIKE-RL: Video-LLMs meet Bayesian Surprise
- SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
- Spiking Discrepancy Transformer for Point Cloud Analysis
- Spilled Energy in Large Language Models
- Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
- SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
- SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
- Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
- Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
- Splat Feature Solver
- Splat Regression Models
- Splat the Net: Radiance Fields with Splattable Neural Primitives
- Split Happens (But Your Video Model Can Be Edited)
- SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
- SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
- SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
- Spotlight on Token Perception for Multimodal Reinforcement Learning
- SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
- SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
- SPRIG: Improving Large Language Model Performance by System Prompt Optimization
- SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
- Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
- SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
- SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
- Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
- SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
- SR-Scientist: Scientific Equation Discovery With Agentic AI
- SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
- SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
- SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
- SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
- ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
- SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
- Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
- Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
- Stable and Scalable Deep Predictive Coding Networks with Meta Prediction Errors
- Stable coresets: Unleashing the power of uniform sampling
- Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaption
- StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
- Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
- Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
- Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
- Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
- STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
- STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
- STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
- STARK: Strategic Team of Agents for Refining Kernels
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
- STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
- station2radar: query-conditioned gaussian splatting for precipitation field
- Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
- Statistical Guarantees for Offline Domain Randomization
- Statistical Guarantees in the Search for Less Discriminatory Algorithms
- STAT: Skill-Targeted Adaptive Training
- STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
- STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
- Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
- Steering and Rectifying Latent representation manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
- Steering Autoregressive Music Generation with Recursive Feature Machines
- Steering Diffusion Models Towards Credible Content Recommendation
- Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models
- Steering Evaluation-Aware Language Models To Act Like They Are Deployed
- Steering Language Models with Weight Arithmetic
- Steering MoE LLMs via Expert (De)Activation
- Steering the Herd: A Framework for LLM-based Control of Social Learning
- SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
- STEM: SCALING TRANSFORMERS WITH EMBEDDING MODULES
- Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
- StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
- ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
- STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
- Stochastic Neural Networks for Causal Inference with Missing Confounders
- Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
- Stochastic Self-Organization in Multi-Agent Systems
- StochasTok: Improving Fine-Grained Subword Understanding in LLMs
- Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
- Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
- Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
- Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
- STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
- STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
- StoryAlign: Evaluating and Training Reward Models for Story Generation
- Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
- StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
- Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
- Strategic Obfuscation of Deceptive Reasoning in Language Models
- Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
- Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
- Streaming Autoregressive Video Generation via Diagonal Distillation
- Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
- StreamingThinker: Large Language Models Can Think While Reading
- Streaming Visual Geometry Transformer
- StreamingVLM: Real-Time Understanding for Infinite Video Streams
- StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
- Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
- Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
- Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
- String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
- Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
- Strong Correlations Induce Cause Only Predictions in Transformer Training
- STRONGER TOGETHER: ON-POLICY REINFORCEMENT LEARNING FOR COLLABORATIVE LLMS
- Strongly Convex Sets in Riemannian Manifolds
- Structural Inference: Interpreting Small Language Models with Susceptibilities
- Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
- Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
- Structure-Aware Graph Hypernetworks for Neural Program Synthesis
- Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
- Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
- Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
- ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
- Study of Training Dynamics for Memory-Constrained Fine-Tuning
- STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
- ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
- StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
- StylOS: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
- SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
- Sublinear Spectral Clustering Oracle with Little Memory
- Sublinear Time Quantum Algorithm for Attention Approximation
- Submodular Function Minimization with Dueling Oracle
- Subquadratic Algorithms and Hardness for Attention with Any Temperature
- Subspace Kernel Learning on Tensor Sequences
- Summaries as Centroids for Interpretable and Scalable Text Clustering
- SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
- SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
- Superficial Safety Alignment Hypothesis
- SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
- Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
- Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
- Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
- SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
- SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
- SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
- SUSD: Structured Unsupervised Skill Discovery through State Factorization
- SVD Provably Denoises Nearest Neighbor Data
- Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
- SWERank: Software Issue Localization with Code Ranking
- SWE-RM: Execution-free Feedback for Software Engineering Agents
- SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
- SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
- SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
- Symmetric Space Learning for Combinatorial Generalization
- Symmetry-Aware Bayesian Optimization via Max Kernels
- Synchronizing Probabilities in Model-Driven Lossless Compression
- SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
- SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
- Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
- SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
- Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
- Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
- Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
- Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
- Synthetic Bootstrapped Pretraining
- Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
- SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
- Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
- SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
- Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks
- T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
- T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
- TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
- TableMaster: A Recipe to Advance Table Understanding with Language Models
- TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
- Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
- TabStruct: Measuring Structural Fidelity of Tabular Data
- Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
- Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
- Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
- TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
- Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
- Take Note: Your Molecular Dataset Is Probably Aligned
- Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
- Talking Points: Describing and Localizing Pixels
- Taming Curvature: Architecture Warm-up for Stable Transformer Training
- Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
- Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
- Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
- Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
- Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
- Taming the Fragility of KV Cache Eviction in LLM Inference
- TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with a Temporal-Aware Multimodal Model
- **TandemFoilSet**: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
- TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
- TangoFlux: Text to Audio Generation with CLAP-Ranked Preference Optimization
- TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
- TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
- Target-Aware Video Diffusion Models
- Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
- Task-Agnostic Amortized Multi-Objective Optimization
- Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
- TaskCraft: Automated Generation of Agentic Tasks
- Task-free Adaptive Meta Black-box Optimization
- Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
- Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
- Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
- TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
- TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
- TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
- TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
- TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
- TD-MoE: Tensor Decomposition for MoE Models
- Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
- Teaching LLMs to Admit Uncertainty in OCR
- Teaching Metric Distance to Discrete Autoregressive Language Models
- Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
- TEDM: Time Series Forecasting with Elucidated Diffusion Models
- Tell me Habibi, is it Real or Fake?
- Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
- TEMPFLOW-GRPO: WHEN TIMING MATTERS FOR GRPO IN FLOW MODELS
- Temporal Generalization: A Reality Check
- Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
- Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
- Temporally Detailed Hypergraph Neural ODE for Type 2 Diabetes Progression Modeling
- Temporal Slowness in Central Vision Drives Semantic Object Learning
- Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
- Temporal superposition and feature geometry of RNNs under memory demands
- TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
- Tensor learning with orthogonal, Lorentz, and symplectic symmetries
- Tequila: Deadzone-free Ternary Quantization for Large Language Models
- Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
- Terminal Velocity Matching
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
- TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
- Testing Most Influential Sets
- Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
- Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
- Test-Time Alignment for Large Language Models via Textual Model Predictive Control
- Test-time Domain Generalization for Image Super-resolution
- Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
- Test-Time Iterative Error Correction for Efficient Diffusion Models
- Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
- Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
- Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
- Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
- TEST-TIME SCALING IN DIFFUSION LLMS VIA HIDDEN SEMI-AUTOREGRESSIVE EXPERTS
- Test-Time Scaling with Reflective Generative Model
- Test-Time Training Done Right
- Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
- TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
- Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
- Text2Grad: Reinforcement Learning from Natural Language Feedback
- Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
- Text-Aware Image Restoration with Diffusion Models
- Text summarization via global structure awareness
- Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
- Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
- Textual Equilibrium Propagation for Deep Compound AI Systems
- Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
- TFHE-Coder: Evaluating LLM Agents for secure Fully Homomorphic Encryption Code Generation
- TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
- The 2nd Workshop on Advances in Financial AI Workshop: Towards Agentic and Responsible Systems
- The 2nd Workshop on Foundation Models for Science: Real-World Impact and Science-First Design
- The 3rd Workshop on Test-Time Updates (TTU)
- The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
- The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
- The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
- The Art of Scaling Reinforcement Learning Compute for LLMs
- The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
- The Counting Power of Transformers
- The Coverage Principle: How Pre-Training Enables Post-Training
- The Curious Case of In-Training Compression of State Space Models
- The Deleuzian Representation Hypothesis
- The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
- The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
- The Effect of Attention Head Count on Transformer Approximation
- THE END OF MANUAL DECODING: TOWARDS TRULY END-TO-END LANGUAGE MODELS
- The Expressive Limits of Diagonal SSMs for State-Tracking
- The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
- The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
- The First Workshop on Efficient Spatial Reasoning
- The Forecast After the Forecast: A Post-Processing Shift in Time Series
- The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
- The Geometry and Topology of Circuits: the Manifolds of Modular Addition
- The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
- The Geometry of Reasoning: Flowing Logics in Representation Space
- The Hidden Lattice Geometry of LLMs
- The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
- The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
- The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
- The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's algorithm
- The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
- The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
- The Limits of Inference Scaling Through Resampling
- The logical expressiveness of topological neural networks
- The Markovian Thinker
- The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
- The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
- THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
- The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
- The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
- Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
- Theoretical Guarantees for Causal Discovery on Large Random Graphs
- Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
- Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
- Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
- The Overthinking Predicament: When Reasoning Hurts Ranking
- THE PATH OF LEAST RESISTANCE: GUIDING LLM REASONING TRAJECTORIES WITH PREFIX CONSENSUS
- The Pensieve Paradigm: Stateful Language Models with Learned Memory Management
- The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
- The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
- The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
- The Price of Amortized inference in Sparse Autoencoders
- The Price of Robustness: Stable Classifiers Need Overparameterization
- The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
- The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
- The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
- There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
- There Was Never a Bottleneck in Concept Bottleneck Models
- The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
- The Seismic Wavefield Common Task Framework
- THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE
- The Serial Scaling Hypothesis
- The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
- The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
- The Spacetime of Diffusion Models: An Information Geometry Perspective
- The State of Reinforcement Finetuning for Transformer-based Generative Agents
- The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
- The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
- The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
- The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
- The Value of Information in Human-AI Decision-making
- Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
- Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
- Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
- Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
- Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
- Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
- ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
- ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
- Think Then Embed: Generative Context Improves Multimodal Embedding
- ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
- Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
- Thompson Sampling via Fine-Tuning of LLMs
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
- Thought Branches: Interpreting LLM Reasoning Requires Resampling
- Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
- Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
- Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
- Thyme: Think Beyond Images
- TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
- TIGaussian: Disentangle Gaussians for Spatial-Aware Text-Image-3D Alignment
- Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
- Tighter Performance Theory of FedExProx
- TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
- TileLang: Bridge Programmability and Performance in Modern Neural Kernels
- Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
- Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
- Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
- TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
- Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
- TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
- TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
- TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
- TimeSeriesExamAgent: Creating TimeSeries Reasoning Benchmarks at Scale
- TIMESLIVER: SYMBOLIC-LINEAR DECOMPOSITION FOR EXPLAINABLE TIME SERIES CLASSIFICATION
- Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
- Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
- Tina: Tiny Reasoning Models via LoRA
- TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
- TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
- TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
- TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
- TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
- TNT: Improving Chunkwise Training for Test-Time Memorization
- To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
- To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
- To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
- Token-based Audio Inpainting via Discrete Diffusion
- Token Distillation: Attention-Aware Input Embeddings for New Tokens
- Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
- Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
- Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
- Token-Importance Guided Direct Preference Optimization
- Tokenisation over Bounded Alphabets is Hard
- Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
- Token-level Data Selection for Safe LLM Fine-tuning
- (Token-Level) $\textbf{InfoRMIA}$: Stronger Membership Inference and Privacy Assessment for LLMs
- TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
- TokMem: Tokenized Procedural Memory for Large Language Models
- TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
- Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
- ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
- ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
- ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
- TopoFormer: Topology Meets Attention for Graph Learning
- Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
- Topological Causal Effects
- Topological Flow Matching
- Topology and geometry of the learning space of ReLU networks: connectivity and singularities
- Topology Matters in RTL Circuit Representation Learning
- Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
- Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
- ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
- To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
- TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
- To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
- Toward Complex-Valued Neural Networks for Waveform Generation
- Toward Conservative Planning from Preferences in Offline Reinforcement Learning
- Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
- Toward Efficient Exploration by Large Language Model Agents
- Toward Enhancing Representation Learning in Federated Multi-Task Settings
- Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
- Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
- Toward Principled Flexible Scaling for Self-Gated Neural Activation
- Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
- Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
- Towards a Foundation Model for Crowdsourced Label Aggregation
- Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
- Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
- Towards a Sharp Analysis of Learning Offline $f$-Divergence-Regularized Contextual Bandits
- Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
- Towards a Universally Transferable Acceleration Method for Density Functional Theory
- Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
- Towards Better Optimization For Listwise Preference in Diffusion Models
- Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
- Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
- Towards Dynamic Interleaving Optimizers
- Towards Efficient, Adaptive, and Unified Reinforcement Mid-Training
- Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
- Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
- Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
- Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
- Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
- Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
- Towards Improved Sentence Representations using Token Graphs
- Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs
- Towards Interpretable Visual Decoding with Attention to Brain Representations
- Towards Knowledge-and-Data-Driven Organic Reaction Prediction: RAG-Enhanced and Reasoning-Powered Hybrid System with LLMs
- Towards Learned Optimization Free Lunch
- Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
- Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
- Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
- Towards One-step Causal Video Generation via Adversarial Self-Distillation
- Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
- Towards Personalized Deep Research: Benchmarks and Evaluations
- Towards Physically Executable 3D Gaussian for Embodied Navigation
- Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
- Towards Prompt-Robust Machine-Generated Text Detection
- Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
- Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
- Towards Real-World Routing with Neural Combinatorial Optimization
- Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
- Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
- Towards Revealing the Effect of Batch Size Scheduling on Pre-training
- Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
- Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
- Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
- Towards Sampling Data Structures for Tensor Products in Turnstile Streams
- Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
- Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
- Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
- Towards Spatial Supersensing in Video
- Towards Strategic Persuasion with Language Models
- Towards Sustainable Investment Policies Informed by Opponent Shaping
- Towards Text-Mask Consistency in Medical Image Segmentation
- Towards True Speech-to-Speech Models Without Text Guidance
- Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
- Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
- Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
- Towards Understanding the Shape of Representations in Protein Language Models
- Towards Understanding Valuable Preference Data for Large Language Model Alignment
- Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
- TPDiff: Temporal Pyramid Video Diffusion Model
- TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
- TP-Spikformer: Token Pruned Spiking Transformer
- Traceable Black-Box Watermarks For Federated Learning
- Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
- Trace Anything: Representing Any Video in 4D via Trajectory Fields
- TRACEDET: HALLUCINATION DETECTION FROM THE DECODING TRACE OF DIFFUSION LARGE LANGUAGE MODELS
- TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
- TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
- Tracing and Reversing Edits in LLMs: A Study on Rank-One Model Edits
- Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
- Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
- TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
- Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
- Train-before-Test Harmonizes Language Model Rankings
- Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
- Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
- Training Dynamics Impact Post-Training Quantization Robustness
- Training-free Counterfactual Explanation for Temporal Graph Model Inference
- Training-Free Determination of Network Width via Neural Tangent Kernel
- Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
- Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
- Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
- Training Large Language Models To Reason In Parallel With Global Forking Tokens
- Training Large Reasoning Models Efficiently via Progressive Thought Encoding
- Training LLMs with LogicReward for Faithful and Rigorous Reasoning
- Train Once, Answer All: Many Pretraining Experiments for the Cost of One
- Train on Validation (ToV): Fast data selection with applications to fine-tuning
- TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
- TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
- Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
- Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
- TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
- TrajTok: What makes for a good trajectory tokenizer in behavior generation?
- Transducing Language Models
- Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
- Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
- Transfer Learning in Infinite Width Feature Learning Networks
- Transfer Paramatters: Optimal per-Module Hyperparameters Across All Scaling Axes
- Transformers are Inherently Succinct
- Transformers as a Measure-Theoretic Associative Memory: A Statistical Perspective
- Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
- Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
- Transformers Learn Latent Mixture Models In-Context via Mirror Descent
- Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
- Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
- Transitive RL: Value Learning via Divide and Conquer
- Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
- Translating Flow to Policy via Hindsight Online Imitation
- Translation Heads: Unveiling Attention's Role in LLM Multilingual Translation
- TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
- Trapped by simplicity: When Transformers fail to learn from noisy features
- Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
- TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
- TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
- Tree Search for LLM Agent Reinforcement Learning
- Tree-sliced Sobolev IPM
- Triangle Multiplication is All You Need for Biomolecular Structure Representations
- TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
- Tricks or Traps? A Deep Dive into RL for LLM Reasoning
- TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
- TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
- TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
- TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
- Trinity: An Evolved LLM Coordinator
- Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
- TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
- TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
- TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
- TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
- True Self-Supervised Novel View Synthesis is Transferable
- Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
- TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
- TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
- Trust-Region Adaptive Policy Optimization
- Trust The Typical
- Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
- Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
- TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
- TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
- TS-DDAE: A novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
- TSLM: Tree-Structured Language Modeling for Divergent Thinking
- TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
- t-SNE Exaggerates Clusters, Provably
- TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
- T-TAMER: Provably Taming Trade-offs in ML Serving
- TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
- TTS Can Speak in Any Style with Any Voice
- TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
- TTT3R: 3D Reconstruction as Test-Time Training
- Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
- Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
- TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
- TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
- Tuning the burn-in phase in training recurrent neural networks improves their performance
- TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
- Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
- TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
- Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
- TusoAI: Agentic Optimization for Scientific Methods
- Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
- TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
- TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
- TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
- Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
- Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
- Two (narrow) heads are better than (an arbitrarily wide) one
- Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
- Type-Compliant Adaptation Cascades
- TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
- U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
- UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
- UFO-4D: Unposed Feedforward 4D reconstruction from Two Images
- UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
- UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
- ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
- ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
- UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
- UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
- UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
- UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
- U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
- UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
- Unbalanced Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
- Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
- Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
- Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
- Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
- Uncertainty-Aware Gaussian Map for Vision-Language Navigation
- Uncertainty-driven Embedding Convolution
- Uncertainty Estimation via Hyperspherical Confidence Mapping
- Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
- Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
- Uncovering Robot Vulnerabilities through Semantic Potential Fields
- Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
- Uncover Underlying Correspondence for Robust Multi-view Clustering
- Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
- Understanding and Improving Hyperbolic Deep Reinforcement Learning
- Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
- Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
- Understanding and Relaxing the Limitations of Transformers for Linear Algebra
- Understanding Collaboration Mechanism In VAE Recommender Systems
- Understanding Dataset Distillation via Spectral Filtering
- Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
- Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
- Understanding Routing Mechanism in Mixture-of-Experts Language Models
- Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
- Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
- Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
- Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
- Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
- Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
- Understanding the Mechanisms of Fast Hyperparameter Transfer
- Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
- Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
- Understanding the Role of Training Data in Test-Time Scaling
- UNDERSTANDING TRANSFORMERS FOR TIME SERIES FORECASTING: A CASE STUDY ON MOIRAI
- Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
- Understanding VLMs Spatial Mental Modeling Capability from Limited Views
- Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
- Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
- (U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
- UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
- UniCA: Unified Covariate Adaptation for Time Series Foundation Model
- UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
- Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
- UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
- UniF$^2$ace: A $\underline{Uni}$fied $\underline{F}$ine-grained $\underline{Face}$ Understanding and Generation Model
- Unified 3D Scene Understanding Through Physical World Modeling
- Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
- Unified and Efficient Multi-view Clustering from Probabilistic Perspective
- Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
- Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Diffusion Process
- Unified In-Context Video Editing
- Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
- Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
- Unified Registration of Cortical and Subcortical Structures
- Unified Vision-Language-Action Model
- Unified Vision–Language Modeling via Concept Space Alignment
- UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
- Uniform Discrete Diffusion with Metric Path for Video Generation
- Unifying Complexity-Theoretic Perspectives on Provable Explanations
- Unifying Concept Representation Learning
- Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model
- Unifying Stable Optimization and Reference Regularization in RLHF
- UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
- UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
- UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
- Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
- UniOD: A Universal Model for Outlier Detection across Diverse Domains
- UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
- UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
- UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
- UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
- UNITE: Universal kNowledge Integration from Task-specific Experts
- UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
- UNIVERSAL AND EFFICIENT LOAD BALANCING FOR RL TRAINING OF LARGE MULTIMODAL MODELS
- Universal Beta Splatting
- Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
- Universal Model Routing for Efficient LLM Inference
- Universal Multi-Domain Translation via Diffusion Routers
- Universal Properties of Activation Sparsity in Modern Large Language Models
- Universal Value-Function Uncertainties
- Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
- Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
- Unlearning Evaluation through Subset Statistical Independence
- Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
- Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
- Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
- Unleashing Perception-Time Scaling to Multimodal Reasoning Models
- Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
- Unlocking Full Efficiency of Token Filtering in Large Language Model Training
- Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
- Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
- Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
- Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
- Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
- UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
- Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
- Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
- Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the Diverse Framework
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
- Unsupervised Invariant Risk Minimization
- Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
- Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
- Untraceable DeepFakes via Traceable Fingerprint Elimination
- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
- Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
- Unveiling Super Experts in Mixture-of-Experts Large Language Models
- Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
- Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
- Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
- UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
- UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
- UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
- UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
- Urban Socio-Semantic Segmentation with Vision-Language Reasoning
- UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
- Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
- Using cognitive models to reveal value trade-offs in language models
- Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
- Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
- USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
- V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
- VADv2: End-to-End Autonomous Driving via Probabilistic Planning
- Value Flows
- Value Gradient Flow: Behavior-Regularized RL without Regularization
- Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
- VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
- Variance-Dependent Regret Lower Bounds for Contextual Bandits
- Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
- Variational Deep Learning via Implicit Regularization
- Variational Inference for Cyclic Learning
- Variational Reasoning for Language Models
- Variation-aware Flexible 3D Gaussian Editing
- Variation in Verification: Understanding Verification Dynamics in Large Language Models
- VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
- vAttention: Verified Sparse Attention via Sampling
- vCache: Verified Semantic Prompt Caching
- VCWorld: A Biological World Model for Virtual Cell Simulation
- VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
- VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
- VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
- VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
- VerifAI-2: The Second Workshop on AI Verification in the Wild
- Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
- Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
- Verifier-free Test-Time Sampling for Vision Language Action Models
- VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
- Verifying Chain-of-Thought Reasoning via its Computational Graph
- VERINA: Benchmarking Verifiable Code Generation
- VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
- Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
- VeriTrail: Closed-Domain Hallucination Detection with Traceability
- Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
- VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
- VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
- VGR: Visual Grounded Reasoning
- VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
- Vid2World: Crafting Video Diffusion Models to Interactive World Models
- VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
- VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
- VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
- Video-As-Prompt: Unified Semantic Control for Video Generation
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
- Video-GPT via Next Clip Diffusion
- VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
- Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
- Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
- VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
- VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
- VideoNSA: Native Sparse Attention Scales Video Understanding
- VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
- Video Scene Segmentation with Genre and Duration Signals
- Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
- Video Unlearning via Low-Rank Refusal Vector
- VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
- VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
- Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
- villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
- ViMo: A Generative Visual GUI World Model for App Agents
- VINCIE: Unlocking In-context Image Editing from Video
- ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
- ViPO: Visual Preference Optimization at Scale
- ViPRA: Video Prediction for Robot Actions
- Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
- Virtual Community: An Open World for Humans, Robots, and Society
- VIRTUE: Visual-Interactive Text-Image Universal Embedder
- VisCoder2: Building Multi-Language Visualization Coding Agents
- VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
- VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
- Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
- Vision Language Models are Biased
- VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
- VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
- VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
- VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
- Visual Autoregressive Modeling for Instruction-Guided Image Editing
- Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
- Visual Jigsaw Post-Training Improves MLLMs
- Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
- Visual Planning: Let's Think Only with Images
- VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
- Visual Prompt-Agnostic Evolution
- VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
- Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
- Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
- VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
- VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
- VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
- VITA: Vision-to-Action Flow Matching Policy
- VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
- ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
- Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
- Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
- VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
- VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
- VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
- VLMgineer: Vision-Language Models as Robotic Toolsmiths
- VLM-Guided Adaptive Negative Prompting for Creative Generation
- VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
- VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
- VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
- VMoBA: Mixture-of-Block Attention for Video Diffusion Models
- VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
- VOGUE: Unified Understanding, Generation, and Editing for Videos
- VoMP: Predicting Volumetric Mechanical Property Fields
- VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
- VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
- VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
- VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
- VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
- VUDG: A Dataset for Video Understanding Domain Generalization
- Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
- WAFT: Warping-Alone Field Transforms for Optical Flow
- WALT: Web Agents that Learn Tools
- WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
- WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
- Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
- Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
- WaterDrum: Watermark-based Data-centric Unlearning Metric
- Watermark-based Attribution of AI-Generated Images
- Watermarking Diffusion Language Models
- WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
- WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
- WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
- Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
- WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
- wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
- Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
- Weak-to-Strong Diffusion
- Weak-to-Strong Generalization with Failure Trajectories
- WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
- WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
- WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
- WebDS: An End-to-End Benchmark for Web-based Data Science
- WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
- WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
- Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
- WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
- WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
- WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
- W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
- Weight Decay may matter more than µP for Learning Rate Transfer in Practice
- Weight-Space Linear Recurrent Neural Networks
- Weight Space Representation Learning on Diverse NeRF Architectures
- Welfarist Formulations for Diverse Similarity Search
- We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
- WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
- What Do Large Language Models Know About Opinions?
- Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
- What Exactly Does Guidance Do in Masked Discrete Diffusion Models
- What Generative Search Engines Like and How to Optimize Web Content Cooperatively
- What Happens Next? Anticipating Future Motion by Generating Point Trajectories
- What happens when generative AI models train recursively on each others' outputs?
- What Layers When: Learning to Skip Compute in LLMs with Residual Gates
- What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models
- What Matters for Batch Online Reinforcement Learning in Robotics?
- What Matters for Bioacoustic Encoding
- What matters for Representation Alignment: Global Information or Spatial Structure?
- What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
- What Scales in Cross-Entropy Scaling Law?
- What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
- What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation
- When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
- When and Where to Reset Matters for Long-Term Test-Time Adaptation
- When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
- When Bias Helps Learning: Bridging Initial Prejudice and Trainability
- When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
- When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
- When Flatness Does (Not) Guarantee Adversarial Robustness
- When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
- When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
- When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
- When Language Models Lose Their Mind: The Consequences of Brain Misalignment
- When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
- When LLMs get significantly worse: A statistical approach to detect model degradations
- When Machine Learning Gets Personal: Evaluating Prediction and Explanation
- When MLLMs Meets Compression Distortion: A Coding Paradigm Tailored to MLLMs
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
- When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
- When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
- When Shift Happens - Confounding Is to Blame
- When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
- When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
- When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
- When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
- When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
- When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
- When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
- Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
- Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
- WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
- Who Matters Matters: Agent-Specific Conservative Offline MARL
- Why Adversarially Train Diffusion Models?
- Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
- Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
- Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
- Why DPO is a Misspecified Estimator and How to Fix It
- Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
- Why is Your Language Model a Poor Implicit Reward Model?
- Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems
- Why Less is More (Sometimes): A Theory of Data Curation
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
- Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
- Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
- Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
- Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
- WideSearch: Benchmarking Agentic Broad Info-Seeking
- Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
- WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
- Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
- WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
- WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
- WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
- WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
- Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
- WithAnyone: Toward Controllable and ID Consistent Image Generation
- WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
- Workshop on Logical Reasoning of Large Language Models
- Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI
- Workshop on Scaling Post-training for LLMs (SPOT)
- World2Minecraft: Occupancy-Driven simulated scenes Construction
- WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
- WorldGym: World Model as An Environment for Policy Evaluation
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
- WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
- WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
- WOW-Seg: A Word-free Open World Segmentation Model
- WoW!: World Models in a Closed-Loop World
- WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
- WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
- XIL: Cross-Expanding Incremental Learning
- xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
- XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
- XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
- xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
- X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
- YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
- You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
- Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
- Your Language Model Secretly Contains Personality Subnetworks
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
- Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
- YuE: Scaling Open Foundation Models for Long-Form Music Generation
- Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
- Zephyrus: An Agentic Framework for Weather Science
- ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
- Zero-Sacrifice Lifelong Adversarial Defense for Pre-Trained Encoders
- Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
- Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
- Zero-shot Forecasting by Simulation Alone
- Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
- Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
- ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
- ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
- ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation