Poster in Workshop: ICLR 2026 Workshop on Multimodal Intelligence: Next Token Prediction and Beyond

Unifying Autoregressive and Discrete Diffusion Language Modeling via Cross-Regressive Decoding

Dmitry Abulkhanov ⋅ Daniil Strizhakov ⋅ Maxim Panov


Abstract

Autoregressive language models are trained to generate text one token at a time, causing inference latency and cost to scale linearly with output length. However, modern large language models often exhibit semi-autoregressive predictive capabilities, frequently aided by speculative decoding or other multi-token prediction methods. In contrast, discrete diffusion models promise parallel text generation but fundamentally struggle to model sequential correlations because they rely on mean-field approximations that ignore the causality inherent in natural language. We introduce $\textbf{Cross-Regression}$, an approach aimed at achieving true hybridization of autoregressive and discrete diffusion sequence modeling. Cross-Regression uses a parallel $\textit{predictive stream}$ coupled to exact causal probabilities from a $\textit{control stream}$. At inference time, Cross-Regression computes proposal and verification signals jointly in a single shared forward pass, using residual-energy acceptance to early-accept multiple tokens and a residual correction step to avoid discarding computation after mismatches. The method provides an explicit knob between $\textit{lossless sampling}$ and a faster $\textit{lossy regime}$ with controllable deviation. Across models from 1.5B to 70B parameters, we observe strong scaling of acceptance length and realize $3$–$6\times$ speedups with near-complete quality retention across reasoning, code, and dialogue benchmarks, and we demonstrate $\textbf{modality transfer}$ by accelerating Whisper decoding.
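
For readers unfamiliar with the propose-and-verify pattern the abstract refers to, the sketch below shows the standard speculative-sampling acceptance rule, not the paper's residual-energy acceptance, which the abstract does not specify. A drafted token $x$ is accepted with probability $\min(1, p(x)/q(x))$, where $q$ is the proposal distribution and $p$ the exact causal distribution; on the first rejection, a replacement is drawn from the normalized positive part of $p - q$. The function names and the `leniency` parameter are hypothetical: `leniency=1.0` gives the standard lossless rule, and values above 1.0 are only an assumed illustration of how a lossy knob might look.

```python
import random


def residual_distribution(p, q):
    """Normalized positive part of p - q, used to resample after a rejection."""
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:
        # p and q coincide, so fall back to the target distribution itself.
        return list(p)
    return [r / total for r in residual]


def verify_draft(draft_tokens, draft_probs, target_probs, rng, leniency=1.0):
    """Accept a prefix of draft_tokens; on the first rejection, emit a correction.

    draft_probs[i] (q) and target_probs[i] (p) are distributions over the
    vocabulary at position i. Assumes q[tok] > 0 for every drafted token,
    which holds whenever the token was actually sampled from q.
    leniency is a hypothetical knob: 1.0 is the standard lossless rule.
    """
    output = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        accept_prob = min(1.0, leniency * p[tok] / q[tok])
        if rng.random() < accept_prob:
            output.append(tok)
            continue
        # Rejected: draw a replacement from the residual and stop verifying.
        weights = residual_distribution(p, q)
        output.append(rng.choices(range(len(p)), weights=weights, k=1)[0])
        break
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.5, 0.5], [0.9, 0.1]]
    p = [[0.5, 0.5], [0.2, 0.8]]
    print(verify_draft([0, 0], q, p, rng))
```

This sketch omits the extra token that standard speculative sampling draws from the target distribution when every drafted token is accepted, and it takes both sets of probabilities as given rather than computing them in one shared forward pass.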
