Unifying Autoregressive and Discrete Diffusion Language Modeling via Cross-Regressive Decoding
Dmitry Abulkhanov ⋅ Daniil Strizhakov ⋅ Maxim Panov
Abstract
Autoregressive language models are trained to generate text one token at a time, causing inference latency and cost to scale linearly with output length. However, modern large language models often exhibit semi-autoregressive predictive capabilities, frequently aided by speculative decoding or other multi-token prediction methods. In contrast, discrete diffusion models promise parallel text generation but fundamentally struggle to model sequential correlations because they rely on mean-field approximations that ignore the causality inherent in natural language. We introduce $\textbf{Cross-Regression}$, an approach aimed at achieving true hybridization of autoregressive and discrete diffusion sequence modeling. Cross-Regression couples a parallel $\textit{predictive stream}$ to exact causal probabilities from a $\textit{control stream}$. At inference time, Cross-Regression computes proposal and verification signals jointly in a single shared forward pass, using residual-energy acceptance to accept multiple tokens early and a residual correction step to avoid discarding computation after mismatches. The method provides an explicit knob between $\textit{lossless sampling}$ and a faster $\textit{lossy regime}$ with controllable deviation. Across models from 1.5B to 70B parameters, we observe strong scaling of acceptance length and realize $3$–$6\times$ speedups with near-complete quality retention across reasoning, code, and dialogue benchmarks. We also demonstrate $\textbf{modality transfer}$ by accelerating Whisper decoding.
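The abstract does not define the residual-energy acceptance rule, so as a point of reference the following is a minimal sketch of the standard speculative-sampling verification step that accept/correct schemes of this kind build on. It is not the Cross-Regression method. Each drafted token is accepted with probability min(1, p(x)/q(x)), where q is the proposal distribution and p the exact causal distribution. On the first rejection, a replacement token is drawn from the renormalized residual max(0, p - q), so the verification pass still yields a usable token instead of being thrown away. The function name `verify_draft` and the `lenience` parameter are hypothetical: `lenience=1.0` gives lossless sampling, and values above 1.0 illustrate one possible lossy relaxation that accepts more tokens at the cost of deviating from p.

```python
import numpy as np


def verify_draft(draft_tokens, q_probs, p_probs, rng, lenience=1.0):
    """Standard speculative-sampling verification (not the paper's exact rule).

    draft_tokens: list of k token ids sampled from the proposal distributions.
    q_probs: array [k, V], proposal distribution used for each draft token.
    p_probs: array [k + 1, V], exact target distribution at each position,
             plus one extra position after the last draft token.
    lenience: hypothetical knob; 1.0 is lossless, > 1.0 is a lossy relaxation.
    Returns the list of emitted token ids.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the drafted token with probability min(1, lenience * p/q).
        accept_prob = min(1.0, lenience * p[tok] / q[tok])
        if rng.random() < accept_prob:
            out.append(tok)
            continue
        # Rejection: resample from the residual max(0, p - q), renormalized,
        # so this position still produces a token from the same forward pass.
        residual = np.maximum(p - q, 0.0)
        total = residual.sum()
        residual = residual / total if total > 0 else p
        out.append(int(rng.choice(len(p), p=residual)))
        return out
    # Every draft token was accepted: emit one bonus token from the next
    # target distribution, which the verification pass computed anyway.
    out.append(int(rng.choice(p_probs.shape[1], p=p_probs[len(draft_tokens)])))
    return out
```

When the proposal matches the target exactly, every draft token is accepted and one extra token comes for free; when the proposal puts its mass on a token the target rules out, the step falls back to the residual distribution rather than discarding the position.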