Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
Abstract
Auto-regressive large language models (LLMs) have recently achieved remarkable success. Although trained to predict only one token at a time, LLMs intriguingly exhibit longer-term foresight and a degree of anticipatory capacity. Yet how to profile, enhance, and leverage this capacity to improve reasoning performance remains an open question. In this paper, we propose Next Token-Bag Exploitation (Next-ToBE), a simple yet effective method that addresses these challenges. Next-ToBE quantifies an LLM's anticipatory capacity by measuring how well tokens in a future window are already captured by the model's current prediction distribution. Empirically, this capacity correlates strongly with the model's generative quality, yet it is often suppressed by the rigid one-hot objective of next-token prediction. To address this, Next-ToBE replaces the one-hot target vector of the next-token prediction paradigm with a soft target distribution spanning additional future tokens beyond the current step. In this formulation, the immediate next token retains the highest importance, while more distant "look-ahead tokens" are also included to enrich supervision, with their weights dynamically determined by temporal and semantic relevance. Furthermore, the fitting process emphasizes the model's intrinsic anticipatory tendencies, preserving the confidence and fidelity of the original pre-trained model while also improving training stability. Overall, Next-ToBE effectively activates the anticipatory capacity of LLMs, yielding up to a 3.9\% absolute accuracy gain over multi-token prediction (MTP) baselines on complex reasoning benchmarks (math, code, and commonsense reasoning), while reducing peak memory consumption by as much as 68\%. These results highlight Next-ToBE as a scalable, lightweight strategy that helps LLMs see further and reason more effectively.
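To make the soft-target idea concrete, the following is a minimal PyTorch sketch, not the paper's implementation: the window size K, the geometric decay gamma, the mixing weight alpha for the model's own (detached) distribution, and the function name token_bag_loss are all illustrative assumptions standing in for the temporally and semantically weighted scheme described above.

\begin{verbatim}
# Hypothetical sketch of a soft "token-bag" training target (assumptions:
# K-token window, geometric decay gamma, self-distribution mixing alpha).
import torch
import torch.nn.functional as F

def token_bag_loss(logits, token_ids, K=4, gamma=0.5, alpha=0.1):
    """Cross-entropy against a soft target spread over the next K tokens.

    logits:    (B, T, V); logits[:, t] scores the token following position t
    token_ids: (B, T)     un-shifted sequence ids
    The token d steps ahead (d = 1..K) receives weight proportional to
    gamma**(d-1), so the immediate next token keeps the highest importance.
    """
    B, T, V = logits.shape
    T_eff = T - K                   # positions with a full K-token look-ahead window
    raw = [gamma ** d for d in range(K)]
    weights = [w / sum(raw) for w in raw]   # normalize into a distribution

    soft = torch.zeros(B, T_eff, V, dtype=logits.dtype, device=logits.device)
    for d in range(1, K + 1):       # scatter the weight of the d-th look-ahead token
        idx = token_ids[:, d:d + T_eff].unsqueeze(-1)
        soft.scatter_add_(-1, idx,
                          torch.full_like(idx, weights[d - 1], dtype=soft.dtype))

    log_probs = F.log_softmax(logits[:, :T_eff], dim=-1)
    # Blend in the model's own stop-gradient predictions so the target also
    # reflects its intrinsic anticipatory tendencies.
    soft = (1 - alpha) * soft + alpha * log_probs.detach().exp()
    return -(soft * log_probs).sum(dim=-1).mean()
\end{verbatim}

Because scatter_add_ accumulates repeated token ids, the target naturally behaves as a bag of future tokens; a single softmax head is reused at every step, which is one plausible reason such a formulation can be lighter on memory than multi-head MTP variants.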