Poster in Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions
Multi-token prediction boosts creativity in algorithmic tasks
Vaishnavh Nagarajan · Chen Wu · Charles Ding · Aditi Raghunathan
Keywords: [ next-token prediction ] [ planning ] [ multi-token prediction ] [ creativity ] [ diffusion ] [ shortcuts ]
In open-ended tasks --- such as designing word problems or discovering novel proofs --- the goal is not only correctness but also diversity and originality. Often, this requires a far-sighted, creative leap of thought. We argue that this requirement is misaligned with the objective of next-token prediction (NTP). To formalize our intuition, we design a suite of minimal algorithmic tasks loosely based on real-world creative endeavors. Concretely, our tasks require an open-ended stochastic planning step that (a) discovers new connections in a knowledge graph (loosely inspired by word-play, humor, or drawing analogies) or (b) constructs new patterns (loosely inspired by constructing word problems, puzzles, or mysteries). We then argue, both conceptually and empirically, that NTP leads to myopic shortcut learning and excessive memorization, limiting its ability to generate novel solutions. In contrast, we find that multi-token approaches, namely teacherless training and diffusion models, can overcome these limitations and comparatively excel on our algorithmic test-bed. Orthogonally, we find that creativity in our tasks is greatly improved by training with a random hash prefix, which we dub ``hash-conditioning''. Thus, our work offers a principled, minimal test-bed for studying open-ended forms of intelligence, as well as a new reason to take the paradigm of multi-token prediction more seriously.
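
To make the hash-conditioning idea concrete, here is a minimal Python sketch of prepending a random hash prefix to each training example; the helper name add_hash_prefix, the prefix length, and the separator are illustrative assumptions rather than details from the paper.

import secrets

def add_hash_prefix(example: str, prefix_bytes: int = 8, sep: str = " | ") -> str:
    # Prepend a random hex string to a training example. The prefix carries
    # no task information; it only gives the model an explicit source of
    # randomness to condition on (length and separator are illustrative choices).
    prefix = secrets.token_hex(prefix_bytes)  # e.g. '9f86d081884c7d65'
    return f"{prefix}{sep}{example}"

# Toy usage: the same kind of target sequence is seen under many different
# random conditioning prefixes across training samples.
corpus = [
    "A -> C -> F -> B",   # e.g. a newly discovered path through a knowledge graph
    "X -> D -> G -> Y",
]
for line in (add_hash_prefix(seq) for seq in corpus):
    print(line)

One way to read the idea, under the assumptions above, is that sampling a fresh prefix before decoding acts like a random seed, so diversity can come from the conditioning string rather than solely from per-token sampling temperature.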