Oral in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
The Diffusion Duality
Subham Sahoo · Justin Deschenaux · Aaron Gokaslan · Guanghan Wang · Justin Chiu · Volodymyr Kuleshov
Keywords: [ Language Models ] [ Discrete Diffusion ] [ Diffusion Models ]
Abstract:
Discrete diffusion models have been demonstrated to be surprisingly strong language models. In this work, we show that discrete diffusion language models can be further improved by adapting methods from continuous-state diffusion models. We establish a core property of uniform-state diffusion: it stems from an underlying Gaussian diffusion process. This property allows us to improve both training, by utilizing a curriculum learning strategy that reduces training variance and leads to $\mathbf{2\times}$ faster convergence, and sampling, by adapting efficient distillation methods from continuous-state diffusion models. As a result, our models surpass an autoregressive model's zero-shot perplexity on 3 out of 7 benchmarks, and we reduce the number of sampling steps by **two orders of magnitude** while preserving sample quality.
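The abstract's core claim is that a uniform-state discrete diffusion arises from a Gaussian diffusion over one-hot embeddings via an argmax projection. Below is a minimal Monte Carlo sketch of that correspondence, not the paper's implementation: all names (`K`, `alpha`, `n`) are illustrative assumptions. By symmetry of the Gaussian noise, the argmax of the noisy latent lands on the clean token with some probability and is uniform over the remaining tokens otherwise, which is exactly the marginal of a uniform-state discrete diffusion at an implied noise level.

```python
import numpy as np

# Sketch (assumed setup, not the paper's code): argmax of a Gaussian
# diffusion over one-hot token embeddings induces uniform-state
# discrete-diffusion marginals Cat(alpha' * onehot(x) + (1 - alpha') / K).
rng = np.random.default_rng(0)
K = 8            # vocabulary size (hypothetical)
x = 3            # index of the clean token
alpha = 0.7      # Gaussian signal level at some diffusion time t
sigma = np.sqrt(1.0 - alpha**2)
n = 200_000      # Monte Carlo samples

one_hot = np.zeros(K)
one_hot[x] = 1.0

# Gaussian forward process: z_t = alpha * onehot(x) + sigma * noise
z = alpha * one_hot + sigma * rng.standard_normal((n, K))
tokens = z.argmax(axis=1)  # project latents to discrete tokens

counts = np.bincount(tokens, minlength=K) / n
p_keep = counts[x]
# Solve alpha' + (1 - alpha') / K = P(argmax = x) for the implied
# discrete-diffusion noise level alpha'.
alpha_prime = (p_keep - 1.0 / K) / (1.0 - 1.0 / K)
print(f"P(argmax = x) = {p_keep:.3f}, implied discrete alpha' = {alpha_prime:.3f}")
print("off-token mass (~uniform by symmetry):",
      np.round(counts[np.arange(K) != x], 4))
```

Running this shows the off-token probability mass is (up to Monte Carlo error) uniform over the other `K - 1` tokens, consistent with a uniform-state marginal at the implied `alpha'`.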