Quantization-Aware Diffusion Models For Maximum Likelihood Training
Abstract
Diffusion models are powerful generative models for continuous signals, such as images and videos. However, real-world digital data are quantized and therefore take values not in a continuum but in a finite discrete set. For example, a pixel in an 8-bit image can take only 256 discrete values. Existing diffusion models either ignore quantization by treating the data as continuous, or handle it by adding small noise to make the data continuous. Neither approach guarantees that samples from the model converge to the finite set of quantized points. In this work, we propose a methodology that explicitly accounts for quantization within diffusion models. Specifically, by adopting a particular form of parameterization, we guarantee that samples from the reverse diffusion process converge to quantized points. In experiments, we demonstrate that our quantization-aware model substantially improves the performance of diffusion models for density estimation and achieves state-of-the-art likelihoods on pixel-level image generation. In particular, for CIFAR-10 image generation, the negative log-likelihood improves substantially from 2.42 to 0.27, approaching the theoretical lower bound.
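As a concrete illustration of the quantization discussed above, the following NumPy sketch (variable names are ours, not from the paper) shows 8-bit pixel data and the uniform-dequantization workaround of adding small noise to make the data continuous:

```python
import numpy as np

rng = np.random.default_rng(0)

# 8-bit pixels: each value lies in the finite discrete set {0, ..., 255}.
pixels = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)

# Common workaround in prior diffusion models: uniform dequantization,
# i.e. add U[0, 1) noise so the data become continuous on [0, 256).
dequantized = pixels.astype(np.float64) + rng.random(pixels.shape)

# Flooring maps continuous values back onto the discrete grid and
# recovers the original pixels exactly; a continuous generative model,
# however, has no guarantee that its samples land on these grid points.
recovered = np.floor(dequantized).astype(np.uint8)
assert np.array_equal(recovered, pixels)
```

The sketch only demonstrates the problem setup; the paper's contribution is a parameterization under which reverse-diffusion samples provably converge to the discrete grid, rather than relying on this dequantization trick.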