S$^2$-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
Chubin Chen · Jiashu Zhu · Xiaokun Feng · Nisha Huang · Chen Zhu · Meiqi Wu · Fangyuan Mao · Jiahong Wu · Xiangxiang Chu · Xiu Li
Abstract
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for generating high-quality samples. However, through an empirical analysis of both Gaussian mixture models with closed-form solutions and real-world data distributions, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to low fidelity and semantic incoherence. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself, without additional training or external modules. Building on this insight, we propose **$S^2$-Guidance (Stochastic Self-Guidance)**, a novel method that stochastically drops blocks of the network during the denoising process to construct sub-networks, whose predictions guide the model away from potential low-quality outputs and thereby improve sample quality. Extensive qualitative and quantitative experiments on standard text-to-image and text-to-video generation benchmarks demonstrate that **$S^2$-Guidance** delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
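To make the mechanism concrete, the sketch below shows how such a guidance step could be wired into a standard sampler. It assumes an epsilon-prediction model whose forward pass accepts a hypothetical `drop_rate` argument that randomly skips blocks to form a sub-network; the scale `w_s` and the exact combination rule are illustrative assumptions rather than the paper's verified formulation.

```python
import torch

@torch.no_grad()
def s2_guidance_step(model, x_t: torch.Tensor, t: torch.Tensor,
                     cond, uncond, w_cfg: float = 7.5,
                     w_s: float = 1.0, drop_rate: float = 0.1) -> torch.Tensor:
    """One guidance step combining CFG with stochastic self-guidance (sketch).

    Assumes `model(x, t, c, drop_rate=...)` returns an epsilon prediction
    and, when drop_rate > 0, stochastically skips residual/transformer
    blocks so the forward pass runs through a random sub-network. This
    interface, `w_s`, and the combination rule below are assumptions.
    """
    eps_cond = model(x_t, t, cond)                          # full-model conditional prediction
    eps_uncond = model(x_t, t, uncond)                      # full-model unconditional prediction
    eps_cfg = eps_uncond + w_cfg * (eps_cond - eps_uncond)  # standard CFG extrapolation

    # Re-run the conditional pass through a randomly block-dropped
    # sub-network and extrapolate away from its (weaker) prediction.
    eps_sub = model(x_t, t, cond, drop_rate=drop_rate)
    return eps_cfg + w_s * (eps_cfg - eps_sub)
```

In this reading, the sub-network plays a role analogous to the one CFG assigns to the unconditional model: a deliberately weakened predictor whose output marks the low-quality direction to steer away from.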