STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
Abstract
Diffusion models are widely regarded as state-of-the-art for image generation because of the high quality of the images they produce. However, recent studies have shown that they are susceptible to backdoor attacks, in which an attacker uses a specific trigger pattern to activate hidden biases and cause the model to generate a predefined target. Fortunately, executing such attacks remains challenging, as they typically require substantial time and memory for parameter-based fine-tuning. In this paper, we are the first to reveal the spatio-temporal redundancy in backdoor attacks on diffusion models. Regarding spatial redundancy, we observe an enrichment phenomenon, which reflects the abnormal gradient accumulation induced by backdoor injection. Regarding temporal redundancy, we observe a marginal effect associated with specific time steps, indicating that only a limited subset of time steps plays a critical role in backdoor injection. Building on these findings, we present a novel framework, STEDiff, comprising two key components: STEBA and STEDF. STEBA is a spatio-temporally efficient attack strategy that accelerates backdoor injection by up to 15.07× while reducing GPU memory usage by 82%. STEDF is a detection framework that leverages spatio-temporal features: by modeling the enrichment phenomenon in weights and the anisotropy across time steps, it achieves a backdoor detection rate of up to 99.8%. Our code is available at: https://anonymous.4open.science/r/STEDiff-9E9F/.