Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
Latent Diffusion U-Net Representations Contain Positional Embeddings and Anomalies
Jonas Loos · Lorenz Linhardt
Keywords: [ anomaly ] [ latent diffusion ] [ position embedding ] [ stable diffusion ] [ representation analysis ] [ image generation ]
Text-conditioned image diffusion models have demonstrated remarkable capabilities in synthesizing realistic images, spurring growing interest in using their internal representations for various downstream tasks. To better understand the robustness of these representations, we analyze popular Stable Diffusion models using representational similarity and norms. Our findings reveal three phenomena: (1) the presence of a learned positional embedding in intermediate representations, (2) high-similarity corner artifacts, and (3) anomalous high-norm artifacts. These findings underscore the need to further investigate the properties of diffusion model representations, particularly before considering them for downstream tasks that require robust, spatially precise features.
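The two diagnostics mentioned in the abstract, representational similarity and norms, can be sketched as follows. This is a minimal illustration on a synthetic feature tensor, not the authors' code: in practice the features would come from a hooked intermediate block of a Stable Diffusion U-Net, and the function name and shapes here are assumptions for the example.

```python
import numpy as np

def similarity_and_norm_maps(features: np.ndarray, ref: tuple) -> tuple:
    """Given a spatial feature map of shape (H, W, C), return
    (a) the cosine similarity of every spatial position to a chosen
    reference position (reveals positional/corner structure) and
    (b) the per-position L2 norm (surfaces high-norm anomalies)."""
    h, w, c = features.shape
    flat = features.reshape(-1, c)
    norms = np.linalg.norm(flat, axis=1)  # shape (H*W,)
    ref_vec = features[ref]
    cos = flat @ ref_vec / (norms * np.linalg.norm(ref_vec) + 1e-8)
    return cos.reshape(h, w), norms.reshape(h, w)

# Synthetic stand-in for a U-Net activation map; real use would
# extract activations from a Stable Diffusion U-Net block.
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 16, 64)).astype(np.float32)
sim_map, norm_map = similarity_and_norm_maps(feats, ref=(0, 0))
```

Plotting `sim_map` for several reference positions and `norm_map` over the spatial grid is one simple way to visualize positional structure and norm outliers in intermediate representations.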