Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
Image-Alchemy: Advancing Subject Fidelity in Personalized Text-to-Image Generation
Kaustubh Sharma · Ojasva Nema · Amritanshu Tiwari · Cherish Puniani
Keywords: [ Deep generative models ] [ Catastrophic Forgetting ] [ Text-to-Image Synthesis ] [ LoRA Fine-tuning ] [ Personalized Image Generation ] [ Latent Diffusion Models ]
Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects from only a few reference images remains challenging, often leading to catastrophic forgetting, overfitting, or large computational overhead. We propose a two-stage pipeline that addresses these limitations. First, we apply LoRA-based fine-tuning to the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. Next, we use the unmodified SDXL to generate a generic scene, replacing the subject with its class label. We then selectively insert the personalized subject through a segmentation-driven Img2Img pipeline that uses the trained LoRA weights. This framework isolates the subject encoding from the overall composition, preserving SDXL’s broader generative capabilities while integrating the new subject with high fidelity. Our method achieves a DINO similarity score of 0.789 on SDXL, outperforming existing personalized text-to-image approaches. Our code is available at (link hidden due to anonymity)
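The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: it assumes the Hugging Face `diffusers` library, a subject LoRA already trained on SDXL's attention weights, a precomputed segmentation mask, and hypothetical values for settings (model ID, denoising `strength`) that the abstract does not specify.

```python
def generate_generic_scene(prompt_with_class_label):
    """Stage 1: unmodified SDXL renders the overall scene, with the
    subject's class label (e.g. "a dog on a beach") standing in for
    the personalized subject."""
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt_with_class_label).images[0]


def insert_personalized_subject(scene, subject_mask, subject_prompt, lora_path):
    """Stage 2: a segmentation mask restricts a mask-guided Img2Img
    (inpainting) pass, with the subject LoRA loaded, to the subject
    region only, leaving the rest of the composition untouched."""
    import torch
    from diffusers import StableDiffusionXLInpaintPipeline

    pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.load_lora_weights(lora_path)  # LoRA trained on U-Net attention weights
    return pipe(
        prompt=subject_prompt,
        image=scene,
        mask_image=subject_mask,  # segmentation mask of the class-label subject
        strength=0.8,             # hypothetical denoising strength
    ).images[0]
```

Keeping the two stages separate is what isolates the subject encoding from the composition: the base model never sees the LoRA when laying out the scene, so its general generative behavior is preserved.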