Poster in Workshop: SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong · Sayak Paul · Noah Lee · Kashif Rasul · James Thorne · Jongheon Jeong
Keywords: [ text-to-image diffusion models ] [ preference alignment ]
Preference alignment methods such as DPO typically rely on divergence regularization against a reference model for stability, but they struggle with reference mismatch when the preference data deviates from that reference model. In this paper, we identify the negative impact of reference mismatch on aligning text-to-image (T2I) diffusion models. Motivated by this analysis, we propose a reference-agnostic alignment method for T2I diffusion models, coined margin-aware preference optimization (MaPO). By removing the dependence on a reference model, MaPO offers a new way to address diverse T2I downstream tasks with varying levels of reference mismatch. We validate this on five representative T2I tasks: (1) preference alignment, (2) cultural representation, (3) safe generation, (4) style learning, and (5) personalization. MaPO surpasses Diffusion DPO as the level of reference mismatch increases, while also outperforming task-specific methods such as DreamBooth. Additionally, MaPO is more efficient in both training time and memory without compromising generation quality.
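To make the core idea concrete, below is a minimal sketch of what a reference-free, margin-based preference loss for a noise-prediction diffusion model could look like. This is an illustration under assumptions, not the paper's exact objective: the function name `margin_preference_loss`, the use of per-sample denoising error as a likelihood proxy, and the weights `beta` and `gamma` are hypothetical choices made for exposition.

```python
import torch.nn.functional as F

def margin_preference_loss(eps_pred_w, eps_pred_l, eps_w, eps_l,
                           beta=0.1, gamma=1.0):
    """Sketch of a reference-free margin loss for a noise-prediction
    diffusion model (hypothetical form, not the paper's exact objective).

    eps_pred_w / eps_pred_l: model noise predictions for the preferred
        ("winning") and dispreferred ("losing") images, at the same timestep.
    eps_w / eps_l: the ground-truth noise added to each image.
    beta, gamma: illustrative weights, assumed here rather than taken
        from the paper.
    """
    # Per-sample denoising errors over [B, C, H, W]; lower error serves
    # as a proxy for higher implicit likelihood under the current model.
    err_w = F.mse_loss(eps_pred_w, eps_w, reduction="none").mean(dim=(1, 2, 3))
    err_l = F.mse_loss(eps_pred_l, eps_l, reduction="none").mean(dim=(1, 2, 3))

    # Margin term: push the preferred sample's error below the
    # dispreferred sample's, with no reference model involved.
    margin = -F.logsigmoid(beta * (err_l - err_w)).mean()

    # SFT-style anchor on the preferred sample, standing in for the
    # stability a reference-based divergence penalty would provide.
    return err_w.mean() + gamma * margin
```

In a training loop, one would add the same noise at the same sampled timestep to both the preferred and dispreferred image, run the denoiser on each, and backpropagate this combined loss; the trade-off between the anchor term and the margin term controls how aggressively the preference gap is widened.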