

Poster in Workshop: SCOPE: Scalable Optimization for Efficient and Adaptive Foundation Models

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Jiwoo Hong · Sayak Paul · Noah Lee · Kashif Rasul · James Thorne · Jongheon Jeong

Keywords: [ text-to-image diffusion models ] [ preference alignment ]


Abstract:

Preference alignment methods (such as DPO) typically rely on divergence regularization against a reference model for stability, but they struggle with reference mismatch when the preference data deviates from the reference model. In this paper, we identify the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models. Motivated by this analysis, we propose a reference-agnostic alignment method for T2I diffusion models, coined margin-aware preference optimization (MaPO). By freeing training from the reference model, MaPO enables a new way to address diverse T2I downstream tasks with varying levels of reference mismatch. We validate this on five representative T2I tasks: (1) preference alignment, (2) cultural representation, (3) safe generation, (4) style learning, and (5) personalization. MaPO surpasses Diffusion DPO as the level of reference mismatch increases, while also outperforming task-specific methods like DreamBooth. Additionally, MaPO is more efficient in both training time and memory without compromising quality.
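Although the abstract omits the objective itself, the core idea, a margin-based preference loss that never queries a reference model, can be illustrated with a short PyTorch sketch. This is a hypothetical, minimal reading of the approach, not the authors' exact MaPO loss: the function name margin_preference_loss, the inputs (per-sample denoising errors for preferred and dispreferred images), and the beta and lam hyperparameters are all illustrative assumptions.

import torch
import torch.nn.functional as F

def margin_preference_loss(err_chosen, err_rejected, beta=0.1, lam=1.0):
    # err_chosen / err_rejected: per-sample diffusion denoising losses
    # (e.g., MSE between predicted and true noise) for the preferred and
    # dispreferred images under the SAME model. No reference model is
    # evaluated, unlike DPO-style divergence-regularized objectives.
    #
    # Lower denoising error roughly corresponds to higher likelihood, so
    # the preference margin is (rejected error - chosen error); the loss
    # pushes this margin to be positive.
    margin = err_rejected - err_chosen
    margin_term = -F.logsigmoid(beta * margin)
    # Also keep fitting the preferred samples so generation quality does
    # not drift while the margin is maximized (illustrative weighting).
    return (margin_term + lam * err_chosen).mean()

# Toy usage with made-up denoising errors from one batch:
err_w = torch.tensor([0.21, 0.34])  # preferred images
err_l = torch.tensor([0.45, 0.30])  # dispreferred images
print(margin_preference_loss(err_w, err_l))

Because no reference model is loaded or evaluated, a loss of this shape needs only one forward pass per image pair, which is consistent with the abstract's claim of lower training time and memory than reference-based methods.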
