Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Shenao Zhang · Zhihan Liu · Boyi Liu · Yufeng Zhang · Yingxiang Yang · Yongfei Liu · Liyu Chen · Tao Sun · Zhaoran Wang

Abstract
