Skip to yearly menu bar Skip to main content


Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Wei Xiong · Hanze Dong · Chenlu Ye · Ziqi Wang · Han Zhong · Heng Ji · Nan Jiang · Tong Zhang

Abstract

Chat is not available.