Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

Banghua Zhu ⋅ Jiantao Jiao ⋅ Michael Jordan
