Poster

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization

Juntao Dai ⋅ Taiye Chen ⋅ Yaodong Yang ⋅ Qian Zheng ⋅ Gang Pan
2025 Poster

Abstract
