

Poster

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Lu Ma · Hao Liang · Meiyi Qiang · Lexiang Tang · Xiaochen Ma · Zhen Wong · Junbo Niu · Chengyu Shen · Runming He · Yanhao Li · Wentao Zhang · Bin CUI

Abstract
