Poster

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Wenhao Zhang · Yuexiang Xie · Yuchang Sun · Yanxi Chen · Guoyin Wang · Yaliang Li · Bolin Ding · Jingren Zhou

Abstract
