

Poster

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Xumeng Wen · Zihan Liu · Shun Zheng · Shengyu Ye · Zhirong Wu · Yang Wang · Zhijian Xu · Xiao Liang · Junjie Li · Ziming Miao · Jiang Bian · Mao Yang

Abstract
