

Poster

Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning

Jiashun Liu · Johan S Obando Ceron · Han Lu · Yancheng He · Weixun Wang · Wenbo Su · Bo Zheng · Pablo Samuel Castro · Aaron Courville · Ling Pan

Abstract
