Poster

RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization

Zhaoning Yu · Zhaolun Su · Leitian Tao · Haozhu Wang · Aashu Singh · Hanchao Yu · Jianyu Wang · Hongyang Gao · Weizhe Yuan · Jason E Weston · Ping Yu · Jing Xu

Abstract
