(6 events)
Oral
Sat Apr 26 12:30 AM -- 12:42 AM (PDT) @ Hall 1 Apex
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar · Vincent Zhuang · Rishabh Agarwal · Yi Su · JD Co-Reyes · Avi Singh · Kate Baumli · Shariq Iqbal · Colton Bishop · Rebecca Roelofs · Lei Zhang · Kay McKinney · Disha Shrivastava · Cosmin Paduraru · George Tucker · Doina Precup · Feryal Behbahani · Aleksandra Faust
Oral
Sat Apr 26 12:42 AM -- 12:54 AM (PDT) @ Hall 1 Apex
Reasoning Elicitation in Language Models via Counterfactual Feedback
Alihan Hüyük · Xinnuo Xu · Jacqueline Maasch · Aditya Nori · Javier Hernandez
Oral
Sat Apr 26 12:54 AM -- 01:06 AM (PDT) @ Hall 1 Apex
Self-Improvement in Language Models: The Sharpening Mechanism
Audrey Huang · Adam Block · Dylan Foster · Dhruv Rohatgi · Cyril Zhang · Max Simchowitz · Jordan Ash · Akshay Krishnamurthy
Oral
Sat Apr 26 01:06 AM -- 01:18 AM (PDT) @ Hall 1 Apex
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng · Congying Xia · Xinyi Yang · Caiming Xiong · Chien-Sheng Wu · Chen Xing
Oral
Sat Apr 26 01:18 AM -- 01:30 AM (PDT) @ Hall 1 Apex
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song · Hanlin Zhang · Carson Eisenach · Sham Kakade · Dean Foster · Udaya Ghai
Oral
Sat Apr 26 01:30 AM -- 01:42 AM (PDT) @ Hall 1 Apex
Learning Dynamics of LLM Finetuning
Yi Ren · Danica Sutherland