Offline Reinforcement Learning for LLM Multi-Step Reasoning

Huaijie Wang · Shibo Hao · Hanze Dong · Shenao Zhang · Yilin Bao · Ziran Yang · Yi Wu

Abstract

Chat is not available.