Keynote Talk: Junxian He (HKUST): Taming Reinforcement Learning for Effective and Efficient Reasoners
2025
in
Workshop: Workshop on Reasoning and Planning for Large Language Models
Abstract
In this talk, I will begin by presenting our recent work, SimpleRL-Zoo, in which we explore the R1-style zero Reinforcement Learning (RL) paradigm across 10 diverse base models. I will highlight the key components of a successful RL recipe for various base models and share some intriguing findings. For instance, we discovered that an increase in response length does not always correlate with the emergence of specific cognitive behaviors such as verification (i.e., the "aha moment"). In the second part of the talk, I will present a unified view of existing work on efficient reasoning and propose a dynamic, difficulty-aware length-shaping approach, which enables cost-effective reasoning through reinforcement learning.