

ByteDance

Expo Talk Panel

verl: Flexible and Efficient Infrastructures for Post-training LLMs

Qiying Yu · Haibin Lin · Yuxuan Tong


Abstract:

Recent advances in reinforcement learning have significantly boosted the reasoning capabilities of LLMs. Models such as OpenAI o3, Claude 3.7, and DeepSeek R1 demonstrate impressive performance on STEM and coding tasks. Yet training such models requires complex infrastructure. In this talk, we present verl (https://github.com/volcengine/verl), a comprehensive framework that uses the HybridFlow programming abstraction to achieve both the flexibility to implement various algorithms and high performance. Through this talk, audiences will gain i) a basic understanding of various RL algorithms, including PPO and GRPO; ii) best practices for training state-of-the-art open-source language models and vision-language models, such as the Qwen series, using verl; and iii) best practices for implementing tool calling and multi-turn rollout.
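To make the GRPO mention above concrete, here is a minimal illustrative sketch (not verl's actual implementation) of the group-relative advantage at the heart of GRPO: sample a group of responses per prompt, score each with a reward, and normalize the rewards within the group, so that no learned value function (critic) is needed as in PPO. The function name and the small example rewards are hypothetical.

```python
# Illustrative sketch of GRPO's group-relative advantage
# (assumed simplification; see the verl repository for the real code):
#   A_i = (r_i - mean(r)) / (std(r) + eps)
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Return one normalized advantage per sampled response in a group."""
    mu = mean(group_rewards)
    # With a single sample there is no spread to normalize by.
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: rewards for 4 responses sampled for one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = grpo_advantages(rewards)
```

Responses scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized, which is what lets GRPO dispense with a separate critic model.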
