Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy

Efficient Consistency Model Training for Policy Distillation in Reinforcement Learning

Bowen Fang · Xuan Di

Keywords: [ Policy Gradient ] [ Probability Flow-ODE ] [ Consistency Model ] [ Reinforcement Learning ] [ Efficiency ]


Abstract:

This paper proposes an efficient consistency model (CM) training scheme in the context of reinforcement learning (RL). More specifically, we leverage the Probability Flow ODE (PF-ODE) and introduce two novel loss functions to improve CM training for RL policy distillation. We propose Importance Weighting (IW) and Gumbel-Based Sampling (GBS) as strategies to refine policy learning under limited sampling budgets. Our approach enables efficient training by directly incorporating probability estimates, which reduces variance and improves sample efficiency. Numerical experiments demonstrate that our method outperforms conventional CM training, achieving more accurate policy representations with limited samples. These findings highlight the potential of CMs as an efficient alternative for policy optimization in RL.
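The abstract does not give the loss definitions, so as a rough illustration only, the sketch below shows what an importance-weighted consistency-distillation objective for a state-conditioned action denoiser could look like. The network architecture, noise schedule, self-normalized weighting, and all names (ConsistencyPolicy, iw_consistency_loss) are assumptions for illustration, not the authors' actual IW or GBS formulations.

```python
# Minimal, hypothetical sketch of an importance-weighted consistency loss
# for policy distillation. Shapes, schedule, and weighting are assumptions.
import torch
import torch.nn as nn


class ConsistencyPolicy(nn.Module):
    """Maps (noisy action, noise level, state) to a denoised action."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x_t, t, state):
        h = torch.cat([x_t, t.unsqueeze(-1), state], dim=-1)
        return self.net(h)


def iw_consistency_loss(model, ema_model, state, action, log_w, sigmas):
    """Consistency loss between adjacent noise levels, reweighted per sample.

    log_w: per-sample log importance weights (e.g. from action probabilities
    under the behavior policy); sigmas: increasing noise schedule.
    """
    b = action.shape[0]
    # Pick a random adjacent pair (sigma_i, sigma_{i+1}) per sample.
    idx = torch.randint(0, len(sigmas) - 1, (b,))
    s_lo, s_hi = sigmas[idx], sigmas[idx + 1]
    noise = torch.randn_like(action)
    x_hi = action + s_hi.unsqueeze(-1) * noise   # more noisy point
    x_lo = action + s_lo.unsqueeze(-1) * noise   # less noisy point
    pred = model(x_hi, s_hi, state)              # online network output
    with torch.no_grad():
        target = ema_model(x_lo, s_lo, state)    # EMA target network output
    # Self-normalized importance weights to keep the loss scale stable.
    w = torch.softmax(log_w, dim=0) * b
    return (w * ((pred - target) ** 2).mean(dim=-1)).mean()


if __name__ == "__main__":
    state_dim, action_dim = 8, 2
    model = ConsistencyPolicy(state_dim, action_dim)
    ema_model = ConsistencyPolicy(state_dim, action_dim)
    ema_model.load_state_dict(model.state_dict())
    sigmas = torch.linspace(0.01, 1.0, 10)
    state = torch.randn(32, state_dim)
    action = torch.randn(32, action_dim)
    log_w = torch.randn(32)  # stand-in for log importance weights
    loss = iw_consistency_loss(model, ema_model, state, action, log_w, sigmas)
    loss.backward()
    print(loss.item())
```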
