Backstepping Temporal Difference Learning
Han-Dong Lim · Donghwan Lee
Keywords:
policy evaluation
temporal difference learning
reinforcement learning
2023 In-Person Poster presentation / poster accept
Abstract
Off-policy learning is an important capability of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD learning algorithms have been developed to date. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
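The divergence phenomenon the abstract refers to can be reproduced in a few lines. The following sketch (not from the paper) uses the classic two-state "w, 2w" counterexample of Tsitsiklis and Van Roy: two states with scalar features 1 and 2, so the approximate values are w and 2w, and a behavior policy that only samples the first transition. All constants (step size, discount factor) are illustrative assumptions.

```python
# Hedged illustration: off-policy TD(0) with linear function approximation
# can diverge. Classic two-state "w, 2w" example: phi(s1)=1, phi(s2)=2,
# so v_w(s1)=w and v_w(s2)=2w. The behavior policy samples only the
# transition s1 -> s2 with reward 0, so every update has TD error
# delta = 0 + gamma*2w - w = (2*gamma - 1)*w, which has the same sign as w.
gamma = 0.99   # discount factor (assumed for illustration)
alpha = 0.1    # step size (assumed for illustration)
w = 1.0        # initial weight

history = [w]
for _ in range(100):
    delta = 0.0 + gamma * (2.0 * w) - w   # TD error on s1 -> s2
    w += alpha * delta * 1.0              # feature of s1 is 1
    history.append(w)

# Since 2*gamma - 1 > 0, each update multiplies w by 1 + alpha*(2*gamma - 1),
# so |w| grows geometrically instead of converging.
print(f"w after 100 updates: {history[-1]:.3e}")
```

Gradient-TD methods (and the backstepping-based algorithms studied in this work) are designed precisely to stabilize this kind of update.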