

In-Person Poster presentation / poster accept

Backstepping Temporal Difference Learning

Han-Dong Lim · Donghwan Lee

MH1-2-3-4 #101

Keywords: [ Reinforcement Learning ] [ temporal difference learning ] [ policy evaluation ]


Abstract:

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from a divergence issue when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD learning algorithms have been developed to date. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
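
To make the divergence issue mentioned in the abstract concrete, the following is a minimal sketch (not taken from the paper) of off-policy TD(0) with linear function approximation. The MDP, features, policies, and step size are illustrative assumptions chosen only to show the update rule theta <- theta + alpha * rho * delta * phi(s), whose iterates are not guaranteed to converge when the behavior policy differs from the target policy.

```python
import numpy as np

# Hypothetical example: off-policy TD(0) with linear function approximation.
# All quantities below (features, transition kernel, rewards, policies,
# step size) are made up for illustration and are not the paper's setup.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 3
gamma, alpha = 0.99, 0.05

phi = rng.normal(size=(n_states, n_features))                       # feature vectors phi(s)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))    # P(s' | s, a)
R = rng.normal(size=(n_states, n_actions))                          # rewards r(s, a)

target_pi = rng.dirichlet(np.ones(n_actions), size=n_states)        # policy being evaluated
behavior_mu = rng.dirichlet(np.ones(n_actions), size=n_states)      # policy generating data

theta = np.zeros(n_features)
s = 0
for t in range(10_000):
    a = rng.choice(n_actions, p=behavior_mu[s])
    s_next = rng.choice(n_states, p=P[s, a])
    rho = target_pi[s, a] / behavior_mu[s, a]                        # importance-sampling ratio
    delta = R[s, a] + gamma * phi[s_next] @ theta - phi[s] @ theta   # TD error
    theta += alpha * rho * delta * phi[s]                            # off-policy TD(0) update
    s = s_next

print("final weights:", theta)
```

Viewed in control-theoretic terms, the parameter iterates above form a (stochastic) dynamical system, and the off-policy algorithms the paper unifies can be seen as different ways of stabilizing it; the backstepping technique from nonlinear control is the stabilization tool the authors build on.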
