

In-Person Poster presentation / poster accept

Backstepping Temporal Difference Learning

Han-Dong Lim · Donghwan Lee

MH1-2-3-4 #101

Keywords: [ Reinforcement Learning ] [ temporal difference learning ] [ policy evaluation ]


Abstract:

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from a divergence issue when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD learning algorithms have been developed to date. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
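
To make the divergence issue mentioned in the abstract concrete, the following is a minimal sketch (not taken from the paper) of off-policy TD(0) with linear function approximation. The MDP, features, policies, and step size are illustrative assumptions chosen only to show the update rule theta <- theta + alpha * rho * delta * phi(s), whose iterates are not guaranteed to converge when the behavior policy differs from the target policy.

```python
import numpy as np

# Hypothetical example: off-policy TD(0) with linear function approximation.
# All quantities below (features, transition kernel, rewards, policies,
# step size) are made up for illustration and are not the paper's setup.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 3
gamma, alpha = 0.99, 0.05

phi = rng.normal(size=(n_states, n_features))                       # feature vectors phi(s)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))    # P(s' | s, a)
R = rng.normal(size=(n_states, n_actions))                          # rewards r(s, a)

target_pi = rng.dirichlet(np.ones(n_actions), size=n_states)        # policy being evaluated
behavior_mu = rng.dirichlet(np.ones(n_actions), size=n_states)      # policy generating data

theta = np.zeros(n_features)
s = 0
for t in range(10_000):
    a = rng.choice(n_actions, p=behavior_mu[s])
    s_next = rng.choice(n_states, p=P[s, a])
    rho = target_pi[s, a] / behavior_mu[s, a]                        # importance-sampling ratio
    delta = R[s, a] + gamma * phi[s_next] @ theta - phi[s] @ theta   # TD error
    theta += alpha * rho * delta * phi[s]                            # off-policy TD(0) update
    s = s_next

print("final weights:", theta)
```

Viewed in control-theoretic terms, the parameter iterates above form a (stochastic) dynamical system, and the off-policy algorithms the paper unifies can be seen as different ways of stabilizing it; the backstepping technique from nonlinear control is the stabilization tool the authors build on.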
