Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
Xuyang Chen · Fengzhuo Zhang · Keyu Yan · Lin Zhao
Abstract
Actor–critic (AC) algorithms underpin many of today’s most successful reinforcement learning (RL) applications, yet their finite-time convergence in realistic settings remains largely underexplored. Existing analyses often rely on oversimplified formulations and are mostly confined to linear function approximation. In practice, however, nonlinear approximation with deep neural networks dominates AC implementations, leaving a substantial gap between theory and practice. In this work, we provide the first finite-time analysis of single-timescale AC with deep neural network approximation in continuous state-action spaces. In particular, we consider the challenging time-average reward setting, where one must simultaneously control three highly coupled error terms, namely the reward estimation error, the critic error, and the actor error. Our novel analysis establishes convergence to a stationary point at a rate $\widetilde{\mathcal{O}}(T^{-1/2})$, where $T$ denotes the total number of iterations, thereby providing theoretical grounding for widely used deep AC methods. We substantiate these theoretical guarantees with experiments that confirm the proven convergence rate and further demonstrate strong performance on MuJoCo benchmarks.
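To make the setting described above concrete, the following is a minimal sketch (not the authors' algorithm or code) of a single-timescale actor-critic update in the average-reward setting, where the running reward estimate, the neural critic, and the neural actor are all updated once per step with step sizes of the same order, so their errors remain coupled. The toy dynamics, network widths, and step sizes are illustrative assumptions only.

```python
# Sketch of single-timescale average-reward actor-critic with neural networks.
# Everything below (toy environment, architectures, step sizes) is assumed for
# illustration and is not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
STATE_DIM, ACTION_DIM, HIDDEN = 3, 1, 64

class Critic(nn.Module):
    """State-value network V_w(s)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, 1),
        )
    def forward(self, s):
        return self.net(s).squeeze(-1)

class Actor(nn.Module):
    """Gaussian policy pi_theta(a | s) for a continuous action space."""
    def __init__(self):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, ACTION_DIM),
        )
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))
    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

def toy_env_step(s, a):
    """Stand-in ergodic dynamics; any continuous MDP would play this role."""
    s_next = 0.9 * s + 0.1 * a.mean() * torch.ones_like(s) \
             + 0.05 * torch.randn_like(s)
    r = -(s_next.pow(2).sum() + 0.01 * a.pow(2).sum())
    return s_next, r

actor, critic = Actor(), Critic()
alpha = beta = gamma = 1e-3          # single timescale: same-order step sizes
actor_opt = torch.optim.SGD(actor.parameters(), lr=alpha)
critic_opt = torch.optim.SGD(critic.parameters(), lr=beta)
eta = torch.zeros(())                # running estimate of the average reward

s = torch.randn(STATE_DIM)
for t in range(1000):
    dist = actor.dist(s)
    a = dist.sample()
    s_next, r = toy_env_step(s, a)

    # Average-reward TD error; it couples the three updates below.
    with torch.no_grad():
        delta = r - eta + critic(s_next) - critic(s)

    eta = eta + gamma * (r - eta)                 # reward-estimate update

    critic_opt.zero_grad()
    (-delta * critic(s)).backward()               # semi-gradient TD update
    critic_opt.step()

    actor_opt.zero_grad()
    (-delta * dist.log_prob(a).sum()).backward()  # policy-gradient update
    actor_opt.step()

    s = s_next
```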