ICLR Poster DMBP: Diffusion model-based predictor for robust offline reinforcement learning against state observation perturbations

Zhihe Yang · Yunjian Xu

Halle B #190

Wed 8 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract:

Offline reinforcement learning (RL), which aims to fully explore offline datasets for training without interaction with environments, has attracted growing recent attention. A major challenge for the real-world application of offline RL stems from the robustness against state observation perturbations, e.g., as a result of sensor errors or adversarial attacks. Unlike online robust RL, agents cannot be adversarially trained in the offline setting. In this work, we propose Diffusion Model-Based Predictor (DMBP) in a new framework that recovers the actual states with conditional diffusion models for state-based RL tasks. To mitigate the error accumulation issue in model-based estimation resulting from the classical training of conventional diffusion models, we propose a non-Markovian training objective to minimize the sum entropy of denoised states in RL trajectory. Experiments on standard benchmark problems demonstrate that DMBP can significantly enhance the robustness of existing offline RL algorithms against different scales of random noises and adversarial attacks on state observations. Further, the proposed framework can effectively deal with incomplete state observations with random combinations of multiple unobserved dimensions in the test. Our implementation is available at https://github.com/zhyang2226/DMBP.
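
To make the two test-time settings named in the abstract concrete (random noise on state observations, and observations with unobserved dimensions), here is a minimal sketch of how such perturbations might be simulated. It is not taken from the DMBP repository: the function names `perturb_observation` and `mask_dimensions`, the zero-mean Gaussian noise model, and the zero fill value for hidden dimensions are all assumptions made for illustration.

```python
import random


def perturb_observation(state, noise_scale, rng):
    """Return a copy of `state` with zero-mean Gaussian noise added.

    Assumption: noise is independent per dimension with standard
    deviation `noise_scale`. The paper may use other noise models.
    """
    return [s + rng.gauss(0.0, noise_scale) for s in state]


def mask_dimensions(state, unobserved, fill=0.0):
    """Return a copy of `state` with the `unobserved` indices hidden.

    Assumption: hidden dimensions are replaced by a constant `fill`.
    """
    hidden = set(unobserved)
    return [fill if i in hidden else s for i, s in enumerate(state)]


rng = random.Random(0)                  # fixed seed for reproducibility
true_state = [0.5, -1.2, 0.3, 2.0]      # hypothetical 4-dimensional state

# Setting 1: noisy observation of the full state.
noisy_state = perturb_observation(true_state, noise_scale=0.1, rng=rng)

# Setting 2: incomplete observation with dimensions 1 and 3 unobserved.
partial_state = mask_dimensions(true_state, unobserved=[1, 3])
```

In this framing, a predictor such as DMBP would take `noisy_state` or `partial_state` (together with trajectory context) and try to recover something close to `true_state` before the offline RL policy acts on it.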
