Virtual presentation / poster accept

Memory Gym: Partially Observable Challenges to Memory-Based Agents

Marco Pleines · Matthias Pallasch · Frank Zimmer · Mike Preuss

Keywords: [ Reinforcement Learning ] [ memory ] [ Proximal Policy Optimization ] [ Gated Recurrent Unit ] [ HELM ] [ deep reinforcement learning ] [ benchmark ]


Abstract:

Memory Gym is a novel benchmark that challenges Deep Reinforcement Learning agents to memorize events across long sequences, to be robust to noise, and to generalize. It consists of the partially observable 2D discrete-control environments Mortar Mayhem, Mystery Path, and Searing Spotlights. These environments are believed to be unsolvable by memory-less agents because they feature strong dependencies on memory and frequent agent-memory interactions. Empirical results based on Proximal Policy Optimization (PPO) with a Gated Recurrent Unit (GRU) underline the strong memory dependency of the contributed environments. The difficulty of these environments can be smoothly scaled, and distinct levels of difficulty (some of them still unsolved) emerge for Mortar Mayhem and Mystery Path. Surprisingly, Searing Spotlights poses a tremendous challenge to GRU-PPO, which remains an open puzzle. Even though the randomly moving spotlights reveal parts of the environment's ground truth, environmental ablations suggest that the spotlights severely perturb agents that leverage recurrent model architectures as their memory. Source Code: https://github.com/MarcoMeter/drl-memory-gym/
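To make interaction with the benchmark concrete, here is a minimal sketch of a random-policy episode loop using the Gymnasium API. The package import name (memory_gym) and the environment ID ("MortarMayhem-v0") are assumptions inferred from the repository; consult the linked README for the actual identifiers.

# Minimal episode loop for a Memory Gym environment via the Gymnasium API.
# NOTE: the import name `memory_gym` and the ID "MortarMayhem-v0" are
# assumptions inferred from the repository; check the README for exact IDs.
import gymnasium as gym
import memory_gym  # assumed to register the Memory Gym environments on import

env = gym.make("MortarMayhem-v0")
obs, info = env.reset(seed=0)

done = False
episode_return = 0.0
while not done:
    # A memory-less random policy; a capable agent must instead condition
    # its actions on a summary of past observations (e.g. a GRU hidden state).
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"Episode return: {episode_return}")
env.close()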
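For context on the GRU-PPO baseline, the following is a minimal sketch (in PyTorch, which is an assumption rather than the authors' published implementation) of how a recurrent policy can compress the observation history into a GRU hidden state on which the policy and value heads condition. Layer sizes and names are illustrative only.

# Illustrative recurrent actor-critic in the spirit of GRU-PPO; not the
# authors' architecture. Hyperparameters are placeholders.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_actions)  # action logits
        self.value_head = nn.Linear(hidden_dim, 1)              # state value

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: (1, batch, hidden_dim) or None.
        x = torch.relu(self.encoder(obs_seq))
        x, h_n = self.gru(x, h0)  # h_n carries the agent's memory across steps
        return self.policy_head(x), self.value_head(x), h_n

The hidden state h_n is what a memory-less agent lacks: carrying it across time steps is what allows the policy to act on events that are no longer visible in the current observation.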
