

In-Person Poster presentation / poster accept

Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

Li-Cheng Lan · Huan Zhang · Cho-Jui Hsieh

MH1-2-3-4 #167

Keywords: [ Reinforcement Learning ] [ Generalization ]


Abstract:

In this paper, we evaluate and improve the generalization performance of reinforcement learning (RL) agents on the set of "controllable" states, i.e., states from which some policy can still reach the goal. An RL agent that truly masters a task should reach its goal from any controllable state of the environment, rather than memorizing a small set of trajectories. To evaluate this type of generalization in practice, we propose relay evaluation, which starts the test agent from the middle of trajectories produced by independently well-trained "stranger" agents. Extensive experiments show that generalization failures on controllable states reached by stranger agents are prevalent. For example, in the Humanoid environment, a well-trained Proximal Policy Optimization (PPO) agent with only a 3.9% failure rate under regular testing failed on 81.6% of the states generated by well-trained stranger PPO agents. To improve "relay generalization," we propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment during training to the agent's own past states, selected according to the Q function. After applying STA to the training procedure of Soft Actor-Critic (SAC), we reduced SAC's failure rate under relay evaluation by more than a factor of three in most settings, without degrading agent performance or increasing the number of environment interactions required. Our code is available at https://github.com/lan-lc/STA.
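The sketch below illustrates the relay evaluation protocol described in the abstract: a stranger agent rolls out the first part of an episode, the test agent takes over mid-trajectory, and a trial counts as a failure if the test agent cannot keep the episode going. This is a minimal illustration assuming a Gymnasium-style environment; the function and variable names (relay_evaluate, stranger_policy, test_policy, handoff_step, max_steps) are hypothetical placeholders and not the authors' actual API.

```python
# Minimal sketch of relay evaluation (hypothetical names, Gymnasium-style API).


def relay_evaluate(env, stranger_policy, test_policy, handoff_step, max_steps):
    """Roll out a stranger agent, hand its state to the test agent mid-trajectory,
    and report whether the test agent keeps the episode alive until the time limit."""
    obs, _ = env.reset()

    # Phase 1: the stranger agent drives the environment up to the handoff point.
    for _ in range(handoff_step):
        action = stranger_policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            return None  # stranger ended the episode before the handoff; skip this trial

    # Phase 2: the test agent takes over from the stranger's state.
    for _ in range(max_steps - handoff_step):
        action = test_policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated:
            return False  # e.g. the Humanoid fell over: a relay failure
        if truncated:
            break
    return True  # the test agent reached the time limit: a relay success
```

The failure rates quoted in the abstract would correspond to the fraction of trials returning False over many stranger trajectories and handoff points; how handoff points are sampled in the paper is not specified here.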
