

Poster in Workshop: A Roadmap to Never-Ending RL

CoMPS: Continual Meta Policy Search

Glen Berseth · Zhiwei Zhang · Chelsea Finn · Sergey Levine


Abstract:

We develop a new continual meta-learning method to address challenges in sequential multi-task learning. In this setting, the agent's goal is to quickly achieve high reward over any sequence of tasks. Prior meta-reinforcement learning algorithms have demonstrated promising results in accelerating the acquisition of new tasks; however, they require access to all tasks during training. Beyond simply transferring past experience to new tasks, our goal is to devise continual reinforcement learning algorithms that learn to learn, using their experience on previous tasks to learn new tasks more quickly. We introduce a new method, continual meta-policy search (CoMPS), that removes this limitation by meta-training in an incremental fashion, over each task in a sequence, without revisiting prior tasks. CoMPS continuously repeats two subroutines: learning a new task and meta-learning to prepare for subsequent task learning. To solve each new task, CoMPS runs reinforcement learning from its current meta-learned initial parameters. For meta-training, CoMPS performs an entirely offline meta-reinforcement learning procedure over data collected from previous tasks. On several sequences of challenging continuous control tasks, we find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods.
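
The abstract describes an outer loop that alternates per-task reinforcement learning with offline meta-training. The sketch below illustrates that structure only; run_rl, offline_meta_rl, and the placeholder types are hypothetical names introduced here for illustration and are not the authors' implementation.

```python
# A minimal sketch of the continual loop described in the abstract.
# run_rl, offline_meta_rl, and the Task/Params/Data placeholders are
# illustrative assumptions, not the paper's actual API.

from typing import Any, Callable, Iterable, List, Tuple

Task = Any    # an environment / task specification (placeholder)
Params = Any  # policy parameters (placeholder)
Data = Any    # experience collected on one task (placeholder)


def comps_loop(
    tasks: Iterable[Task],
    init_params: Params,
    run_rl: Callable[[Task, Params], Tuple[Params, Data]],
    offline_meta_rl: Callable[[List[Data], Params], Params],
) -> Params:
    """Alternate per-task RL and offline meta-training over a task sequence."""
    meta_params = init_params
    replay: List[Data] = []  # data from all previously encountered tasks

    for task in tasks:
        # 1) Learn the new task with RL, starting from the current
        #    meta-learned initial parameters.
        _, task_data = run_rl(task, meta_params)
        replay.append(task_data)

        # 2) Entirely offline meta-RL over data collected on previous tasks
        #    (no revisiting of the tasks themselves), producing the
        #    initialization used for the next task.
        meta_params = offline_meta_rl(replay, meta_params)

    return meta_params
```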
