Poster

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

Kenneth Marino · Abhinav Gupta · Rob Fergus · Arthur Szlam

Great Hall BC #70

Abstract:

In this paper we introduce a simple, robust approach to hierarchically training an agent on sparse-reward tasks. The agent is split into a low-level and a high-level policy. The low-level policies access only the internal, proprioceptive dimensions of the state observation, and are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, each low-level policy is induced to be periodic through the use of a "phase function." The high-level policy is trained on a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents, and show improvements over recent hierarchical methods.
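The sketch below illustrates the two low-level components the abstract describes: a change-based reward on the non-proprioceptive ("external") dimensions, and a periodic phase input appended to the proprioceptive observation. The period length, the one-hot phase encoding, and the per-policy direction vectors are all illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Assumed cycle length of the phase function, in timesteps.
PHASE_PERIOD = 10


def phase_features(t: int, period: int = PHASE_PERIOD) -> np.ndarray:
    """One possible phase encoding: a one-hot vector of t mod period."""
    one_hot = np.zeros(period)
    one_hot[t % period] = 1.0
    return one_hot


def low_level_observation(proprio_obs: np.ndarray, t: int) -> np.ndarray:
    """The low-level policy sees only proprioceptive state plus the phase."""
    return np.concatenate([proprio_obs, phase_features(t)])


def low_level_reward(external_prev: np.ndarray,
                     external_next: np.ndarray,
                     direction: np.ndarray) -> float:
    """Reward displacement of the external dimensions along `direction`.

    Giving each policy in the ensemble a different unit `direction`
    (an assumed way to instantiate the change-based reward) yields a
    set of diverse locomotion primitives for the high-level policy.
    """
    return float(np.dot(external_next - external_prev, direction))


if __name__ == "__main__":
    # Toy usage with random data standing in for a real MuJoCo rollout.
    rng = np.random.default_rng(0)
    proprio = rng.normal(size=27)              # e.g. joint angles/velocities
    obs = low_level_observation(proprio, t=3)
    r = low_level_reward(np.array([0.0, 0.0]),  # previous (x, y) position
                         np.array([0.3, 0.1]),  # next (x, y) position
                         np.array([1.0, 0.0]))  # "move in +x" primitive
    print(obs.shape, r)                         # (37,) 0.3
```

Under this framing, the high-level policy's action space is simply the index of the low-level policy to execute, presumably committed to for a fixed interval of timesteps, and it is trained only on the sparse task reward.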
