Poster in Workshop: World Models: Understanding, Modelling and Scaling
Combining Unsupervised and Offline RL via World Models
Daniel Khapun · Dan Rosenbaum
Keywords: [ offline RL ] [ fast adaptation ] [ unsupervised RL ] [ world models ]
Deep reinforcement learning has proven to be an effective method for solving many intricate tasks, yet it still struggles with data efficiency and generalization to novel scenarios. Recent approaches to address this include (1) unsupervised pretraining of the agent in an environment without reward signals, and (2) training the agent using offline data from various possible sources. In this paper we propose to consider both of these approaches together, and argue that this results in a more realistic setting where different types of data are available and fast online adaptation to new tasks is required. Towards this goal we consider the Unsupervised RL Benchmark and show that access to unsupervised data is better used as a source of exploration trajectories than for pretraining a policy. Following this observation, we develop a method based on training a world model as a smart offline buffer of exploration data. We show that this approach outperforms previous methods in fast adaptation. We then propose a fast adaptation setup that includes access to both unsupervised exploratory data and offline expert demonstrations when testing the agent's online performance on novel tasks in the environment.
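To make the "world model as a smart offline buffer" idea concrete, below is a minimal illustrative sketch, not the authors' implementation: a dynamics model is fit on reward-free exploration transitions, then frozen and used to generate short imagined rollouts that adapt a policy to a newly revealed reward function. The network sizes, toy data, and the `task_reward` function are all assumptions for illustration.

```python
# Illustrative sketch (assumptions throughout): world model trained on
# unlabeled exploration data, then used as a synthetic offline buffer
# for fast adaptation to a new task reward.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2

class WorldModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, STATE_DIM),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Policy(nn.Module):
    """Deterministic policy adapted on model-generated rollouts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

# --- Phase 1: fit the world model on reward-free exploration data ---------
# Toy stand-in for unsupervised exploration transitions (s, a, s').
states = torch.randn(10_000, STATE_DIM)
actions = torch.randn(10_000, ACTION_DIM)
next_states = states + 0.1 * torch.randn_like(states)  # toy dynamics

model = WorldModel()
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    idx = torch.randint(0, states.shape[0], (256,))
    pred = model(states[idx], actions[idx])
    loss = nn.functional.mse_loss(pred, next_states[idx])
    model_opt.zero_grad()
    loss.backward()
    model_opt.step()

# Freeze the world model; it now acts as the "smart offline buffer".
for p in model.parameters():
    p.requires_grad_(False)

# --- Phase 2: fast adaptation to a new task via imagined rollouts ---------
def task_reward(state):
    """Hypothetical task reward, revealed only at adaptation time."""
    return -state.pow(2).sum(dim=-1)

policy = Policy()
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    # Start imagined rollouts from states drawn out of the exploration data.
    s = states[torch.randint(0, states.shape[0], (256,))]
    total_reward = torch.zeros(256)
    for _ in range(5):  # short imagination horizon
        a = policy(s)
        s = model(s, a)
        total_reward = total_reward + task_reward(s)
    policy_loss = -total_reward.mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()
```

In this sketch the exploration data is never relabeled with rewards directly; it only shapes the dynamics model and the distribution of rollout start states, which is one way to read the paper's claim that unsupervised data is more useful as exploration trajectories than as a pretrained policy.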