Yes, Q-learning Helps Offline In-Context RL
Abstract
In this preliminary work, we explore the integration of Reinforcement Learning (RL) approaches within a scalable offline In-Context RL (ICRL) framework. To the best of our knowledge, this is the first study to explicitly optimize the RL objective in an offline ICRL setting using a scalable Transformer architecture. Through experiments across 96 datasets derived from GridWorld-based environments, we demonstrate that optimizing RL objectives improves performance by approximately 30\% on average over the strong Algorithm Distillation (AD) baseline. Our results show that RL-based methods, particularly those from the offline RL family, outperform approaches such as DQN, which was not designed for offline learning, across varying dataset coverage, expertise levels, and environment complexities. These findings underscore the importance of aligning the learning objective with RL’s reward-maximization goal and suggest promising directions for applying offline RL in ICRL settings.
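To make the abstract's central idea concrete, the sketch below illustrates one way a Q-learning-style objective could be attached to an in-context Transformer trained on logged data. Everything here is a hypothetical illustration, not the paper's implementation: the `InContextQNetwork` class, its (state, action, reward) token layout, and the `td_loss` helper are assumptions chosen for brevity. The sketch shows a causal Transformer that reads a context of transitions, emits per-action Q-values at each position, and is trained with a one-step temporal-difference loss on the offline dataset, in contrast to AD's purely supervised action-prediction objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InContextQNetwork(nn.Module):
    """Hypothetical causal Transformer mapping a context of (state, one-hot action,
    reward) tokens to per-action Q-values at every context position."""

    def __init__(self, state_dim: int, num_actions: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        # Each context token concatenates the state, the one-hot action, and the reward.
        self.embed = nn.Linear(state_dim + num_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.q_head = nn.Linear(d_model, num_actions)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, context_len, state_dim + num_actions + 1)
        T = tokens.shape[1]
        # Causal mask so position t only attends to the in-context history up to t.
        causal_mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=tokens.device), diagonal=1
        )
        h = self.backbone(self.embed(tokens), mask=causal_mask)
        return self.q_head(h)  # (batch, context_len, num_actions)


def td_loss(q_net, tokens, actions, rewards, next_tokens, dones, gamma=0.99):
    """One-step Q-learning (TD) loss computed per context position from logged
    offline transitions; no environment interaction is required."""
    q = q_net(tokens).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q = q_net(next_tokens).max(dim=-1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q, target)
```

In practice, offline RL methods would typically regularize this TD term (for example with a behavior-cloning or conservatism penalty) to avoid overestimating actions absent from the dataset; the specific objectives compared in the paper are described in the main text, not here.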