RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Abstract
Recent advances in robot learning have accelerated progress toward generalist robots that can operate across diverse tasks and environments. Yet despite this momentum, it remains difficult to gauge how close we are to this goal, as the field lacks a reproducible, large-scale benchmark for systematic evaluation. To address this gap, we present RoboCasa365, a comprehensive robot simulation benchmark for everyday tasks. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments, along with over 600 hours of human demonstration data and over 1,600 hours of synthetically generated demonstration data, making it one of the most diverse and large-scale resources for studying generalist policies. We design the benchmark to support evaluation across key settings, including multi-task learning, robot foundation model training, and lifelong learning. We present extensive experiments with state-of-the-art methods and analyze how task diversity, dataset scale, and environment variation shape generalization. Our results offer new insight into which factors most strongly affect the performance of generalist robots and help inform strategies for future progress in the field.