Poster in Workshop: World Models: Understanding, Modelling and Scaling

Newton - A Small Benchmark for Interactive Foundation World Models

Spruce Campbell

Keywords: physics, benchmark, foundation models, world models, interactive world models, evaluation

Abstract

Foundation world models (FWMs) are an emerging class of generative models that aim to generate realistic, interactive worlds from pre-training on video data. In particular, FWMs promise to provide an online, stable environment for training generalist embodied agents. However, contemporary models suffer from several drawbacks, including poor object permanence, and they struggle to apply physical principles consistently. While benchmarks exist for large language models (LLMs) and video models, none currently evaluates foundation world models specifically in the context of interactivity. We present Newton, a series of datasets and benchmarks for training and evaluating small interactive FWMs, particularly on long-context memory and physics tasks. Newton-OP includes 5,000 examples of occlusion and camera rotation, aiming to evaluate models' ability to recall objects in 3D space over long time periods. Newton-Physics includes a further 5,000 examples of interactive rigid-body physics, evaluating both action following and physical accuracy. We also release code to evaluate models and demonstrate the performance of common baselines.
