

Poster in Workshop: World Models: Understanding, Modelling and Scaling

Text2World: Benchmarking World Modeling Capabilities of Large Language Models via Program Synthesis

Mengkang Hu · Tianxing Chen · Yude Zou · Yuheng Lei · Qiguang Chen · Ming Li · Qiwei Liang · Yao Mu · Hongyuan Zhang · Wenqi Shao · Ping Luo

Keywords: [ World Model ] [ Large Language Model ]


Abstract:

Recently, there has been growing interest in leveraging pre-trained large language models (LLMs) to generate symbolic world models for planning tasks. Despite extensive exploration, prior studies on LLMs as world models face significant challenges, including evaluation uncertainty, reliance on indirect metrics, and limited domain scope, which hinder a comprehensive understanding of their effectiveness in complex environments. To address these limitations, we introduce Text2World, a novel benchmark based on the Planning Domain Definition Language (PDDL) that features hundreds of diverse domains with natural language descriptions. Text2World employs multi-criteria, execution-based metrics for a more robust evaluation of world modeling capabilities, ensuring a comprehensive and high-quality assessment. Furthermore, we conduct a thorough evaluation of current LLMs on Text2World and, based on the experimental results, identify several significant findings. We hope that Text2World can serve as a crucial resource, laying the groundwork for future research on leveraging LLMs as world models.
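To make the idea of execution-based evaluation concrete, the sketch below shows what a minimal structural check on an LLM-generated PDDL domain could look like. This is illustrative only: the `check_domain` helper and the toy gripper domain are assumptions for exposition, not the actual Text2World metrics or tooling, which involve richer, planner-backed validation.

```python
# Minimal sketch of an execution-based check for LLM-generated PDDL.
# Hypothetical example; Text2World's real evaluation pipeline is not
# reproduced here.

CANDIDATE_DOMAIN = """
(define (domain gripper)
  (:requirements :strips)
  (:predicates (at-robby ?b) (carry ?b))
  (:action pick
    :parameters (?b)
    :precondition (at-robby ?b)
    :effect (carry ?b)))
"""

def check_domain(pddl: str) -> dict:
    """Run cheap structural checks that a PDDL parser would also enforce."""
    balanced = pddl.count("(") == pddl.count(")")
    has_define = "(define" in pddl
    has_action = "(:action" in pddl
    return {
        "balanced_parens": balanced,
        "has_define": has_define,
        "has_action": has_action,
        "passes": balanced and has_define and has_action,
    }

report = check_domain(CANDIDATE_DOMAIN)
print(report["passes"])  # True for this well-formed toy domain
```

In practice, execution-based metrics of this kind go further, e.g. feeding the generated domain to an off-the-shelf planner and checking whether valid plans can be produced, rather than stopping at syntactic well-formedness.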
