Workshop
World Models: Understanding, Modelling and Scaling
Mengyue Yang · Haoxuan Li · Firas Laakom · Xidong Feng · Jiaxin Shi · Zhu Li · Guohao Li · Francesco Faccio · Jürgen Schmidhuber
Our workshop covers a wide range of topics related to World Models, including understanding and modelling, and aligns closely with cutting-edge generative AI and broader applications such as robotics and embodied AI. We are glad to announce that nine top-tier researchers, including the founder of world models, have confirmed that they will attend in person as speakers and panelists. The workshop targets AI researchers, industry professionals, and students interested in World Models, generative AI, reinforcement learning, and related applications. Participants should have a basic understanding of generative models and reinforcement learning concepts. Familiarity with recent advancements in both fields is beneficial but not mandatory. We also welcome submissions from researchers in the natural sciences (e.g., physics, chemistry, biology) and social sciences (e.g., pedagogy, sociology) to offer attendees a more comprehensive perspective. In summary, our topics of interest mainly include, but are not limited to:
- Understanding World Rules;
- World model training and evaluation;
- Scaling World Models across language, vision, and control;
- World Models in general domains.
For the contributed paper sessions, given the recent surge in publications in related areas and the success of similar workshops, we project over 250 paper submissions and over 1,500 participants.
Schedule
|
Sun 5:30 p.m. - 5:45 p.m.
|
Poster Session I
(
Poster
)
>
|
🔗 |
|
Sun 5:45 p.m. - 6:00 p.m.
|
Opening Remarks
(
Intro
)
>
SlidesLive Video |
🔗 |
|
Sun 6:00 p.m. - 6:30 p.m.
|
Keynote #1: Xiaolong Wang (UCSD) & Nicklas Hansen (UCSD)
(
Keynote
)
>
SlidesLive Video |
Xiaolong Wang · Nick Hansen 🔗 |
|
Sun 6:30 p.m. - 7:00 p.m.
|
Keynote #2: Chelsea Finn (Stanford & Physical Intelligence)
(
Keynote
)
>
SlidesLive Video |
Chelsea Finn 🔗 |
|
Sun 7:15 p.m. - 7:30 p.m.
|
Industry Demo: Wayve
(
Demo
)
>
SlidesLive Video |
🔗 |
|
Sun 7:30 p.m. - 8:00 p.m.
|
Keynote #3 Tim Rocktäschel (UCL & Google DeepMind) & Jack Parker-Holder (Google DeepMind)
(
Keynote
)
>
SlidesLive Video |
Tim Rocktaeschel 🔗 |
|
Sun 8:00 p.m. - 8:30 p.m.
|
Keynote #4 Furong Huang (University of Maryland)
(
Keynote
)
>
SlidesLive Video |
Furong Huang 🔗 |
|
Sun 8:30 p.m. - 9:00 p.m.
|
Keynote #5 Jeff Clune (UBC)
(
Keynote
)
>
SlidesLive Video |
Jeff Clune 🔗 |
|
Sun 9:00 p.m. - 10:00 p.m.
|
Poster Session II and Lunch Break
(
Poster
)
>
|
🔗 |
|
Sun 10:00 p.m. - 10:30 p.m.
|
Keynote #6 Hong Zhou
(
Keynote
)
>
SlidesLive Video |
🔗 |
|
Sun 10:30 p.m. - 11:00 p.m.
|
Keynote #7 Stefano Ermon (Stanford University)
(
Keynote
)
>
SlidesLive Video |
Stefano Ermon 🔗 |
|
Sun 11:00 p.m. - 11:50 p.m.
|
Panel - Current Development and Future Challenge of World Models
(
Discussion Panel
)
>
SlidesLive Video |
Tim Rocktaeschel · Jürgen Schmidhuber · Jeff Clune · Stefano Ermon · Kun Zhang · Furong Huang · David Ha · Elahe Arani 🔗 |
|
Sun 11:50 p.m. - 12:30 a.m.
|
Poster Session III and Coffee Break
(
Poster
)
>
|
🔗 |
|
Mon 12:30 a.m. - 12:40 a.m.
|
Oral #1 Improving Transformer World Models for Data-Efficient RL
(
Oral
)
>
SlidesLive Video |
Joseph Ortiz 🔗 |
|
Mon 12:40 a.m. - 12:50 a.m.
|
Oral #2 From Foresight to Forethought: VLM-in-the-Loop Policy Steering via Latent Alignment
(
Oral
)
>
SlidesLive Video |
Yilin Wu · Michelle Zhao 🔗 |
|
Mon 12:50 a.m. - 1:00 a.m.
|
Oral #3 When do neural networks learn world models?
(
Oral
)
>
SlidesLive Video |
🔗 |
|
Mon 1:00 a.m. - 1:30 a.m.
|
Keynote #8 Jakob Foerster (Oxford University)
(
Keynote
)
>
SlidesLive Video |
Jakob Foerster 🔗 |
|
Mon 1:30 a.m. - 2:00 a.m.
|
Keynote #9 Tom Everitt (Google DeepMind)
(
Keynote
)
>
SlidesLive Video |
Tom Everitt 🔗 |
|
Mon 2:00 a.m. - 2:10 a.m.
|
Oral #4 Scalable Humanoid Whole-Body Control via Differentiable Neural Network Dynamics
(
Oral
)
>
SlidesLive Video |
Yu Lei 🔗 |
|
Mon 2:10 a.m. - 2:20 a.m.
|
Oral #5 Masked Generative Priors Improve World Models Sequence Modelling Capabilities
(
Oral
)
>
SlidesLive Video |
Zarif Ikram 🔗 |
|
Mon 2:20 a.m. - 2:30 a.m.
|
Oral #6 Temporal Difference Flows
(
Oral
)
>
SlidesLive Video |
🔗 |
|
Mon 2:30 a.m. - 3:00 a.m.
|
Paper Award & Closing Remarks
(
Intro
)
>
SlidesLive Video |
🔗 |
|
-
|
Recurrent world model with tokenized latent states ( Poster ) > link | Guangyao Zhai · Xingyuan Zhang · Nassir Navab 🔗 |
|
-
|
Newton - A Small Benchmark for Interactive Foundation World Models ( Poster ) > link | Spruce Campbell 🔗 |
|
-
|
Effectively Designing 2-Dimensional Sequence Models for Multivariate Time Series ( Poster ) > link | Daniel Cao · Ali Behrouz · Ali Parviz · Mahdi Karami · Michele Santacatterina · Ramin Zabih 🔗 |
|
-
|
Stress-Testing Offline Reward-Free Reinforcement Learning: A Case for Planning with Latent Dynamics Models ( Poster ) > link | Vlad Sobal · Wancong Zhang · Kyunghyun Cho · Randall Balestriero · Tim G. J. Rudner · Yann LeCun 🔗 |
|
-
|
Accelerating Goal-Conditioned RL Algorithms and Research ( Poster ) > link | Michał Bortkiewicz · Władysław Pałucki · Vivek Myers · Tadeusz Dziarmaga · Tomasz Arczewski · Łukasz Kuciński · Benjamin Eysenbach 🔗 |
|
-
|
COMPARATIVE STUDY OF WORLD MODELS, NVAE-BASED HIERARCHICAL MODELS, AND NOISYNET-AUGMENTED MODELS IN CARRACING-V2 ( Poster ) > link | Vidyavarshini Jayashankar · Banafsheh Rekabdar 🔗 |
|
-
|
Utilizing World Models for Adaptively Covariate Acquisition Under Limited Budget for Causal Decision Making Problem ( Poster ) > link | Haocheng Yang 🔗 |
|
-
|
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models ( Poster ) > link |
11 presenters: Victor Weixin Liang · Lili Yu · Liang Luo · Srini Iyer · Ning Dong · Chunting Zhou · Gargi Ghosh · Mike Lewis · Scott Yih · Luke Zettlemoyer · Victoria Lin |
|
-
|
LEARNING FROM LESS: SINDY SURROGATES IN RL ( Poster ) > link | Aniket Dixit · Muhammad Ibrahim Khan · Faizan Ahmed · James Brusey 🔗 |
|
-
|
Masked Generative Priors Improve World Models Sequence Modelling Capabilities ( Poster ) > link | Cristian Meo · Mircea Lică · Zarif Ikram · Akihiro Nakano · Vedant Shah · Aniket Rajiv Didolkar · Dianbo Liu · Anirudh Goyal · Justin Dauwels 🔗 |
|
-
|
Latent Action Learning Requires Supervision in the Presence of Distractors ( Poster ) > link | Alexander Nikulin · Ilya Zisman · Denis Tarasov · Lyubaykin Nikita · Andrei Polubarov · Igor Kiselev · Vladislav Kurenkov 🔗 |
|
-
|
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation ( Poster ) > link |
23 presenters: Qiyue Gao · Xinyu Pi · Kevin Liu · Junrong Chen · Ruolan Yang · Xinqi Huang · Xinyu Fang · Lu Sun · Gautham Kishore · Bo Ai · Stone Tao · Mengyang Liu · Jiaxi Yang · Chao-Jung Lai · Chuanyang Jin · Jiannan Xiang · Benhao Huang · David Danks · Hao Su · Tianmin Shu · Ziqiao Ma · Lianhui Qin · Zhiting Hu |
|
-
|
BEYOND SINGLE-STEP: MULTI-FRAME ACTION-CONDITIONED VIDEO GENERATION FOR REINFORCEMENT LEARNING ENVIRONMENTS ( Poster ) > link | Zongyue Li · Sikuan Yan · Yunpu Ma · Yusong Li · Xing Lyu · Matthias Schubert 🔗 |
|
-
|
Trajectory World Models for Heterogeneous Environments ( Poster ) > link | Shaofeng Yin · Jialong Wu · Siqiao Huang · Xingjian Su · Xu He · Jianye HAO · Mingsheng Long 🔗 |
|
-
|
SEAL: SEmantic-Augmented Imitation Learning via Language Model ( Poster ) > link | Chengyang GU · Yuxin Pan · Haotian Bai · Hui Xiong · Yize Chen 🔗 |
|
-
|
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models ( Poster ) > link | Yang Zhang · Chenjia Bai · Bin Zhao · Junchi Yan · Xiu Li · Xuelong Li 🔗 |
|
-
|
FROM FORESIGHT TO FORETHOUGHT: VLM-IN-THE-LOOP POLICY STEERING VIA LATENT ALIGNMENT ( Poster ) > link | Yilin Wu · Ran Tian · Gokul Swamy · Andrea Bajcsy 🔗 |
|
-
|
Accelerating Model-Based Reinforcement Learning with State-Space World Models ( Poster ) > link | Elie Aljalbout · Maria Krinner · Angel Romero · Davide Scaramuzza 🔗 |
|
-
|
Programmatic Video Prediction Using Large Language Models ( Poster ) > link | Hao Tang · Kevin Ellis · Suhas Lohit · Michael Jones · Moitreya Chatterjee 🔗 |
|
-
|
Pre-Trained Video Generative Models as World Simulators ( Poster ) > link | Haoran He · Yang Zhang · Liang Lin · Zhongwen Xu · Ling Pan 🔗 |
|
-
|
Emergent Stack Representations in Modeling Counter Languages Using Transformers ( Poster ) > link | Utkarsh Tiwari · Aviral Gupta · Michael Hahn 🔗 |
|
-
|
Temporal Difference Flows ( Poster ) > link | Jesse Farebrother · Matteo Pirotta · Andrea Tirinzoni · Remi Munos · Alessandro Lazaric · Ahmed Touati 🔗 |
|
-
|
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity ( Poster ) > link | Victor Weixin Liang · Junhong Shen · Genghan Zhang · Ning Dong · Luke Zettlemoyer · Lili Yu 🔗 |
|
-
|
Improving World Models using Supervision with Co-Evolving Linear Probes ( Poster ) > link | Andrii Zahorodnii 🔗 |
|
-
|
MS-SSM: A Multi-Scale State Space Model for Enhanced Sequence Modeling ( Poster ) > link | Mahdi Karami · Ali Behrouz · Peilin Zhong · Razvan Pascanu · Vahab Mirrokni 🔗 |
|
-
|
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data ( Poster ) > link | Keonvin Park · Jisu Kim · Jaemin Seo 🔗 |
|
-
|
Text2World: Benchmarking World Modeling Capabilities of Large Language Models via Program Synthesis ( Poster ) > link |
11 presenters: Mengkang Hu · Tianxing Chen · Yude Zou · Yuheng Lei · Qiguang Chen · Ming Li · Qiwei Liang · Yao Mu · Hongyuan Zhang · Wenqi Shao · Ping Luo |
|
-
|
DIALOGUES BETWEEN ADAM AND EVE: EXPLORATION OF UNKNOWN CIVILIZATION LANGUAGE BY LLM ( Poster ) > link | Wang Xu · Fengzhou Wang · Yiquan Wang 🔗 |
|
-
|
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning ( Poster ) > link | Kwanyoung Park · Youngwoon Lee 🔗 |
|
-
|
RADI: LLMs as World Models for Robotic Action Decomposition and Imagination ( Poster ) > link | Dongqi Zuo · Chuan Zhou · Yandong Guo · Xiao He · Mingming Gong 🔗 |
|
-
|
Combining Unsupervised and Offline RL via World Models ( Poster ) > link | Daniel Khapun · Dan Rosenbaum 🔗 |
|
-
|
Generalist World Model Pre-Training for Efficient Reinforcement Learning ( Poster ) > link | Yi Zhao · Aidan Scannell · Yuxin Hou · Tianyu Cui · Le Chen · Dieter Büchler · Arno Solin · Juho Kannala · Joni Pajarinen 🔗 |
|
-
|
A Virtual Reality-Integrated System for Behavioral Analysis in Neurological Decline ( Poster ) > link | Chen Zhang · Jiaxin Shi · Yanan Sui 🔗 |
|
-
|
Improving Transformer World Models for Data-Efficient RL ( Poster ) > link | Antoine Dedieu · Joseph Ortiz · Xinghua Lou · Carter Wendelken · Wolfgang Lehrach · J. Swaroop Guntupalli · Miguel Lazaro-Gredilla · Kevin Murphy 🔗 |
|
-
|
Object-Centric Latent Action Learning ( Poster ) > link | Albina Klepach · Alexander Nikulin · Ilya Zisman · Denis Tarasov · Alexander Derevyagin · Andrei Polubarov · Lyubaykin Nikita · Vladislav Kurenkov 🔗 |
|
-
|
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs ( Poster ) > link | Gengyuan Zhang · Mingcong Ding · Tong Liu · Yao Zhang · Volker Tresp 🔗 |
|
-
|
HEP-JEPA: A foundation model for collider physics ( Poster ) > link | Jai Bardhan · Radhikesh Agrawal · Abhiram Tilak · Cyrin Neeraj · Subhadip Mitra 🔗 |
|
-
|
Transformers Use Causal World Models in Maze-Solving Tasks ( Poster ) > link | Alexander Spies · William Edwards · Michael Ivanitskiy · Adrians Skapars · Tilman Räuker · Katsumi Inoue · Alessandra Russo · Murray Shanahan 🔗 |
|
-
|
Adapting a World Model for Trajectory Following in a 3D Game ( Poster ) > link |
13 presenters: Marko Tot · Shu Ishida · Abdelhak Lemkhenter · David Bignell · Pallavi Choudhury · Chris Lovett · Luis França · Matheus de Mendonça · Tarun Gupta · Darren Gehring · Sam Devlin · Sergio Valcarcel Macua · Raluca Georgescu |
|
-
|
Scaling Laws for Pre-training Agents and World Models ( Poster ) > link | Tim Pearce · Tabish Rashid · David Bignell · Raluca Georgescu · Sam Devlin · Katja Hofmann 🔗 |
|
-
|
BiD: Behavioral Agents in Dynamic Auctions ( Poster ) > link | Weitong Zhang · Chengqi Zang · Mark Schmidt · Richard Blythman 🔗 |
|
-
|
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning ( Poster ) > link | Siyin Wang · Zhaoye Fei · Qinyuan Cheng · Shiduo Zhang · Panpan Cai · Jinlan Fu · Xipeng Qiu 🔗 |
|
-
|
A Proposal for Networks Capable of Continual Learning ( Poster ) > link | Zeki Doruk Erden · Boi Faltings 🔗 |
|
-
|
Scalable Humanoid Whole-Body Control via Differentiable Neural Network Dynamics ( Poster ) > link | Yu Lei · Zhengyi Luo · Tairan He · Jinkun Cao · Guanya Shi · Kris Kitani 🔗 |
|
-
|
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension ( Poster ) > link | Xiyao Wang · Zhengyuan Yang · Linjie Li · Hongjin Lu · Yuancheng Xu · Chung-Ching Lin · Kevin Lin · Furong Huang · Lijuan Wang 🔗 |
|
-
|
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets ( Poster ) > link | Yuzhe YANG · Yifei Zhang · Minghao Wu · Kaidi Zhang · Yunmiao Zhang · Honghai Yu · Yan Hu · Wang Benyou 🔗 |
|
-
|
Object-Centric World Model for Language-Guided Manipulation ( Poster ) > link | Youngjoon Jeong · Junha Chun · Soonwoo Cha · Taesup Kim 🔗 |
|
-
|
Reward-free World Models for Online Imitation Learning ( Poster ) > link | Shangzhe Li · Zhiao Huang · Hao Su 🔗 |
|
-
|
HuWo: Building Physical Interaction World Models for Humanoid Robot Locomotion ( Poster ) > link | Han Zheng · Yi Cheng · Hang Liu · Linqi Ye · Houde Liu 🔗 |
|
-
|
Distribution Recovery in Compact Diffusion World Models via Conditioned Frame Interpolation ( Poster ) > link | Sam Gijsen · Kerstin Ritter 🔗 |
|
-
|
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations ( Poster ) > link | Sajad Movahedi · Felix Sarnthein · Nicola Muca Cirone · Antonio Orvieto 🔗 |
|
-
|
Pushing the Limit of Sample-Efficient Offline Reinforcement Learning ( Poster ) > link | Peng Cheng · Zhihao Wu · Jianxiong Li · Ziteng He · Haoran Xu · Wei Sun · Youfang Lin · Xianyuan Zhan 🔗 |
|
-
|
Unifying Causal and Object-centric Representation Learning allows Causal Composition ( Poster ) > link | Avinash Kori · Ben Glocker · David Ha · Francesco Locatello 🔗 |
|
-
|
Object-Centric Representations Generalize Better Compositionally with Less Compute ( Poster ) > link | Ferdinand Kapl · Amir Mohammad Karimi Mamaghan · Max Horn · Carsten Marr · Stefan Bauer · Andrea Dittadi 🔗 |
|
-
|
Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment ( Poster ) > link | Giulia Sanguedolce · Dragos-Cristian Gruia · Patrick Naylor · Fatemeh Geranmayeh 🔗 |
|
-
|
Knowledge Graphs as World Models for Material-Aware Obstacle Handling in Autonomous Vehicles ( Poster ) > link | Ayush Bheemaiah · Seungyong Yang 🔗 |
|
-
|
When do neural networks learn world models? ( Poster ) > link | Tianren Zhang · Guanyu Chen · Feng Chen 🔗 |
|
-
|
Reframing LLM Finetuning Through the Lens of Bayesian Optimization ( Poster ) > link | Bojana Ranković · Ryan-Rhys Griffiths · Philippe Schwaller 🔗 |
|
-
|
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving ( Poster ) > link | Hidehisa Arai · Keishi Ishihara · Tsubasa Takahashi · Yu Yamaguchi 🔗 |
|
-
|
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ( Poster ) > link | Jinyi Hu · Shengding Hu · Yuxuan Song · Yufei Huang · Mingxuan Wang · Hao Zhou · Zhiyuan Liu · Wei-Ying Ma · Maosong Sun 🔗 |
|
-
|
Revisiting the Othello World Model Hypothesis ( Poster ) > link | Yifei Yuan · Anders Søgaard 🔗 |
|
-
|
Reconstructing Dynamics from Steady Spatial Patterns with Partial Observations ( Poster ) > link | Xinyue Luo · Xuzhe Qian · Yu Chen · Huaxiong Huang · Jin Cheng 🔗 |
|
-
|
Generating Symbolic World Models via Test-time Scaling of Large Language Models ( Poster ) > link | Zhouliang Yu · yuhuan yuan · Tim Xiao · Fuxiang Xia · Jie Fu · Ge Zhang · Ge lin · Weiyang Liu 🔗 |