Workshop
Generative Models for Robot Learning
Ziwei Wang · Congyue Deng · Changliu Liu · Zhenyu Jiang · Haoran Geng · Huazhe Xu · Yansong Tang · Philip Torr · Ziwei Liu · Angelique Taylor · Yuke Zhu
The next generation of robots should combine ideas from computer vision, natural language processing, machine learning, and many other fields, because a closed-loop system is required to handle complex tasks based on multimodal input in complicated real-world environments. This workshop focuses on generative models for robot learning, an important and fundamental area of AI and robotics. Learning-based methods in robotics have achieved high success rates and strong generalization in a wide variety of tasks such as manipulation, navigation, SLAM, scene reconstruction, proprioception, and physics modeling. However, robot learning faces several challenges, including the high cost of data collection and weak transferability across tasks and scenarios. Inspired by the significant progress in computer vision and natural language processing, researchers have combined generative models with robot learning to address these challenges, for example by synthesizing high-quality data and by incorporating generation frameworks into representation and policy learning. In addition, pre-trained large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models are adapted to various downstream tasks to leverage their rich commonsense knowledge. This progress enables robot learning frameworks to be applied to complex and diverse real-world tasks. The workshop aims to foster interdisciplinary communication among researchers in the broader community and to draw more attention to this field. It will discuss state-of-the-art progress and promising future directions, with the goal of inspiring new ideas and applications in related fields.
Schedule
|
Sun 6:00 p.m. - 6:05 p.m.
|
Opening Remarks and Welcome
SlidesLive Video |
🔗 |
|
Sun 6:05 p.m. - 6:45 p.m.
|
Xiaojuan Qi
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sun 6:45 p.m. - 7:25 p.m.
|
Sergey Levine
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sun 7:25 p.m. - 7:40 p.m.
|
Coffee Break
(
Coffee Break
)
>
|
🔗 |
|
Sun 7:40 p.m. - 8:20 p.m.
|
Shuran Song
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sun 8:20 p.m. - 8:30 p.m.
|
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
(
oral session
)
>
SlidesLive Video |
🔗 |
|
Sun 8:30 p.m. - 8:40 p.m.
|
Latent Action Pretraining from Videos
SlidesLive Video |
🔗 |
|
Sun 8:40 p.m. - 8:50 p.m.
|
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
SlidesLive Video |
🔗 |
|
Sun 8:50 p.m. - 9:20 p.m.
|
Lunch Break
(
Lunch Break
)
>
|
🔗 |
|
Sun 9:20 p.m. - 10:20 p.m.
|
Poster Session
|
🔗 |
|
Sun 10:20 p.m. - 11:00 p.m.
|
Daquan Zhou
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sun 11:00 p.m. - 11:40 p.m.
|
Yilun Du
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sun 11:40 p.m. - 12:20 a.m.
|
Qi Dou
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Mon 12:20 a.m. - 12:25 a.m.
|
Closing Remarks
(
Closing Remarks
)
>
SlidesLive Video |
🔗 |
|
-
|
RL Zero: Zero-Shot Language to Behaviors Without Any Supervision
(
Poster
)
>
|
Harshit Sikchi · Siddhant Agarwal · Pranaya Jajoo · Samyak Parajuli · Caleb Chuck · Max Rudolph · Peter Stone · Amy Zhang · Scott Niekum 🔗 |
|
-
|
Learning from Massive Human Videos for Universal Humanoid Pose Control
(
Poster
)
>
|
Jiageng Mao · Siheng Zhao · Siqi Song · Tianheng Shi · Junjie Ye · Mingtong Zhang · Haoran Geng · Jitendra Malik · Vitor Campagnolo Guizilini · Yue Wang 🔗 |
|
-
|
PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
(
Poster
)
>
|
Utsav Singh · Vinay Purushothaman Namboodiri 🔗 |
|
-
|
Modality-Composable Diffusion Policy via Inference-Time Distribution-level Composition
(
Poster
)
>
|
Jiahang Cao · Qiang Zhang · Hanzhong Guo · Jiaxu Wang · Hao Cheng · Renjing Xu 🔗 |
|
-
|
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
(
Poster
)
>
|
Haoquan Fang · Markus Grotz · Wilbert Pumacay · Yi Ru Wang · Dieter Fox · Ranjay Krishna · Jiafei Duan 🔗 |
|
-
|
Learning Novel Skills from Language-Generated Demonstrations
(
Poster
)
>
SlidesLive Video |
Ao-Qun Jin · Tian-Yu Xiang · Xiao-Hu Zhou · Mei-Jiang Gui · Xiao-Liang Xie · Shi-Qi Liu · Shuang-Yi Wang · Yue Cao · Sheng-Bin Duan · Fu-Chao Xie · Zeng-Guang Hou |
|
-
|
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks ( Poster ) > link |
Shiduo Zhang · Zhe Xu · Peiju Liu · Xiaopeng Yu · Yuan Li · Qinghui Gao · Zhaoye Fei · Zhangyue Yin · Zuxuan Wu · Yugang Jiang · Xipeng Qiu |
|
-
|
Solving New Tasks by Adapting Internet Video Knowledge
(
Poster
)
>
|
Calvin Luo · Zilai Zeng · Yilun Du · Chen Sun 🔗 |
|
-
|
AVID: Adapting Video Diffusion Models to World Models
(
Poster
)
>
SlidesLive Video |
Marc Rigter · Tarun Gupta · Agrin Hilmkil · Chao Ma 🔗 |
|
-
|
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
(
Poster
)
>
|
Max Sobol Mark · Tian Gao · Georgia Gabriela Sampaio · Mohan Kumar Srirama · Archit Sharma · Chelsea Finn · Aviral Kumar 🔗 |
|
-
|
Diffusion Model Predictive Control
(
Poster
)
>
|
Stannis (Guangyao) Zhou · Sivaramakrishnan Swaminathan · Rajkumar Vasudeva Raju · J. Swaroop Guntupalli · Wolfgang Lehrach · Joseph Ortiz · Antoine Dedieu · Miguel Lázaro-Gredilla · Kevin Murphy 🔗 |
|
-
|
FP3: A 3D Foundation Policy for Robotic Manipulation
(
Poster
)
>
|
Rujia Yang · Geng Chen · Chuan Wen · Yang Gao 🔗 |
|
-
|
ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models
(
Poster
)
>
SlidesLive Video |
Lingfeng Zhang · Yuening Wang · Hongjian Gu · Atia Hamidizadeh · Zhanguang Zhang · Yuecheng Liu · Yutong Wang · David Gamaliel Arcos Bravo · Junyi Dong · Shunbo Zhou · Tongtong Cao · Yuzheng Zhuang · Yingxue Zhang · Jianye Hao |
|
-
|
Sampling from Energy-based Policies using Diffusion
(
Poster
)
>
|
Vineet Jain · Tara Akhound-Sadegh · Siamak Ravanbakhsh 🔗 |
|
-
|
Offline Learning of Controllable Diverse Behaviors
(
Poster
)
>
|
Mathieu Petitbois · Rémy Portelas · Sylvain Lamprier · Ludovic Denoyer 🔗 |
|
-
|
Latent Action Pretraining from Videos
(
Poster
)
>
|
Seonghyeon Ye · Joel Jang · Byeongguk Jeon · Sejune Joo · Jianwei Yang · Baolin Peng · Ajay Mandlekar · Reuben Tan · Yu-Wei Chao · Bill Yuchen Lin · Lars Liden · Kimin Lee · Jianfeng Gao · Luke Zettlemoyer · Dieter Fox · Minjoon Seo |
|
-
|
Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
(
Poster
)
>
|
Devdhar Patel · Hava Siegelmann 🔗 |
|
-
|
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
(
Poster
)
>
SlidesLive Video |
Xueyi Liu · Jianibieke Adalibieke · Qianwei Han · Yuzhe Qin · Li Yi 🔗 |
|
-
|
Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling
(
Poster
)
>
|
Yuejiang Liu · Jubayer Hamid · Yoonho Lee · Annie Xie · Max Du · Chelsea Finn 🔗 |
|
-
|
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
(
Poster
)
>
|
Ruijie Zheng · Yongyuan Liang · Shuaiyi Huang · Jianfeng Gao · Hal Daume · Andrey Kolobov · Furong Huang · Jianwei Yang 🔗 |
|
-
|
EQM-MPD: Equivariant On-Manifold Motion Planning Diffusion
(
Poster
)
>
SlidesLive Video |
Evangelos Chatzipantazis · Nishanth Arun Rao · Kostas Daniilidis 🔗 |
|
-
|
Responsive Noise-Relaying Diffusion Policy: Responsive and Efficient Visuomotor Control
(
Poster
)
>
|
Zhuoqun Chen · Xiu Yuan · Tongzhou Mu · Hao Su 🔗 |
|
-
|
DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning
(
Poster
)
>
|
Zhengrong Xue · Shuying Deng · Zhenyang Chen · Yixuan Wang · Zhecheng Yuan · Huazhe Xu 🔗 |
|
-
|
Generative Quality Diversity Imitation Learning for Robot Skill Acquisition
(
Poster
)
>
|
Zhenglin Wan · Xingrui Yu · David Bossens · Yueming Lyu · Qing Guo · Flint Xiaofeng Fan · Ivor Tsang 🔗 |
|
-
|
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
(
Poster
)
>
|
Kaizhe Hu · Zihang Rui · Yao He · Yuyao Liu · Pu Hua · Huazhe Xu 🔗 |
|
-
|
Environment as Policy: Generative Curriculum for Autonomous Racing
(
Poster
)
>
SlidesLive Video |
Jiaxu Xing · Hongze Wang · Nico Messikommer · Davide Scaramuzza 🔗 |
|
-
|
Contrastive Initial State Buffer for Reinforcement Learning
(
Poster
)
>
SlidesLive Video |
Nico Messikommer · Yunlong Song · Davide Scaramuzza 🔗 |