Robotics is entering an era where high-fidelity simulation and data-driven controllable world models are converging to revolutionize learning for manipulation and navigation. Traditionally, physics-based simulators and digital twins have been the go-to approach for training and evaluating robot policies. However, constructing realistic simulations with engines like Unity or Unreal is labor-intensive and often fails to capture the full complexity of the real world.

Recent advances in large-scale pretrained models have enabled the generation of videos, 3D assets, and even entire simulators. At the same time, compact world models learned from scratch can capture task-specific dynamics tailored to a particular robot. Together, these approaches show that simulations no longer need to be manually constructed; they can be learned directly from data.

This workshop aims to bridge the spectrum from physics-grounded simulation, through photorealistic digital twins and AI-controlled simulators, to fully learned neural world models. We will explore how cutting-edge simulation technologies, such as differentiable physics engines and realistic rendering, can integrate with or evolve into learned models of the world.

The goal is to bring together researchers from traditionally separate communities: those focused on high-fidelity simulators and those developing learned implicit models. Together, we will discuss how to combine the strengths of both approaches to advance robot learning.

We are soliciting two types of submissions: four-page papers and one-page abstracts, focusing on learning to simulate robot worlds and their applications in robotics. See the submission page for more details.

Topics

  • Real-to-Sim and Sim-to-Real Learning
  • Photorealistic Differentiable Simulation for Robotics (e.g., NeRFs, 3D Gaussian Splatting)
  • Learnable Physics Simulations and System Identification
  • Reconstruction and Generation of Articulated and Deformable Objects
  • Policy Training and Evaluation with Learned World Models, Realistic Simulations, and Digital Twins
  • End-to-End World Model Learning
  • Explicit vs. Implicit World Models
  • Compositional World Models and Controllable Video Generation Models

Important Dates and Links

Submission site opens: 25.07.2025
Submission deadline (4-page submissions & 1-page abstracts): 25.08.2025 (extended from 18.08.2025)
Decisions announced: 05.09.2025
Camera-ready due: 10.09.2025

Speakers and Panelists (TBD)

Animesh Garg


Georgia Tech

Short Bio: Animesh Garg is the Stephen Fleming Early Career Professor in Computer Science at Georgia Tech, within the School of Interactive Computing, and is affiliated with the Robotics and Machine Learning programs. He holds courtesy appointments at the University of Toronto and the Vector Institute. Previously, he held research leadership positions at Nvidia and Apptronik. His research focuses on the algorithmic foundations of generalizable autonomy, enabling robots to acquire cognitive and dexterous skills and collaborate with humans in novel environments. His group explores structured inductive biases, causality in decision-making, multimodal object-centric representations, self-supervised learning for control, and efficient dexterous skill acquisition.


Daniel Ho

1X Technologies

Short Bio: Daniel Ho is the Director of Evaluation at 1X Technologies. His goal is to deploy generalist machines that grow from experience and correct their own mistakes. He is building world models and large-scale evaluation pipelines towards this mission. Previously, he worked on robotics, perception, and machine learning as a Senior Software Engineer at Waymo and Everyday Robots (Google X). His research has focused on learning algorithms and representation learning to generalize ML model understanding in robotics, computer vision, and self-driving.

Talk Title: 1X World Model: Solving humanoid policy training and evaluation with data synthesis and action control


Hao Su

UC San Diego

Short Bio: Hao Su is an Associate Professor of Computer Science at UC San Diego and Founder & CTO of Hillbot, a robotics startup. He directs the Embodied Intelligence Lab and is a founding member of the Halıcıoğlu Data Science Institute. His research spans computer vision, machine learning, graphics, and robotics, focusing on algorithms to simulate and interact with the physical world. He holds Ph.D.s in Computer Science from Stanford and Mathematics from Beihang University. He helped develop datasets like ImageNet, ShapeNet, and tools like PointNet. Su is Program Chair of CVPR 2025 and has received NSF CAREER and SIGGRAPH awards.

Talk Title: Learning World Models for Embodied AI


Katerina Fragkiadaki

Carnegie Mellon University

Short Bio: Katerina Fragkiadaki is the JPMorgan Chase Associate Professor in Machine Learning at Carnegie Mellon University. She earned her B.S. from the National Technical University of Athens and her Ph.D. from the University of Pennsylvania, followed by postdoctoral work at UC Berkeley and Google Research. Her research combines common sense reasoning with deep visuomotor learning to enable few-shot and continual learning for perception, action, and language grounding. Her group develops methods in 2D-to-3D perception, vision-language grounding, and navigation policies. She received awards including the NSF CAREER and DARPA Young Investigator Awards and is Program Chair for ICLR 2024.

Talk Title: From Videos to Physics Engine Simulations to Neural Simulations


Yilun Du

Harvard University

Short Bio: Yilun Du is an Assistant Professor at Harvard’s Kempner Institute and Computer Science Department, and a Senior Research Scientist at Google DeepMind. He earned his Ph.D. in EECS from MIT, advised by Leslie Kaelbling, Tomas Lozano-Perez, and Joshua Tenenbaum. He holds a bachelor’s from MIT and has been a research fellow at OpenAI and a visiting researcher at FAIR and DeepMind. A gold medalist at the International Biology Olympiad, his research focuses on generative models, decision making, robot learning, and embodied agents. He develops energy-based models enabling generalization and advances in diffusion models, scene understanding, and trajectory planning.

Talk Title: Learning Generative World Simulators


Russ Tedrake

TRI and MIT

Short Bio: Dr. Russ Tedrake is Senior Vice President of Large Behavior Models at Toyota Research Institute (TRI) and Toyota Professor at the Massachusetts Institute of Technology (MIT). The Large Behavior Models division is building foundation models for dexterous robotic manipulation and includes world-class researchers who focus on machine learning (especially computer vision and large multimodal models), dynamics and simulation, human-robot interaction, and hardware and software for dexterous manipulation.

Talk Title: TBD


Boyi Li

NVIDIA

Short Bio: Boyi Li is a Research Scientist in the Autonomous Vehicle Research group at NVIDIA. She is broadly interested in computer vision, machine learning, and multimedia art. In particular, her research focuses on multimodal and data-efficient machine learning. Her research vision is to enable interactive, user-friendly, and reliable autonomy for a broad range of high-integrity robotics applications. Prior to joining NVIDIA, she received her Ph.D. at Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.

Panelist

Organizers

Andrii Zadaianchuk

University of Amsterdam

Christian Gumbsch

University of Amsterdam

Leonardo Barcellona

University of Amsterdam

Alberta Longhini

KTH Royal Institute of Technology

Katherine Liu

Toyota Research Institute

Sergey Zakharov

Toyota Research Institute

Fabien Despinoy

Toyota Motor Europe

Rahaf Aljundi

Toyota Motor Europe

Rares Ambrus

Toyota Research Institute

Yunzhu Li

Columbia University

Efstratios Gavves

University of Amsterdam

For questions / comments, reach out to: learning-to-simulate-robot-worlds@googlegroups.com

Website template adapted from the OSC/ORLR workshops, originally based on the template of the BAICS workshop.