Robotics is entering an era where high-fidelity simulation and data-driven controllable world models are converging to revolutionize learning for manipulation and navigation. Traditionally, physics-based simulators and digital twins have been the go-to approach for training and evaluating robot policies. However, constructing realistic simulations with engines like Unity or Unreal is labor-intensive and often fails to capture the full complexity of the real world.

Recent advances in large-scale pretrained models have enabled the generation of videos, 3D assets, and even entire simulators. At the same time, compact world models learned from scratch can capture task-specific dynamics tailored to a particular robot. Together, these approaches show that simulations no longer need to be manually constructed; they can be learned directly from data.

This workshop aims to bridge the spectrum from physics-grounded simulation, through photorealistic digital twins and AI-controlled simulators, to fully learned neural world models. We will explore how cutting-edge simulation technologies, such as differentiable physics engines and realistic rendering, can integrate with or evolve into learned models of the world.

The goal is to bring together researchers from traditionally separate communities: those focused on high-fidelity simulators and those developing learned implicit models. Together, we will discuss how to combine the strengths of both approaches to advance robot learning.

We are soliciting two types of submissions: four-page papers and one-page abstracts, focusing on learning to simulate robot worlds and their applications in robotics. See the submission page for more details.

Topics

  • Real-to-Sim and Sim-to-Real Learning
  • Photorealistic Differentiable Simulation for Robotics (e.g., NeRFs, 3D Gaussian Splatting)
  • Learnable Physics Simulations and System Identification
  • Reconstruction and Generation of Articulated and Deformable Objects
  • Policy Training and Evaluation with Learned World Models, Realistic Simulations, and Digital Twins
  • End-to-End World Model Learning
  • Explicit vs. Implicit World Models
  • Compositional World Models and Controllable Video Generation Models

Important Dates and Links

Submission site opens: 25.07.2025
Submission deadline (4-page submissions & 1-page abstracts): 25.08.2025 (extended from 18.08.2025)
Decisions announced: 05.09.2025
Camera-ready due: 10.09.2025

Speakers and Panelists (TBD)

Animesh Garg


Georgia Tech

Short Bio: Animesh Garg is the Stephen Fleming Early Career Professor in Computer Science at Georgia Tech, within the School of Interactive Computing, and is affiliated with the Robotics and Machine Learning programs. He holds courtesy appointments at the University of Toronto and the Vector Institute. Previously, he held research leadership positions at Nvidia and Apptronik. His research focuses on the algorithmic foundations of generalizable autonomy, enabling robots to acquire cognitive and dexterous skills and collaborate with humans in novel environments. His group explores structured inductive biases, causality in decision-making, multimodal object-centric representations, self-supervised learning for control, and efficient dexterous skill acquisition.


Daniel Ho

1X Technologies

Short Bio: Daniel Ho is the Director of Evaluation at 1X Technologies. His goal is to deploy generalist machines that grow from experience and correct their own mistakes. He is building world models and large-scale evaluation pipelines towards this mission. Previously, he worked on robotics, perception, and machine learning as a Senior Software Engineer at Waymo and Everyday Robots (Google X). His research has focused on learning algorithms and representation learning to generalize ML model understanding in robotics, computer vision, and self-driving.

Talk Title: 1X World Model: Solving humanoid policy training and evaluation with data synthesis and action control


Hao Su

UC San Diego

Short Bio: Hao Su is an Associate Professor of Computer Science at UC San Diego and Founder & CTO of Hillbot, a robotics startup. He directs the Embodied Intelligence Lab and is a founding member of the Halıcıoğlu Data Science Institute. His research spans computer vision, machine learning, graphics, and robotics, focusing on algorithms to simulate and interact with the physical world. He holds Ph.D.s in Computer Science from Stanford and Mathematics from Beihang University. He helped develop datasets like ImageNet, ShapeNet, and tools like PointNet. Su is Program Chair of CVPR 2025 and has received NSF CAREER and SIGGRAPH awards.

Talk Title: Learning World Models for Embodied AI


Katerina Fragkiadaki

Carnegie Mellon University

Short Bio: Katerina Fragkiadaki is the JPMorgan Chase Associate Professor in Machine Learning at Carnegie Mellon University. She earned her B.S. from the National Technical University of Athens and her Ph.D. from the University of Pennsylvania, followed by postdoctoral work at UC Berkeley and Google Research. Her research combines common sense reasoning with deep visuomotor learning to enable few-shot and continual learning for perception, action, and language grounding. Her group develops methods in 2D-to-3D perception, vision-language grounding, and navigation policies. She received awards including the NSF CAREER and DARPA Young Investigator Awards and is Program Chair for ICLR 2024.

Talk Title: From Videos to Physics Engine Simulations to Neural Simulations


Yilun Du

Harvard University

Short Bio: Yilun Du is an Assistant Professor at Harvard’s Kempner Institute and Computer Science Department, and a Senior Research Scientist at Google DeepMind. He earned his Ph.D. in EECS from MIT, advised by Leslie Kaelbling, Tomas Lozano-Perez, and Joshua Tenenbaum. He holds a bachelor’s from MIT and has been a research fellow at OpenAI and a visiting researcher at FAIR and DeepMind. A gold medalist at the International Biology Olympiad, his research focuses on generative models, decision making, robot learning, and embodied agents. He develops energy-based models enabling generalization and advances in diffusion models, scene understanding, and trajectory planning.

Talk Title: Learning Generative World Simulators


Russ Tedrake

TRI and MIT

Short Bio: Dr. Russ Tedrake is Senior Vice President of Large Behavior Models at Toyota Research Institute (TRI) and Toyota Professor at the Massachusetts Institute of Technology (MIT). The Large Behavior Models division is building foundation models for dexterous robotic manipulation and includes world-class researchers who focus on machine learning (especially computer vision and large multimodal models), dynamics and simulation, human-robot interaction, and hardware and software for dexterous manipulation.

Talk Title: TBD


Boyi Li

NVIDIA

Short Bio: Boyi Li is a Research Scientist in the Autonomous Vehicle Research group at NVIDIA. She is broadly interested in computer vision, machine learning, and multimedia art. In particular, her research focuses on multimodal and data-efficient machine learning. Her research vision is to enable interactive, user-friendly, and reliable autonomy for a broad range of high-integrity robotics applications. Prior to joining NVIDIA, she received her Ph.D. at Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.

Panelist

Organizers

Andrii Zadaianchuk

University of Amsterdam

Christian Gumbsch

University of Amsterdam

Leonardo Barcellona

University of Amsterdam

Alberta Longhini

KTH Royal Institute of Technology

Katherine Liu

Toyota Research Institute

Sergey Zakharov

Toyota Research Institute

Fabien Despinoy

Toyota Motor Europe

Rahaf Aljundi

Toyota Motor Europe

Rares Ambrus

Toyota Research Institute

Yunzhu Li

Columbia University

Efstratios Gavves

University of Amsterdam

For questions / comments, reach out to: learning-to-simulate-robot-worlds@googlegroups.com

Website template adapted from the OSC/ORLR workshops, originally based on the template of the BAICS workshop.