MWM: Mobile World Models for Action-Conditioned Consistent Prediction


Han Yan* Zishang Xiang* Zeyu Zhang*† Hao Tang
School of Computer Science, Peking University
*Equal contribution. Project lead. Corresponding author.

Real World Results


[Video panels: Exo View, Ego View, Prediction]

Benchmark Results


[Video panels: four Prediction / GT (ground truth) pairs]

Methods


Overview of the two-stage training pipeline for MWM. Our training paradigm first performs structure pretraining to learn fine-grained geometry and illumination-dependent appearance, then applies ACC post-training to mitigate compounding error, freezing the CDiT backbone and updating only the AdaLN (adaptive layer normalization) parameters. Within post-training, we introduce ICSD, which enables distillation that preserves the consistency objective while aligning truncated training-time estimates with the inference-time endpoint.
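The "freeze the backbone, update only AdaLN" step of post-training can be illustrated with a minimal PyTorch sketch. This is not the authors' code: `ToyBlock`, the parameter name `adaln_modulation`, and the helper `freeze_all_but_adaln` are hypothetical stand-ins, and the real CDiT implementation may name its modules differently. The placeholder loss is only there to show where gradients flow.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in for one transformer block with an AdaLN head."""

    def __init__(self, dim: int, cond_dim: int) -> None:
        super().__init__()
        # LayerNorm without learned affine; AdaLN supplies shift and scale.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # The conditioning vector predicts a per-channel shift, scale, and gate.
        self.adaln_modulation = nn.Sequential(nn.SiLU(), nn.Linear(cond_dim, 3 * dim))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        shift, scale, gate = self.adaln_modulation(cond).chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale) + shift
        return x + gate * self.mlp(h)


def freeze_all_but_adaln(model: nn.Module, keyword: str = "adaln") -> list[str]:
    """Disable gradients everywhere except parameters whose name contains `keyword`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name.lower()
        if param.requires_grad:
            trainable.append(name)
    return trainable


torch.manual_seed(0)
block = ToyBlock(dim=8, cond_dim=4)
trainable_names = freeze_all_but_adaln(block)

# Only the still-trainable (AdaLN) parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in block.parameters() if p.requires_grad], lr=1e-4
)

x = torch.randn(2, 8)      # toy latent features
cond = torch.randn(2, 4)   # toy action/conditioning embedding
loss = block(x, cond).pow(2).mean()  # placeholder loss, not the ACC objective
loss.backward()
optimizer.step()
```

After `backward()`, the frozen MLP weights carry no gradient while the AdaLN projection does, so the optimizer step changes only the conditioning pathway. Matching parameters by name is a convenience for this sketch; selecting the modulation modules explicitly is more robust when names are not under your control.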

Robot Setup


Real-world deployment setup on the AIRBOT Mobile Manipulation Kit 2 (MMK2). (a) Hardware platform. (b) Deployment process.

BibTeX

@article{yan2026mwm,
  title={MWM: Mobile World Models for Action-Conditioned Consistent Prediction},
  author={Yan, Han and Xiang, Zishang and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2603.07799},
  year={2026}
}