MWM: Mobile World Models for Action-Conditioned Consistent Prediction

Han Yan^* Zishang Xiang^* Zeyu Zhang^*† Hao Tang^‡
School of Computer Science, Peking University
^*Equal contribution. ^†Project lead. ^‡Corresponding author.

Real World Results

Panels: Exo View, Ego View, Prediction.

Benchmark Results

Panel: Prediction.

Methods

Overview of the two-stage training pipeline for MWM. Training first performs structure pretraining to learn fine-grained geometry and illumination-dependent appearance. It then applies ACC post-training to mitigate compounding error, freezing the CDiT backbone and updating only the AdaLN (adaptive layer normalization) parameters. Within post-training, ICSD enables distillation that preserves the consistency objective while aligning truncated training-time estimates with the inference-time endpoint.
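The post-training step described above freezes the backbone and updates only the AdaLN parameters. A minimal sketch of that selective freezing is shown below. It is not the authors' implementation: the helper `freeze_all_but`, the stand-in class `FakeParam`, and the parameter names are all hypothetical, and the sketch assumes AdaLN parameters can be identified by a substring of their name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FakeParam:
    """Hypothetical stand-in for a framework parameter with a requires_grad flag."""
    requires_grad: bool = True


def freeze_all_but(named_params: Iterable[Tuple[str, object]], keyword: str) -> List[str]:
    """Mark parameters trainable only if `keyword` occurs in their name.

    Every other parameter gets requires_grad = False (frozen).
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in named_params:
        # Case-insensitive substring match on the parameter name (an assumption).
        param.requires_grad = keyword.lower() in name.lower()
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Hypothetical parameter names; the actual naming in the MWM code may differ.
params = {
    "blocks.0.attn.qkv.weight": FakeParam(),
    "blocks.0.adaLN_modulation.1.weight": FakeParam(),
    "blocks.0.adaLN_modulation.1.bias": FakeParam(),
    "blocks.0.mlp.fc1.weight": FakeParam(),
}

trainable_names = freeze_all_but(params.items(), keyword="adaLN")
print(trainable_names)
```

With PyTorch, the same helper should work on `model.named_parameters()`, since each parameter exposes a settable `requires_grad` attribute. The optimizer would then be built from only the parameters whose `requires_grad` is still true.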

Robot Setup

Real-world deployment setup on the AIRBOT Mobile Manipulation Kit 2 (MMK2). (a) Hardware platform. (b) Deployment process.

BibTeX

@article{yan2026mwm,
  title={MWM: Mobile World Models for Action-Conditioned Consistent Prediction},
  author={Yan, Han and Xiang, Zishang and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2603.07799},
  year={2026}
}