MWM: Mobile World Models for Action-Conditioned Consistent Prediction
Overview of the two-stage training pipeline for MWM. Training first performs structure pretraining to learn fine-grained geometry and illumination-dependent appearance, then applies ACC post-training to mitigate compounding error, freezing the CDiT backbone and updating only the AdaLN parameters. Within post-training, we introduce ICSD, which enables distillation that preserves the consistency objective and aligns truncated training-time estimates with the inference-time endpoint.
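The "freeze the backbone, update only AdaLN" step of post-training can be illustrated with a minimal PyTorch sketch. This is not the authors' code: `ToyBlock`, the parameter name `adaLN_modulation`, the layer sizes, and the learning rate are all hypothetical, chosen only to show how gradients can be restricted to the modulation parameters.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in for one conditioned transformer block (not the real CDiT)."""

    def __init__(self, dim: int = 16, cond_dim: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Linear(dim, dim)
        # Assumed naming: AdaLN maps the conditioning vector to a shift and a scale.
        self.adaLN_modulation = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x, cond):
        shift, scale = self.adaLN_modulation(cond).chunk(2, dim=-1)
        return x + self.mlp(self.norm(x) * (1 + scale) + shift)


def freeze_all_but_adaln(model: nn.Module, keyword: str = "adaLN") -> list[str]:
    """Enable gradients only for parameters whose name contains `keyword`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = ToyBlock()
trainable_names = freeze_all_but_adaln(model)

# Only the still-trainable parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

After a backward pass, the frozen `mlp` weights receive no gradient, while the `adaLN_modulation` weights do. In a real model, the name filter would need to match however the AdaLN parameters are actually named.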
Real-world deployment setup on the AIRBOT Mobile Manipulation Kit 2 (MMK2). (a) Hardware platform. (b) Deployment process.
@article{yan2026mwm,
title={MWM: Mobile World Models for Action-Conditioned Consistent Prediction},
author={Yan, Han and Xiang, Zishang and Zhang, Zeyu and Tang, Hao},
journal={arXiv preprint arXiv:2603.07799},
year={2026}
}