MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
TL;DR: MobileVLA-R1 enables robust real-world quadruped control by unifying language reasoning and continuous action through structured CoT alignment and GRPO training.
Real World Results
Simulation Results
MobileVLA-R1
Architecture of MobileVLA-R1. MobileVLA-R1 is an end-to-end framework that integrates natural-language instructions with multimodal perception. It processes RGB, depth, and point cloud observations together with textual commands to generate continuous locomotion actions, enabling mobile robots to follow complex instructions and adapt to diverse environments in real time.
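To make the interface concrete, below is a minimal sketch of how a multimodal policy of this kind can fuse RGB, depth, point-cloud, and text inputs into a continuous action. Module names, feature sizes, and the action dimension are illustrative assumptions, not the released MobileVLA-R1 implementation (which builds on a VLM backbone).

```python
import torch
import torch.nn as nn

class MobileVLAPolicySketch(nn.Module):
    """Illustrative multimodal policy: encodes each modality, fuses the
    features, and regresses a continuous locomotion command."""

    def __init__(self, feat_dim=256, action_dim=12, vocab_size=32000):
        super().__init__()
        # Per-modality encoders (placeholders; the real model uses a VLM backbone).
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                     nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                                     nn.Flatten(), nn.Linear(16, feat_dim))
        self.depth_enc = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1),
                                       nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                                       nn.Flatten(), nn.Linear(16, feat_dim))
        self.pcd_enc = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())  # per-point MLP
        self.text_enc = nn.Embedding(vocab_size, feat_dim)               # token embeddings
        self.fusion = nn.Sequential(nn.Linear(4 * feat_dim, feat_dim), nn.ReLU())
        self.action_head = nn.Linear(feat_dim, action_dim)  # continuous action output

    def forward(self, rgb, depth, points, tokens):
        f_rgb = self.rgb_enc(rgb)                      # (B, feat_dim)
        f_depth = self.depth_enc(depth)                # (B, feat_dim)
        f_pcd = self.pcd_enc(points).mean(dim=1)       # (B, feat_dim), pooled over points
        f_txt = self.text_enc(tokens).mean(dim=1)      # (B, feat_dim), pooled over tokens
        fused = self.fusion(torch.cat([f_rgb, f_depth, f_pcd, f_txt], dim=-1))
        return self.action_head(fused)                 # (B, action_dim)

# Example call with dummy tensors (batch of 1).
policy = MobileVLAPolicySketch()
action = policy(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224),
                torch.randn(1, 1024, 3), torch.randint(0, 32000, (1, 16)))
```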
CoT Data Engine
CoT Data Engine. We construct the MobileVLA-CoT dataset by defining navigation-level and step-level instructions, integrating RGB-Depth visual inputs, and specifying structured reasoning prompts. These inputs are fed to Gemini-2.5-Flash, which generates multi-granularity Chain-of-Thought (CoT) annotations with corresponding action outputs.
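The sketch below shows one way such an annotation pipeline can be organized: a structured reasoning prompt is assembled from the navigation and step-level instructions, then sent together with the RGB-Depth observation to an annotation model. The prompt wording, field names, and the injected `query_annotator` callable are assumptions for illustration; the actual MobileVLA-CoT prompts and Gemini-2.5-Flash client code may differ.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CoTSample:
    navigation_instruction: str   # high-level goal, e.g. "go to the red door"
    step_instruction: str         # step-level sub-goal, e.g. "turn left at the hallway"
    rgb_path: str                 # path to the RGB frame
    depth_path: str               # path to the aligned depth frame

# Hypothetical structured reasoning prompt, not the paper's exact template.
PROMPT_TEMPLATE = """You are annotating a mobile-robot trajectory step.
Navigation instruction: {nav}
Current step instruction: {step}
Given the attached RGB and depth observations, write a structured Chain-of-Thought:
1) scene description, 2) relevant objects and obstacles,
3) reasoning toward the sub-goal, 4) the resulting continuous action.
"""

def build_prompt(sample: CoTSample) -> str:
    """Fill the structured reasoning prompt for one trajectory step."""
    return PROMPT_TEMPLATE.format(nav=sample.navigation_instruction,
                                  step=sample.step_instruction)

def annotate(samples: List[CoTSample], query_annotator: Callable) -> List[dict]:
    """Run the annotation model (e.g. Gemini-2.5-Flash wrapped by the caller in
    `query_annotator(prompt, image_paths)`) and collect CoT + action pairs."""
    records = []
    for s in samples:
        prompt = build_prompt(s)
        cot_and_action = query_annotator(prompt, [s.rgb_path, s.depth_path])
        records.append({"prompt": prompt, "annotation": cot_and_action})
    return records
```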
RLVR
The pipeline of the RL policy. The model generates N responses for a given input, and a reward is computed for each response. After normalization and clipping, these rewards are combined with a KL-divergence term, which prevents the model from over-updating, and the result is used to update the policy.
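Below is a minimal sketch of a GRPO-style objective consistent with this description: rewards for the N sampled responses are normalized within the group, a clipped policy-ratio surrogate is formed, and a KL penalty toward a frozen reference policy discourages over-updating. The exact reward shaping, clipping scheme, and coefficients used in MobileVLA-R1 are assumptions here.

```python
import torch

def grpo_loss(logp, logp_old, logp_ref, rewards, clip_eps=0.2, kl_coef=0.04):
    """GRPO-style loss sketch.

    logp, logp_old, logp_ref: (N,) summed log-probs of the N sampled responses
    under the current, behavior, and frozen reference policies.
    rewards: (N,) scalar rewards, one per response.
    """
    # Normalize rewards within the group to obtain relative advantages.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Clipped surrogate on the ratio between current and behavior policy.
    ratio = torch.exp(logp - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    policy_term = torch.min(unclipped, clipped).mean()

    # KL penalty toward the reference policy keeps the update conservative.
    kl = (logp - logp_ref).mean()

    return -(policy_term - kl_coef * kl)

# Example with N = 8 dummy responses; in training, logp carries gradients.
logp = torch.randn(8, requires_grad=True)
loss = grpo_loss(logp, torch.randn(8), torch.randn(8), torch.rand(8))
loss.backward()
```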