TL;DR: Nav-R1 is an embodied foundation model that integrates dialogue, reasoning, planning, and navigation capabilities to enable intelligent interaction and task execution in 3D environments.
Nav-R1 demonstrates strong multimodal understanding, effectively aligning visual observations and language instructions with navigation actions.
Nav-R1 enables detailed planning by generating precise, step-by-step trajectories for complex navigation tasks.
Nav-R1 achieves robust navigation, maintaining reliable performance across diverse and challenging environments.
Nav-R1 employs an understanding reward to enhance semantic grounding and improve instruction comprehension.
Nav-R1 incorporates a navigation reward to promote accurate trajectory following and successful task completion.
Nav-R1 leverages a format reward to ensure well-structured reasoning chains and action outputs during navigation.
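As a rough illustration of how the three rewards above might fit together, the sketch below checks output structure and then combines the signals. It rests on assumptions not stated here: the `<think>`/`<action>` tag names, the binary format score, the weighted-sum combination, and the function names `format_reward` and `total_reward` are all hypothetical. The understanding and navigation scores are taken as given inputs.

```python
import re

# Hypothetical output template: one reasoning block followed by one action block.
# The tag names are an assumption made for this sketch.
THINK_ACTION = re.compile(
    r"\s*<think>(.+?)</think>\s*<action>(.+?)</action>\s*", re.DOTALL
)
TAGS = ("<think>", "</think>", "<action>", "</action>")


def format_reward(output: str) -> float:
    """Return 1.0 if the output matches the assumed template, else 0.0."""
    well_formed = THINK_ACTION.fullmatch(output) is not None
    # Each tag must appear exactly once, so repeated blocks are rejected.
    single = all(output.count(tag) == 1 for tag in TAGS)
    return 1.0 if well_formed and single else 0.0


def total_reward(understanding: float, navigation: float, fmt: float,
                 weights=(1.0, 1.0, 1.0)) -> float:
    """Combine the three signals as a weighted sum (an assumed combination rule)."""
    w_u, w_n, w_f = weights
    return w_u * understanding + w_n * navigation + w_f * fmt


if __name__ == "__main__":
    sample = "<think>The door is on the left.</think><action>turn_left</action>"
    print(total_reward(understanding=0.5, navigation=1.0, fmt=format_reward(sample)))
```

A binary format score keeps the structural check easy to audit; a graded score or different weights would slot into the same interface.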
Nav-CoT-110K is built with a Gemini 2.5 Pro data engine that systematically generates large-scale, diverse navigation trajectories and instructions.
Nav-CoT-110K provides high-quality chain-of-thought annotations that deliver explicit step-by-step reasoning for navigation tasks.
Nav-CoT-110K offers diverse modality coverage, spanning language, vision, and action signals for robust navigation learning.
@article{liu2025navr1,
  title={Nav-R1: Reasoning and Navigation in Embodied Scenes},
  author={Liu, Qingxiang and Huang, Ting and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2509.10884},
  year={2025}
}