PlatonicNav: Unveiling Semantic Correspondence in
Navigation with Platonic Topological Maps

Junlin Long1* Zeyu Zhang2*† Xu Deng3* Yiran Wang1*
Yue Yang2 Luke Borgnolo2 Maxwell Twelftree2 Yang Zhao4‡
1USYD 2Maincode 3UNSW 4La Trobe
*Equal contribution. †Project lead. ‡Corresponding author.

TL;DR: PlatonicNav enables training-free embodied navigation through blind semantic matching between vision and language, unifying VLN and ObjNav using self-supervised visual representations.



Real World Evaluation: ObjNav

Demo 1: Pre-exploration

Ego Depth Pointmap
Ego Object Map

Demo 1: Navigation

Ego Depth Pointmap
BEV

Demo 2: Pre-exploration

Ego Depth Pointmap
Ego Object Map

Demo 2: Navigation

Ego Depth Pointmap
BEV

Demo 3: Pre-exploration

Ego Depth Pointmap
Ego Object Map

Demo 3: Navigation

Ego Depth Pointmap
BEV

Real World Evaluation: VLN

Find the Plant

Ego Depth Pointmap
BEV

Go to the Chair

Ego Depth Pointmap
BEV

Go to the Lamp

Ego Depth Pointmap
BEV

Simulation: ObjNav (OVON)

Demo 1: Refrigerator

Ego Depth BEV

Demo 2: TV Stand

Ego Depth BEV

Demo 3: Desk

Ego Depth BEV

Demo 4: Sofa Chair

Ego Depth BEV

Demo 5: Dining Chair

Ego Depth BEV

Demo 6: Chair

Ego Depth BEV

Demo 7: Photo

Ego Depth BEV

Simulation: VLN (R2R-CE)

Demo 1: Fireplace

From here, walk to the front of the fireplace.

Ego Depth BEV

Demo 2: Stairs

From here, head towards the stairs. Stop on the round rug next to the flowers.

Ego Depth BEV

Demo 3: Couch

From here, turn left and go straight until you get to three tables with chairs. Turn left and wait near the couch.

Ego Depth BEV

Demo 4: Island

From here, walk into the dining room area. Stop in front of the island.

Ego Depth BEV

Demo 5: Table

From here, walk into the kitchen, around the dining table to the buffet. Stop and wait there.

Ego Depth BEV

Demo 6: Desk

From here, walk towards the desk in the office area. Stop next to the desk.

Ego Depth BEV

Demo 7: Chair

From here, move ahead in between bar and table to the chair.

Ego Depth BEV

Demo 8: Table

From here, turn left and go straight until you get to a large table.

Ego Depth BEV

Demo 9: Stairs

From here, walk down the first set of stairs. Wait there.

Ego Depth BEV

Demo 10: Stairs

From here, turn left continue down the hallway until you get to the stairs. Wait there.

Ego Depth BEV

Demo 11: Stairs

From here, exit the living room, turn left, wait at the bottom of the stairs.

Ego Depth BEV

Demo 12: Stairs

From here, then turn left again and go down the stairs. Stop before going outside.

Ego Depth BEV

Demo 13: Stairs

From here, walk down stairs. Wait at bottom of stairs.

Ego Depth BEV

Method

PlatonicNav Pipeline. (a) Mapping: We construct a Platonic Topological Map as a semantic scene graph in which image segments serve as object nodes and edges are weighted by both geometric distance and semantic distance, the latter computed in the vision embedding space. (b) Goal Selection: Given a natural-language instruction, we blind-match the language embedding of the goal object category against the visual embeddings of segment clusters using their pairwise relations, and select the candidate goal nodes in the Platonic Topological Map. (c) Execution: Given the map and the candidate goal nodes, we compute the path to the goal node that is reachable with the lowest total edge weight; the resulting path lengths are assigned to segmentation masks to form a PlatonicObject Costmap for control prediction.
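
The mapping and execution steps can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each object node has a position and a visual embedding, that an edge weight is a convex combination of Euclidean distance and cosine distance (the mixing weight `alpha` is hypothetical; the source does not say how the two distances are combined), and that path costs are found with a multi-source Dijkstra search from the candidate goal nodes. All node names, positions, and embeddings are made-up toy values.

```python
import heapq
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def build_graph(positions, embeddings, edges, alpha=0.5):
    """Build an undirected adjacency list over object nodes.

    Each edge weight mixes geometric and semantic distance.
    `alpha` is an assumed mixing weight, not a value from the source.
    """
    graph = {node: [] for node in positions}
    for a, b in edges:
        geometric = math.dist(positions[a], positions[b])
        semantic = cosine_distance(embeddings[a], embeddings[b])
        weight = alpha * geometric + (1.0 - alpha) * semantic
        graph[a].append((b, weight))
        graph[b].append((a, weight))
    return graph


def costs_to_goals(graph, goal_nodes):
    """Multi-source Dijkstra: lowest path cost from every node to any goal."""
    cost = {node: math.inf for node in graph}
    heap = []
    for goal in goal_nodes:
        cost[goal] = 0.0
        heap.append((0.0, goal))
    heapq.heapify(heap)
    while heap:
        current, node = heapq.heappop(heap)
        if current > cost[node]:
            continue  # stale heap entry
        for neighbour, weight in graph[node]:
            candidate = current + weight
            if candidate < cost[neighbour]:
                cost[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return cost


# Toy scene: four object nodes with 2D positions and 2D embeddings.
positions = {"sofa": (0, 0), "table": (3, 0), "lamp": (3, 4), "plant": (6, 4)}
embeddings = {"sofa": (1, 0), "table": (1, 0), "lamp": (0, 1), "plant": (0, 1)}
edges = [("sofa", "table"), ("table", "lamp"), ("lamp", "plant"), ("sofa", "lamp")]

graph = build_graph(positions, embeddings, edges, alpha=0.5)
costs = costs_to_goals(graph, goal_nodes=["plant"])
```

In this sketch, `costs` plays the role of the per-object values that would be written into the segmentation masks to form a costmap: objects closer to the goal in the graph receive lower values.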

Blind Matching of Vision and Language in Navigation

Blind matching of vision and language in a navigation scene. Text and images are both abstractions of the same underlying world, so the vision and language encoders f_v and f_l learn similar pairwise relations between concepts. We exploit these pairwise relations in a matching solver to recover valid correspondences between vision and language representations without requiring any paired data.
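
As a hedged illustration of the idea, the sketch below builds a pairwise cosine-similarity matrix within each modality and searches for the assignment under which the two matrices agree best. The exhaustive permutation search is a stand-in chosen for clarity on a three-concept toy problem; the source does not specify which matching solver is used, and the embeddings, word list, and function names here are all hypothetical. The toy vision embeddings are a rotated, shuffled copy of the language embeddings, so no cross-modal similarity is ever computed.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two vectors from the same embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_matrix(embeddings):
    """Pairwise similarities computed within a single modality."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def blind_match(lang_sim, vis_sim):
    """Find the assignment that best aligns the two similarity matrices.

    perm[i] is the vision index matched to language index i.
    Brute force over permutations: only feasible for a handful of concepts.
    """
    n = len(lang_sim)
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(
            (lang_sim[i][j] - vis_sim[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Toy language embeddings for three goal categories.
words = ["chair", "sofa", "plant"]
lang_embeddings = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]

# Toy vision embeddings for three segment clusters: the same concepts,
# rotated by 90 degrees and listed in a different order (plant, chair, sofa).
vis_embeddings = [(-1.0, 0.0), (0.0, 1.0), (-0.6, 0.8)]

perm, cost = blind_match(
    similarity_matrix(lang_embeddings), similarity_matrix(vis_embeddings)
)
matches = {word: perm[i] for i, word in enumerate(words)}
```

Here `matches` maps each word to the index of the segment cluster it was paired with. The match is recoverable only because the three pairwise similarities are distinct; a scene whose concepts have symmetric relations would leave the assignment ambiguous.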

Citation

@techreport{long2026platonicnav,
    title={PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps},
    author={Junlin Long and Zeyu Zhang and Xu Deng and Yiran Wang and Yue Yang and Luke Borgnolo and Maxwell Twelftree and Yang Zhao},
    institution={Technical Report},
    year={2026}
}