UniMesh: Unifying 3D Mesh Understanding and Generation

¹Boston University
²Peking University
*Equal contribution. Project lead. Corresponding author.

Mesh Generation and Editing Results

[Gallery. Columns: images, meshes, edited images, edited meshes. Objects shown: astronaut, balloon, bulldozer, lion, motorcycle, butterfly, fish, frog, spiderman2, vase.]

Framework of UniMesh

Framework of UniMesh. Given a text prompt or modification instruction, BAGEL with Qwen generates an image latent, which is transformed by the Mesh Head into a conditioning latent for Hunyuan3D to produce a 3D mesh. The reference image latent of the generated mesh can be fed back into BAGEL for iterative refinement via Chain-of-Mesh, while self-reflection enables semantic feedback loops for understanding tasks.
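The data flow in this caption can be traced with a minimal sketch. Everything below is a hypothetical stand-in: the function names, the list-of-floats latent type, and the toy arithmetic are assumptions made for illustration, not the BAGEL, Mesh Head, or Hunyuan3D interfaces. Only the order of the three stages comes from the caption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Latent = List[float]  # assumption: a latent is modelled as a flat list of floats


@dataclass
class Mesh:
    vertices: List[Tuple[float, float, float]]
    faces: List[Tuple[int, int, int]]


def generate_mesh(
    prompt: str,
    image_model: Callable[[str], Latent],    # stand-in for BAGEL with Qwen
    mesh_head: Callable[[Latent], Latent],   # stand-in for the Mesh Head
    mesh_decoder: Callable[[Latent], Mesh],  # stand-in for Hunyuan3D
) -> Mesh:
    """Compose the three stages: text -> image latent -> conditioning -> mesh."""
    image_latent = image_model(prompt)
    conditioning = mesh_head(image_latent)
    return mesh_decoder(conditioning)


# Toy stand-ins (hypothetical) so the composition can be executed end to end.
def toy_image_model(prompt: str) -> Latent:
    # One number per word: its length.
    return [float(len(word)) for word in prompt.split()]


def toy_mesh_head(latent: Latent) -> Latent:
    # A fixed rescaling in place of a learned projection.
    return [0.5 * x for x in latent]


def toy_mesh_decoder(conditioning: Latent) -> Mesh:
    # A single triangle whose size is the mean of the conditioning latent.
    size = sum(conditioning) / len(conditioning) if conditioning else 1.0
    return Mesh(
        vertices=[(0.0, 0.0, 0.0), (size, 0.0, 0.0), (0.0, size, 0.0)],
        faces=[(0, 1, 2)],
    )


mesh = generate_mesh("red motorcycle", toy_image_model, toy_mesh_head, toy_mesh_decoder)
print(mesh)
```

Passing the stages in as callables keeps the sketch independent of any real model API; each toy function could be swapped for an actual component without changing `generate_mesh`.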

Chain of Mesh

Chain of Mesh. A closed-loop "latent, prompting, and re-generation" cycle.
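One way to picture the closed loop is as a simple iteration in which each generated mesh yields a reference latent that conditions the next step. The sketch below assumes hypothetical `generate` and `render_latent` callables and represents meshes and latents as plain Python containers; it traces the feedback structure only and does not reproduce the actual Chain-of-Mesh implementation.

```python
from typing import Callable, Dict, List, Optional

Latent = List[str]          # assumption: a toy latent that just records past prompts
Mesh = Dict[str, List[str]]  # assumption: a toy mesh that records how it was produced


def chain_of_mesh(
    prompt: str,
    edits: List[str],
    generate: Callable[[str, Optional[Latent]], Mesh],
    render_latent: Callable[[Mesh], Latent],
) -> List[Mesh]:
    """Run the initial prompt, then each edit, feeding the latent back each time."""
    history: List[Mesh] = []
    reference: Optional[Latent] = None  # no reference before the first generation
    for instruction in [prompt, *edits]:
        mesh = generate(instruction, reference)  # prompting + re-generation
        reference = render_latent(mesh)          # latent fed back into the loop
        history.append(mesh)
    return history


# Toy stand-ins (hypothetical) that make the feedback visible.
def toy_generate(instruction: str, reference: Optional[Latent]) -> Mesh:
    previous = list(reference) if reference else []
    return {"steps": previous + [instruction]}


def toy_render_latent(mesh: Mesh) -> Latent:
    return list(mesh["steps"])


for step in chain_of_mesh(
    "blue motorcycle", ["make it red"], toy_generate, toy_render_latent
):
    print(step)
```

Because the reference latent is the only state carried between iterations, each edit sees the result of all earlier ones without the caller having to track them.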

Self-Reflection

Pipeline of Self-Reflection. The pipeline runs from a 3D object through rendering and view selection to model captioning. The Reflexion agent then iterates: it corrects errors, proposes improvements, and eventually returns the final answer.
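The stages named in this caption (view selection, captioning, and an iterative critique loop) can be sketched as follows. All names, the view-scoring rule, the stopping criterion, and the round limit are assumptions for illustration; the source does not specify how the Reflexion agent scores views or decides to stop.

```python
from typing import Callable, Dict, List, Optional

View = Dict[str, List[str]]  # assumption: a rendered view lists the attributes it shows


def self_reflect(
    views: List[View],
    score_view: Callable[[View], float],
    caption: Callable[[List[View]], str],
    critique: Callable[[List[View], str], Optional[str]],
    revise: Callable[[str, str], str],
    k: int = 4,
    max_rounds: int = 3,
) -> str:
    """Select the k best views, draft a caption, then refine it until no critique remains."""
    selected = sorted(views, key=score_view, reverse=True)[:k]
    answer = caption(selected)
    for _ in range(max_rounds):
        feedback = critique(selected, answer)
        if feedback is None:  # the critic found nothing left to fix
            break
        answer = revise(answer, feedback)
    return answer


# Toy stand-ins (hypothetical).
def toy_score(view: View) -> float:
    return float(len(view["attrs"]))  # more visible attributes = better view


def toy_caption(selected: List[View]) -> str:
    # Deliberately weak first draft: it only looks at the last selected view.
    return " ".join(selected[-1]["attrs"])


def toy_critique(selected: List[View], answer: str) -> Optional[str]:
    words = answer.split()
    for view in selected:
        for attr in view["attrs"]:
            if attr not in words:
                return attr  # report the first attribute the caption misses
    return None


def toy_revise(answer: str, feedback: str) -> str:
    return f"{answer} {feedback}"


views = [
    {"attrs": ["red", "motorcycle"]},
    {"attrs": ["red", "motorcycle", "wheels"]},
    {"attrs": []},
]
print(self_reflect(views, toy_score, toy_caption, toy_critique, toy_revise, k=2))
```

The `max_rounds` cap is a design choice in this sketch that guarantees termination even if the critic never returns `None`.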

Mesh Generation and Editing

UniMesh enables semantic-aware 3D mesh generation and editing. From a single text prompt (top row), UniMesh generates high-fidelity 3D meshes. Leveraging its unified understanding-generation architecture, it further supports iterative semantic edits (bottom row), such as changing object color ("blue motorcycle" to "red motorcycle"), adding attributes ("astronaut" to "astronaut holding the Moon"), or modifying structure ("tracks" to "wheels"). These edits demonstrate the synergy between 3D understanding and generation within the Chain-of-Mesh mechanism.

Object Captioning

Captions generated by UniMesh. Each box shows four selected views of a 3D object and the caption UniMesh generated for it. The captions are detailed and attribute-rich, describing not only object identity but also color combinations, structural elements, and other attributes.