Framework of UniMesh.
Given a text prompt or a modification instruction, BAGEL with Qwen generates an image latent, which
the Mesh Head transforms into a conditioning latent that Hunyuan3D uses to produce a 3D mesh. The
reference image latent of the generated mesh can be fed back into BAGEL for iterative refinement
via Chain-of-Mesh, while self-reflection provides semantic feedback loops for understanding tasks.
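The forward path in this caption (prompt, then image latent, then conditioning latent, then mesh) can be sketched as plain data flow. This is a minimal illustration under stated assumptions, not the UniMesh implementation. The callables `generate_image_latent` and `latent_to_mesh` are hypothetical stand-ins for BAGEL with Qwen and for Hunyuan3D. Latents are modeled as lists of floats. The Mesh Head is shown as a single linear projection only for illustration, because the caption does not specify its architecture.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: latents modeled as flat lists of floats for illustration only.
Latent = List[float]


def make_linear_mesh_head(weights: List[List[float]], bias: List[float]) -> Callable[[Latent], Latent]:
    """Toy Mesh Head: an affine map from image latent to conditioning latent (assumed form)."""
    def head(z: Latent) -> Latent:
        # One output per weight row: dot(row, z) + bias
        return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weights, bias)]
    return head


@dataclass
class UniMeshPipeline:
    """Data-flow sketch of the framework; every stage is a placeholder callable."""
    generate_image_latent: Callable[[str], Latent]  # stands in for BAGEL with Qwen (hypothetical)
    mesh_head: Callable[[Latent], Latent]           # image latent -> conditioning latent
    latent_to_mesh: Callable[[Latent], dict]        # stands in for Hunyuan3D (hypothetical)

    def run(self, prompt: str) -> dict:
        image_latent = self.generate_image_latent(prompt)
        cond_latent = self.mesh_head(image_latent)
        return self.latent_to_mesh(cond_latent)
```

Keeping each stage as an injected callable means the generator, head, and 3D backend can be swapped independently. This mirrors how the caption describes the Mesh Head as a bridge between two existing models.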
Chain of Mesh
Chain of Mesh.
A closed-loop "latent, prompting, and re-generation" cycle.
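The closed loop can be sketched as a fold over a list of editing instructions. Each generated mesh is encoded back into a reference latent, which conditions the next generation step. This is a minimal sketch under assumptions, not the authors' code. The callables `generate`, `to_mesh`, and `encode_reference` are hypothetical placeholders for BAGEL, the Mesh Head plus Hunyuan3D, and whatever produces the reference image latent of a mesh.

```python
from typing import Callable, List, Optional, TypeVar

L = TypeVar("L")  # latent type (hypothetical placeholder)
M = TypeVar("M")  # mesh type (hypothetical placeholder)


def chain_of_mesh(
    prompt: str,
    edits: List[str],
    generate: Callable[[str, Optional[L]], L],  # text (+ optional reference latent) -> latent
    to_mesh: Callable[[L], M],                  # latent -> mesh
    encode_reference: Callable[[M], L],         # mesh -> reference latent fed back to the generator
) -> List[M]:
    """Return the initial mesh followed by one mesh per editing instruction."""
    latent = generate(prompt, None)             # first pass: no reference yet
    mesh = to_mesh(latent)
    meshes = [mesh]
    for instruction in edits:
        reference = encode_reference(mesh)      # close the loop: mesh back to latent
        latent = generate(instruction, reference)
        mesh = to_mesh(latent)
        meshes.append(mesh)
    return meshes
```

Returning every intermediate mesh, not only the last one, keeps the edit history available for inspection or rollback.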
Self-Reflection
Pipeline of Self-Reflection.
The pipeline proceeds from a 3D object through rendering and view selection to model captioning.
The Reflexion agent corrects errors over iterative loops, proposes improvements, and finally
provides the answer.
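The loop in this caption follows the Reflexion pattern: produce an answer, critique it, and revise until the critic accepts or a round budget runs out. The sketch below illustrates that control flow under stated assumptions and is not the UniMesh agent. The callables `render`, `select_views`, `caption`, and `critique`, and the `max_rounds` budget, are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, TypeVar

O = TypeVar("O")  # 3D object type (placeholder)
V = TypeVar("V")  # rendered view type (placeholder)


@dataclass
class ReflectionResult:
    answer: str
    history: List[str] = field(default_factory=list)


def self_reflect(
    obj: O,
    render: Callable[[O], Sequence[V]],                    # hypothetical multi-view renderer
    select_views: Callable[[Sequence[V]], Sequence[V]],    # hypothetical view selector
    caption: Callable[[Sequence[V], Optional[str]], str],  # captioner, optionally given feedback
    critique: Callable[[str], Optional[str]],              # feedback text, or None to accept
    max_rounds: int = 3,
) -> ReflectionResult:
    """Reflexion-style loop: caption, critique, revise until accepted or out of rounds."""
    views = select_views(render(obj))
    answer = caption(views, None)          # first attempt, no feedback yet
    history = [answer]
    for _ in range(max_rounds):
        feedback = critique(answer)
        if feedback is None:               # critic accepts the current caption
            break
        answer = caption(views, feedback)  # revise using the critic's feedback
        history.append(answer)
    return ReflectionResult(answer=answer, history=history)
```

The explicit round budget guarantees termination even if the critic never accepts. The stored history records each intermediate caption.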
Mesh Generation and Editing
UniMesh enables semantic-aware 3D mesh generation and editing.
From a single text prompt (top row), UniMesh generates high-fidelity 3D meshes. Leveraging its
unified understanding-and-generation architecture, it also supports iterative semantic edits
(bottom row), such as changing an object's color ("blue motorcycle" to "red motorcycle"), adding
attributes ("astronaut" to "astronaut holding the Moon"), or modifying structure ("tracks" to
"wheels"). These edits demonstrate the synergy between 3D understanding and generation within
the Chain-of-Mesh mechanism.
Object Captioning
Captions generated by UniMesh.
Each box shows four well-chosen views of a 3D object together with the caption UniMesh generated
for it. UniMesh produces detailed, attribute-rich captions that describe not only the object's
identity but also its color combinations, structural elements, and other attributes.
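The captions are conditioned on a small set of selected views. One simple way to pick "good" views is to rank candidates by a per-view quality score and keep the top k. This is an illustrative assumption: the captions do not state UniMesh's selection criterion, and the `view_scores` mapping and the example view names are hypothetical.

```python
import heapq
from typing import Dict, List


def select_good_views(view_scores: Dict[str, float], k: int = 4) -> List[str]:
    """Return the k view names with the highest (hypothetical) quality score, best first."""
    # heapq.nlargest avoids sorting the full list when there are many candidate views
    return heapq.nlargest(k, view_scores, key=view_scores.__getitem__)
```

Any scoring signal could plug in here, such as the fraction of the frame covered by the object. Only the top-k ranking step is shown.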