VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery

TL;DR: VaseVQA-3D introduces a 3D visual question-answering dataset for ancient Greek pottery comprising 664 annotated vase models, and VaseVLM, a domain-adaptive vision-language model trained for cultural heritage analysis.

3D Caption Dataset

Athenian black-figure lekythos, c. 575–525 BCE, adorned with dogs and owls; National Museum, Copenhagen.

Squat Athenian red-figure lekythos, c. 425–375 BCE, depicting a woman and Eros; Cleveland Museum of Art.

Athenian red-figure cup by Epiktetos, c. 525–475 BCE, featuring a symposium scene with a reclining woman.

Athenian black-figure lekythos, c. 500–450 BCE, Beldam Workshop, ivy and berry motif, Nola provenance.

Athenian black-figure lekythos, c. 500–450 BCE, adorned with ivy leaf and berry motifs; National Museum, Warsaw.

Athenian black-figure Panathenaic amphora, c. 525–475 BCE, Athena and chariot motifs, attributed to Kleophrades Painter.

3D-QA Dataset

Query_1: What is the fabric of the vase?
Answer_1: The fabric of the vase is ATHENIAN.

Query_2: What is the technique of the vase?
Answer_2: The technique of the vase is BLACK-FIGURE.

……

Query_6: What is the decoration of the vase?
Answer_6: The decoration of the vase is a: fight with chariot, warrior (in nebris?), shield device, snake; b: Dionysos with drinking horn between satyrs, one with wineskin.

Query_1: What is the fabric of the vase?
Answer_1: The fabric of the vase is ATHENIAN.

Query_2: What is the technique of the vase?
Answer_2: The technique of the vase is RED-FIGURE.

……

Query_6: What is the decoration of the vase?
Answer_6: The decoration of the vase is body: head of woman in sakkos, tendril.
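The sample answers above follow a fixed pattern ("What is the {field} of the vase?" / "The {field} of the vase is {value}."). The sketch below shows one way such templated pairs could be generated from a catalogue record. It is a hypothetical illustration, not the authors' construction code: the field names, the dict-based record format, and the helper `build_qa_pairs` are all assumptions, and it does not reproduce the elided Query_3 to Query_5.

```python
# Hypothetical sketch: turn one vase's catalogue record into templated QA pairs.
# The record schema (field name -> value string) is an assumption; the real
# VaseVQA-3D data format is not specified on this page.

QUESTION_TEMPLATE = "What is the {field} of the vase?"
ANSWER_TEMPLATE = "The {field} of the vase is {value}."


def build_qa_pairs(record: dict) -> list:
    """Create one (query, answer) pair per non-empty metadata field."""
    pairs = []
    for field, value in record.items():
        if not value:  # skip fields the catalogue leaves blank
            continue
        pairs.append({
            "query": QUESTION_TEMPLATE.format(field=field),
            "answer": ANSWER_TEMPLATE.format(field=field, value=value),
        })
    return pairs


# Toy record reusing values from the second sample above.
example = {
    "fabric": "ATHENIAN",
    "technique": "RED-FIGURE",
    "decoration": "body: head of woman in sakkos, tendril",
}

for i, qa in enumerate(build_qa_pairs(example), start=1):
    print(f"Query_{i}: {qa['query']}")
    print(f"Answer_{i}: {qa['answer']}")
```

The query numbering printed here follows the order of fields in the toy record, so the decoration question appears as Query_3 rather than Query_6 as in the dataset sample.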

VaseEval

[Figure panels: Hunyuan3D, TripoSG, VaseVLM]


Complete Pipeline for Vase Dataset Construction. The pipeline progresses from initial data collection (30K+ images) through quality filtering (664 images), 3D generation (664 models), QA construction (9K pairs), to final model training. Each component includes specific quality control mechanisms and validation procedures.
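The stage counts in this caption imply how aggressive the filtering is. The short calculation below uses only the numbers given in the caption; note that "30K+" is a lower bound (so 30,000 is used and the true retention rate can only be lower) and "9K" is a rounded figure, so the per-model ratio is approximate.

```python
# Stage counts taken from the pipeline caption. "30K+" is treated as 30,000
# (a lower bound), and "9K" QA pairs is a rounded figure.
STAGES = [
    ("collected images", 30_000),
    ("after quality filtering", 664),
    ("generated 3D models", 664),
]
QA_PAIRS = 9_000


def stage_retention(stages):
    """Fraction of items kept between each pair of consecutive stages."""
    return [
        (prev_name, name, n / prev_n)
        for (prev_name, prev_n), (name, n) in zip(stages, stages[1:])
    ]


for prev_name, name, frac in stage_retention(STAGES):
    print(f"{prev_name} -> {name}: kept {frac:.1%}")

print(f"QA pairs per 3D model: {QA_PAIRS / STAGES[-1][1]:.1f}")
```

With these inputs, at most about 2.2% of the collected images survive filtering, and each 3D model carries roughly 13.6 QA pairs on average.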

Data-Centric Learning

Complete Data Quality Filtering Pipeline. The figure shows our comprehensive filtering methodology, including ResNet-50-based quality assessment for removing low-quality images, followed by dual CLIP-based semantic filtering for fragment removal and optimal image selection.
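The three filtering steps in this caption can be sketched as a small selection routine over precomputed scores and embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each candidate image already has a quality score in [0, 1] from a ResNet-50 classifier and a CLIP image embedding, that the text prompts for "fragment", "whole vase", and "preferred view" have been embedded with CLIP, and that one image is selected per vase. The threshold, prompt roles, and function names are hypothetical; toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math

# Hypothetical sketch of a three-stage image filter. Inputs are assumed to be
# precomputed: "quality" from a ResNet-50 quality classifier, "emb" from a CLIP
# image encoder, and text embeddings from the CLIP text encoder. All embeddings
# are assumed to be non-zero vectors.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filter_and_select(images, fragment_text, whole_text, view_text, quality_threshold=0.5):
    """Return the best image for one vase, or None if every candidate is filtered out."""
    # Stage 1: drop images the quality classifier scores below the threshold.
    kept = [im for im in images if im["quality"] >= quality_threshold]
    # Stage 2 (first CLIP pass): drop images closer to the "fragment" prompt
    # than to the "whole vase" prompt.
    kept = [im for im in kept
            if cosine(im["emb"], whole_text) > cosine(im["emb"], fragment_text)]
    if not kept:
        return None
    # Stage 3 (second CLIP pass): keep the image most similar to the view prompt.
    return max(kept, key=lambda im: cosine(im["emb"], view_text))


# Toy example: 3-d vectors standing in for CLIP embeddings.
FRAGMENT = [1.0, 0.0, 0.0]
WHOLE = [0.0, 1.0, 0.0]
VIEW = [0.0, 0.6, 0.8]
candidates = [
    {"id": "a", "quality": 0.90, "emb": [0.1, 0.9, 0.1]},
    {"id": "b", "quality": 0.80, "emb": [0.0, 0.6, 0.8]},
    {"id": "c", "quality": 0.30, "emb": [0.0, 0.0, 1.0]},  # fails quality check
    {"id": "d", "quality": 0.95, "emb": [0.9, 0.1, 0.3]},  # looks like a fragment
]
best = filter_and_select(candidates, FRAGMENT, WHOLE, VIEW)
print(best["id"] if best else "no usable image")
```

Comparing each image against a "fragment" prompt and a "whole vase" prompt, rather than thresholding a single similarity, avoids having to tune an absolute cutoff on CLIP scores; whether the original pipeline does this is not stated here.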

Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) Framework. The figure shows our multi-dimensional reward computation system that evaluates archaeological descriptions across six semantic dimensions: Fabric, Technique, Shape, Dating, Decoration, and Attribution. The framework includes semantic similarity analysis, quality control penalties, and similarity rewards to ensure accurate and academically appropriate responses.
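A reward of the kind this caption describes (per-dimension similarity across the six dimensions, plus quality-control penalties) can be sketched as follows. This is an illustrative approximation, not the paper's reward: token-overlap (Jaccard) similarity is swapped in as a dependency-free stand-in for the semantic similarity model, and the equal weighting, the penalty values, the word limit, and the function names are all assumptions.

```python
import re

# Illustrative multi-dimension reward. Jaccard token overlap stands in for a
# semantic-similarity model; weights and penalty values are assumptions.
DIMENSIONS = ("fabric", "technique", "shape", "dating", "decoration", "attribution")


def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def reward(pred, ref, missing_penalty=0.2, max_words=200, verbosity_penalty=0.1):
    """Mean per-dimension similarity minus quality-control penalties, floored at 0."""
    dims = [d for d in DIMENSIONS if d in ref]
    if not dims:
        return 0.0
    sims, penalty = [], 0.0
    for d in dims:
        answer = pred.get(d, "").strip()
        if not answer:
            sims.append(0.0)
            penalty += missing_penalty  # dimension left unanswered
        else:
            sims.append(jaccard(answer, ref[d]))
    # Penalize overly long descriptions.
    if sum(len(v.split()) for v in pred.values()) > max_words:
        penalty += verbosity_penalty
    return max(0.0, sum(sims) / len(sims) - penalty)


ref = {"fabric": "Athenian", "technique": "red-figure"}
print(reward({"fabric": "Athenian", "technique": "red-figure"}, ref))
print(reward({"fabric": "Athenian", "technique": "black-figure"}, ref))
print(reward({"fabric": "Athenian"}, ref))
```

Averaging over only the dimensions present in the reference keeps the reward comparable across vases whose catalogue entries omit some fields, such as an unknown attribution.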

BibTeX

@article{zhang2025vasevqa,
  title={VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery},
  author={Zhang, Nonghai and Zhang, Zeyu and Wang, Jiazi and Zhao, Yang and Tang, Hao},
  journal={arXiv preprint arXiv:2510.04479},
  year={2025}
}