VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
TL;DR: VaseVQA-3D is a 3D visual question-answering dataset for ancient Greek pottery comprising 664 annotated vase models, and VaseVLM is a domain-adapted vision-language model trained for cultural heritage analysis.
3D Caption Dataset
3D-QA Dataset
Query_1: What is the fabric of the vase?
Answer_1: The fabric of the vase is ATHENIAN.
Query_2: What is the technique of the vase?
Answer_2: The technique of the vase is BLACK-FIGURE.
……
Query_6: What is the decoration of the vase?
Answer_6: The decoration of the vase is a: fight with chariot, warrior (in nebris?), shield device, snake; b: Dionysos with drinking horn between satyrs, one with wineskin.
Query_1: What is the fabric of the vase?
Answer_1: The fabric of the vase is ATHENIAN.
Query_2: What is the technique of the vase?
Answer_2: The technique of the vase is RED-FIGURE.
……
Query_6: What is the decoration of the vase?
Answer_6: The decoration of the vase is body: head of woman in sakkos, tendril.
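The QA pairs above follow a fixed template: "What is the X of the vase?" is answered with "The X of the vase is Y." The sketch below shows how such pairs could be generated from one vase's metadata record. It is an illustration, not the authors' code. The function name `build_qa_pairs` and the dict-based record format are hypothetical. The attribute list reuses the six dimensions named in the RLVR figure, but only the positions of fabric (Query_1), technique (Query_2), and decoration (Query_6) are taken from the examples. The order of the other three is an assumption.

```python
from typing import Dict, List, Tuple

# Assumed attribute order: fabric, technique and decoration match Query_1,
# Query_2 and Query_6 in the examples; the middle three are a guess.
ATTRIBUTES = ["fabric", "technique", "shape", "dating", "attribution", "decoration"]


def build_qa_pairs(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one (hypothetical) vase metadata record into template QA pairs.

    Attributes that are missing or empty in the record are skipped.
    """
    pairs = []
    for attr in ATTRIBUTES:
        value = record.get(attr, "").strip()
        if not value:
            continue  # no ground truth for this attribute, so no question
        query = f"What is the {attr} of the vase?"
        answer = f"The {attr} of the vase is {value}."
        pairs.append((query, answer))
    return pairs


record = {
    "fabric": "ATHENIAN",
    "technique": "RED-FIGURE",
    "decoration": "body: head of woman in sakkos, tendril",
}
for q, a in build_qa_pairs(record):
    print(q, "->", a)
```

Skipping empty attributes keeps the model from being trained on questions that have no verifiable answer.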
VaseEval
VaseVLM
Complete Pipeline for Vase Dataset Construction. The pipeline proceeds from initial data collection (30K+ images) through quality filtering (664 images), 3D generation (664 models), and QA construction (9K pairs) to final model training. Each stage includes its own quality control and validation procedures.
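The caption describes a funnel in which every stage is followed by a quality check. The minimal sketch below chains stages that way and records how many items survive each one, which gives a stage-by-stage count table like the one in the figure. The `Stage` class, `run_pipeline`, and the default "non-empty output" check are hypothetical stand-ins. The real validation procedures are not specified here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[Sequence[Any]], List[Any]]
    # Hypothetical per-stage validation hook; by default it only
    # requires that the stage produced at least one item.
    check: Callable[[List[Any]], bool] = lambda items: len(items) > 0


def run_pipeline(items: Sequence[Any], stages: List[Stage]) -> Tuple[List[Any], List[Tuple[str, int]]]:
    """Apply stages in order, validating each output and logging item counts."""
    current = list(items)
    counts = [("input", len(current))]
    for stage in stages:
        current = stage.run(current)
        if not stage.check(current):
            raise ValueError(f"stage {stage.name!r} failed validation")
        counts.append((stage.name, len(current)))
    return current, counts


# Toy usage: integers stand in for images and strings for 3D models.
stages = [
    Stage("quality_filter", lambda xs: [x for x in xs if x % 2 == 0]),
    Stage("3d_generation", lambda xs: [f"model_{x}" for x in xs]),
]
models, counts = run_pipeline(range(10), stages)
print(counts)
```

Raising on a failed check stops the pipeline early instead of letting a bad stage silently pass through to the next one.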
Data-Centric Learning
Complete Data Quality Filtering Pipeline. The figure shows our filtering methodology: a ResNet-50-based quality assessment first removes low-quality images, and two CLIP-based semantic filters then remove fragments and select the optimal images.
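The filtering logic can be sketched with precomputed scores so that it runs without model weights. This assumes that (a) a ResNet-50 classifier has already produced a usability probability per image, (b) CLIP image embeddings and two CLIP text-prompt embeddings (for example, a "whole vase" prompt and a "vase fragment" prompt) are already available, and (c) one image is kept per vase. The prompts, the threshold of 0.5, the one-image-per-vase rule, and the function names are all hypothetical choices for illustration.

```python
from typing import Dict, Sequence

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def filter_images(
    vase_ids: Sequence[str],
    quality_scores: Sequence[float],  # assumed ResNet-50 usability probabilities
    image_embs: np.ndarray,           # (N, D) assumed CLIP image embeddings
    whole_text_emb: np.ndarray,       # (D,) CLIP text embedding, "whole vase" prompt
    fragment_text_emb: np.ndarray,    # (D,) CLIP text embedding, "fragment" prompt
    quality_threshold: float = 0.5,   # hypothetical cutoff
) -> Dict[str, int]:
    """Return {vase_id: index of the selected image}."""
    imgs = l2_normalize(np.asarray(image_embs, dtype=float))
    sim_whole = imgs @ l2_normalize(np.asarray(whole_text_emb, dtype=float))
    sim_frag = imgs @ l2_normalize(np.asarray(fragment_text_emb, dtype=float))

    best: Dict[str, int] = {}
    for i, vid in enumerate(vase_ids):
        # Stage 1: drop images the quality classifier rejects.
        if quality_scores[i] < quality_threshold:
            continue
        # Stage 2 (CLIP filter 1): drop images closer to the fragment prompt.
        if sim_frag[i] >= sim_whole[i]:
            continue
        # Stage 3 (CLIP filter 2): keep the most "whole vase"-like image per vase.
        if vid not in best or sim_whole[i] > sim_whole[best[vid]]:
            best[vid] = i
    return best


selected = filter_images(
    vase_ids=["v1", "v1", "v1", "v2"],
    quality_scores=[0.9, 0.9, 0.2, 0.8],
    image_embs=np.array([[1.0, 0.1], [0.6, 0.8], [1.0, 0.0], [0.9, 0.3]]),
    whole_text_emb=np.array([1.0, 0.0]),
    fragment_text_emb=np.array([0.0, 1.0]),
)
print(selected)
```

In this toy run the third image of `v1` matches the "whole vase" prompt best, but the quality filter has already removed it. This is why the cheap quality stage runs before the CLIP-based selection.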
Reinforcement Learning
Reinforcement Learning with Verifiable Rewards (RLVR) Framework. The figure shows our multi-dimensional reward computation system, which evaluates archaeological descriptions across six semantic dimensions: Fabric, Technique, Shape, Dating, Decoration, and Attribution. The framework combines semantic similarity analysis, quality control penalties, and similarity rewards to encourage accurate and academically appropriate responses.
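A per-dimension reward of this kind can be sketched as follows. This is not the authors' reward function. It makes three assumptions: the model's answer has already been parsed into one text field per dimension, similarity is measured per dimension and averaged, and the quality-control penalty is a fixed deduction for each dimension the model leaves empty. As a substitute for a semantic similarity model, it uses the standard library's `difflib.SequenceMatcher`, which is only a lexical proxy. `compute_reward` and `missing_penalty=0.1` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict

DIMENSIONS = ("fabric", "technique", "shape", "dating", "decoration", "attribution")


def similarity(a: str, b: str) -> float:
    # Lexical stand-in for semantic similarity; a sentence-embedding
    # model would normally be used here instead.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def compute_reward(pred: Dict[str, str], ref: Dict[str, str], missing_penalty: float = 0.1) -> float:
    """Average per-dimension similarity minus a penalty for empty answers."""
    scores = []
    penalty = 0.0
    for dim in DIMENSIONS:
        if dim not in ref:
            continue  # no ground truth, so this dimension is not verifiable
        answer = pred.get(dim, "").strip()
        if not answer:
            penalty += missing_penalty  # quality-control penalty
            scores.append(0.0)
            continue
        scores.append(similarity(answer, ref[dim]))
    if not scores:
        return 0.0
    return max(0.0, sum(scores) / len(scores) - penalty)


ref = {"fabric": "ATHENIAN", "technique": "RED-FIGURE"}
print(compute_reward({"fabric": "Athenian", "technique": "red-figure"}, ref))
print(compute_reward({"fabric": "ATHENIAN", "technique": "BLACK-FIGURE"}, ref))
print(compute_reward({"fabric": "ATHENIAN"}, ref))
```

Because dimensions without ground truth are skipped rather than scored as zero, the reward depends only on the attributes that can actually be verified for a given vase.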