PresentAgent-2

Towards Generalist Multimodal Presentation Agents

Wei Wu1* Ziyang Xu1* Zeyu Zhang1*† Yang Zhao2 Hao Tang1‡
1Peking University 2La Trobe University
*Equal contribution. †Project lead. ‡Corresponding author.

The video was generated entirely by PresentAgent-2 without any manual curation.

Single Presentation

3DInAction
AutoSDF
BANMo
Dual Shutter
FastForward
Feature 3DGS
Virtual Sketching
K-Planes
LMTraj
Neural Humans
LightIt
MobileNeRF
MultiPly
RainyGS
RoDynRF
SemanticDraw
SpectroMotion
Think with Video
Trajectory2Pose
ViewDiff

Discussion Presentation

Scientific Taste
Chain of World
DeepSeek V4
DexWM
EgoScale
Exclusive Self-Attention
Fast ThinkAct
Flow Matching
GenMimic
GigaBrain 0.5M
Grow Don't Overwrite
KLong
MultiWorld
Scale Space Diffusion
SimVLA
SlopCodeBench
Million-Step LLM Task
Speculative Decoding
ThinkAct VLA
Thinking with Video

Interaction Presentation

FastForward
K-Planes
RDRF
BANMo
CoWVLA
GenMimic
Speculative Decoding
Chain of World

Method: PresentAgent-2

Overview of the PresentAgent-2 framework. Given a user query and a selected presentation mode, PresentAgent-2 first performs deep research to collect multimodal resources, then constructs presentation content, and finally generates a presentation video in single presentation, discussion, or interaction mode.
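The three-stage pipeline described above can be sketched in code. This is a purely illustrative outline, assuming hypothetical names (`deep_research`, `build_content`, `generate_video`, `Mode`, `Resources`) that are not the system's actual API; each stage is stubbed where the real agent would invoke search, LLM planning, and video synthesis.

```python
# Hypothetical sketch of the PresentAgent-2 pipeline; all names are
# illustrative assumptions, not the actual implementation.
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SINGLE = "single"            # one presenter narrates the content
    DISCUSSION = "discussion"    # multiple speakers discuss the content
    INTERACTION = "interaction"  # presenter responds to audience queries


@dataclass
class Resources:
    """Multimodal material gathered during deep research."""
    texts: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)


def deep_research(query: str) -> Resources:
    """Stage 1: collect multimodal resources relevant to the query.
    A real agent would search papers, figures, and video clips."""
    return Resources(texts=[f"notes on {query}"])


def build_content(resources: Resources, mode: Mode) -> list[str]:
    """Stage 2: structure the collected material into a mode-aware script."""
    return [f"[{mode.value}] {t}" for t in resources.texts]


def generate_video(script: list[str]) -> str:
    """Stage 3: render the script as a presentation video (stubbed)."""
    return f"video with {len(script)} segments"


def present(query: str, mode: Mode) -> str:
    """Run the full pipeline: research -> content -> video."""
    return generate_video(build_content(deep_research(query), mode))
```

The point of the sketch is the staged hand-off: the presentation mode only affects content construction and rendering, while resource collection is mode-agnostic.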

Benchmark: PresentEval

Evaluation pipeline. Objective quiz evaluation measures knowledge delivery, while subjective evaluation scores mode-specific presentation quality.
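The two evaluation tracks can be summarized in a minimal sketch, assuming an accuracy-based objective track and mean-rating subjective track; the scoring scheme, rating dimensions, and function names here are assumptions for demonstration, not the benchmark's actual protocol.

```python
# Illustrative sketch of PresentEval's two-track evaluation; the exact
# scoring rules are assumptions, not the benchmark's protocol.
def quiz_accuracy(answers: list[str], gold: list[str]) -> float:
    """Objective track: fraction of comprehension-quiz questions
    answered correctly after watching the generated presentation."""
    correct = sum(a == g for a, g in zip(answers, gold))
    return correct / len(gold)


def subjective_score(ratings: dict[str, float]) -> float:
    """Subjective track: mean of mode-specific quality ratings
    (e.g. clarity, visual quality, engagement) on a shared scale."""
    return sum(ratings.values()) / len(ratings)


def present_eval(answers: list[str], gold: list[str],
                 ratings: dict[str, float]) -> dict[str, float]:
    """Report both tracks side by side rather than collapsing them,
    so knowledge delivery and presentation quality stay separable."""
    return {
        "knowledge_delivery": quiz_accuracy(answers, gold),
        "presentation_quality": subjective_score(ratings),
    }
```

Keeping the two scores separate matches the caption's framing: a video can deliver knowledge well yet score poorly on mode-specific quality, or vice versa.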

Presentation Agent Comparison

Capability comparison between PresentAgent-2 and representative related systems. ✓ indicates explicit support, △ indicates partial or indirect support, and × indicates that the capability is not supported or not the target of the method.
Method Presentation Discussion Interaction Text Image GIF Video
Paper2Video × × ×
Paper2Poster × × × ×
VideoDirectorGPT × × × × × ×
VideoStudio × × × × × ×
LVD × × × × × ×
PresentAgent × × × ×
PresentAgent-2

BibTeX

@article{wu2026presentagent2,
  title={PresentAgent-2: Towards Generalist Multimodal Presentation Agents},
  author={Wu, Wei and Xu, Ziyang and Zhang, Zeyu and Zhao, Yang and Tang, Hao},
  journal={arXiv preprint arXiv:2605.11363},
  year={2026}
}