MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval Paper • 2509.26378 • Published Sep 30
MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Paper • 2502.12558 • Published Feb 18
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval Paper • 2502.11431 • Published Feb 17
VideoDeepResearch: Long Video Understanding With Agentic Tool Using Paper • 2506.10821 • Published Jun 12
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6, 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Paper • 2406.04292 • Published Jun 6, 2024