MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15 • 30
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5 • 51
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15 • 2