omlab/VLM-R1-Qwen2.5VL-3B-Math-0305 Visual Question Answering • 4B • Updated Apr 14 • 107 • 8
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 303
GOT Online 💬 Extract text from images using various OCR modes • 359
InternVL ⚡ Interact with a multimodal chatbot that analyzes images and text • 501
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16, 2024 • 41