StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 19
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 29
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5 • 79
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 56
Introducing smolagents: simple agents that write actions in code. Article • Dec 31, 2024 • 1.15k
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 182
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? Article • Mar 17 • 344
LLM-based User Profile Management for Recommender System Paper • 2502.14541 • Published Feb 20 • 6
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20 • 13
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data Paper • 2502.14044 • Published Feb 19 • 8
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published Feb 20 • 12
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 14
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20 • 11
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18 • 29