Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation Paper • 2509.23866 • Published Sep 28 • 13
Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL Paper • 2505.15436 • Published May 21 • 2
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning Paper • 2502.11573 • Published Feb 17 • 9
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage Paper • 2412.15606 • Published Dec 20, 2024 • 2
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Paper • 2407.11522 • Published Jul 16, 2024 • 9