Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Paper • 2510.14967 • Published Oct 16 • 33
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward Paper • 2509.07430 • Published Sep 9 • 3
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 53