Collections including paper arxiv:2507.19849

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 142
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 138
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

Finetuning Strategies

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27 • 14
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Paper • 2507.21802 • Published Jul 29 • 17
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29 • 8
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

Daily high rank paper

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Paper • 2507.22448 • Published Jul 30 • 66
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28 • 110

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 105

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Paper • 2507.19457 • Published Jul 25 • 28
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 314
Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published Oct 3 • 97

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24 • 40

The official datasets and model checkpoints of ARPO

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158
dongguanting/Qwen3-8B-ARPO-DeepSearch

8B • Updated Jul 29 • 9 downloads • 2 likes
dongguanting/Qwen3-14B-ARPO-DeepSearch

Text Generation • 15B • Updated Aug 12 • 18 downloads • 5 likes
dongguanting/Qwen2.5-7B-ARPO

Text Generation • 8B • Updated Aug 19 • 925 downloads • 2 likes