- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 314
- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
  Paper • 2507.15758 • Published • 35
- Hierarchical Budget Policy Optimization for Adaptive Reasoning
  Paper • 2507.15844 • Published • 16
- Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
  Paper • 2507.16814 • Published • 21
Collections including paper arxiv:2507.15758
- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
  Paper • 2507.15758 • Published • 35
- Hierarchical Budget Policy Optimization for Adaptive Reasoning
  Paper • 2507.15844 • Published • 16
- DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts
  Paper • 2507.18464 • Published • 11
- Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
  Paper • 2507.16880 • Published • 6

- Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
  Paper • 2501.16372 • Published • 12
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
  Paper • 2501.16937 • Published • 7
- Matryoshka Quantization
  Paper • 2502.06786 • Published • 32
- Identifying Sensitive Weights via Post-quantization Integral
  Paper • 2503.01901 • Published • 8

- Snowflake/Arctic-Text2SQL-R1-7B
  8B • Updated • 12k • 56
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 277
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 263
- Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
  Paper • 2506.16406 • Published • 127

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
  Paper • 2502.07374 • Published • 40
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 152
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63