REG: A Regularization Optimizer for Robust Training Dynamics • arXiv:2510.03691 • Published Oct 4, 2025
ROOT: Robust Orthogonalized Optimizer for Neural Network Training • arXiv:2511.20626 • Published Nov 2025
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space • arXiv:2511.20102 • Published Nov 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • arXiv:2511.06221 • Published Nov 2025
Kimi Linear: An Expressive, Efficient Attention Architecture • arXiv:2510.26692 • Published Oct 30, 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling • arXiv:2510.24824 • Published Oct 28, 2025
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models • arXiv:2510.14961 • Published Oct 16, 2025
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? • arXiv:2510.02209 • Published Oct 2, 2025
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction • arXiv:2510.01817 • Published Oct 2, 2025
Benchmarking Optimizers for Large Language Model Pretraining • arXiv:2509.01440 • Published Sep 1, 2025