Generalizable End-to-End Tool-Use RL with Synthetic CodeGym Paper • 2509.17325 • Published Sep 22 • 1
STAMP: Spatial-Temporal Adapter with Multi-Head Pooling Paper • 2511.10848 • Published 24 days ago • 1
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder Paper • 2507.14129 • Published Jul 18 • 9
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model Paper • 2510.24992 • Published Oct 28 • 2
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 64
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Paper • 2505.22094 • Published May 28 • 3
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 32
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 101
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills Paper • 2504.07079 • Published Apr 9 • 12
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Paper • 2504.10342 • Published Apr 14 • 10
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 9
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity Paper • 2412.06289 • Published Dec 9, 2024
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 48
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages Paper • 2404.16816 • Published Apr 25, 2024 • 3
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
Taming Overconfidence in LLMs: Reward Calibration in RLHF Paper • 2410.09724 • Published Oct 13, 2024 • 3