- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13
Collections including paper arxiv:2503.20215
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 24
- When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
  Paper • 2503.16660 • Published • 72
- CoMP: Continual Multimodal Pre-training for Vision Foundation Models
  Paper • 2503.18931 • Published • 30
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
  Paper • 2503.13964 • Published • 20

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
  Paper • 2502.06589 • Published • 21
- LIMO: Less is More for Reasoning
  Paper • 2502.03387 • Published • 62
- Control LLM: Controlled Evolution for Intelligence Retention in LLM
  Paper • 2501.10979 • Published • 6
- LLMQuoter: Enhancing RAG Capabilities Through Efficient Quote Extraction From Large Contexts
  Paper • 2501.05554 • Published • 1

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 104

- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 8
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 11
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 33
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166

- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 63
- TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
  Paper • 2502.15425 • Published • 9
- EgoLife: Towards Egocentric Life Assistant
  Paper • 2503.03803 • Published • 46
- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 85

- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 88
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Progressive Multimodal Reasoning via Active Retrieval
  Paper • 2412.14835 • Published • 73
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71

- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
  Paper • 2501.02955 • Published • 44
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
  Paper • 2501.12380 • Published • 85
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
  Paper • 2501.09781 • Published • 28