- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 65
- Virtual Width Networks
  Paper • 2511.11238 • Published • 35
- Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
  Paper • 2511.07419 • Published • 25
- When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
  Paper • 2511.02243 • Published • 24
Collections including paper arxiv:2511.11238

- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 67
- ∇NABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 124
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 87

- Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
  Paper • 2105.09501 • Published
- Cross-modal Contrastive Learning for Speech Translation
  Paper • 2205.02444 • Published
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
  Paper • 2210.03052 • Published
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
  Paper • 2212.10240 • Published • 1

- Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
  Paper • 2504.13914 • Published • 4
- Seed1.5-VL Technical Report
  Paper • 2505.07062 • Published • 153
- ByteDance-Seed/Seed-OSS-36B-Base
  Text Generation • 36B • Updated • 4.87k • 56
- ByteDance-Seed/Seed-OSS-36B-Base-woSyn
  Text Generation • 36B • Updated • 105 • 50

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 510 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88