- Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
  Paper • 2510.02283 • Published • 95
- Paper2Video: Automatic Video Generation from Scientific Papers
  Paper • 2510.05096 • Published • 116
- LongLive: Real-time Interactive Long Video Generation
  Paper • 2509.22622 • Published • 184
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
  Paper • 2509.08519 • Published • 128
Collections
Collections including paper arxiv:2508.03694

- openai/gpt-oss-120b
  Text Generation • 120B • Updated • 4.48M • 4.22k
- LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
  Paper • 2508.03694 • Published • 50
- Tool-integrated Reinforcement Learning for Repo Deep Search
  Paper • 2508.03012 • Published • 20
- QuantTrio/DeepSeek-V3.2-Exp-AWQ
  Text Generation • Updated • 4.66k • 4

- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 56
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 71
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 26
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 43

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
  Paper • 2508.03694 • Published • 50
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
  Paper • 2508.05629 • Published • 180
- Improving Video Generation with Human Feedback
  Paper • 2501.13918 • Published • 52
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 122

- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
  Paper • 2507.01957 • Published • 21
- LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
  Paper • 2508.03694 • Published • 50
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 190
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 43