Canvas-to-Image: Compositional Image Generation with Multimodal Controls Paper • 2511.21691 • Published 11 days ago • 32
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning Paper • 2510.25772 • Published Oct 29 • 32
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9 • 125
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Paper • 2509.20360 • Published Sep 24 • 17
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention Paper • 2507.17745 • Published Jul 23 • 35
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21 • 68
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published May 30 • 13
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25 • 84
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Paper • 2505.01583 • Published May 2 • 8
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published Apr 29 • 12
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10 • 35
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 130
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published Apr 2 • 37