Collections
Collections including paper arxiv:2501.05441
-
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published • 95 -
Towards Stability of Autoregressive Neural Operators
Paper • 2306.10619 • Published -
A Neural Operator based on Dynamic Mode Decomposition
Paper • 2507.01117 • Published • 1 -
GNOT: A General Neural Operator Transformer for Operator Learning
Paper • 2302.14376 • Published • 1
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 97 -
IamCreateAI/Ruyi-Mini-7B
Image-to-Video • Updated • 239 • 610 -
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Paper • 2412.06016 • Published • 20 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
-
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published • 95 -
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Paper • 2503.07677 • Published • 86 -
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Paper • 2503.08677 • Published • 29 -
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Paper • 2503.09419 • Published • 6
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 52 -
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Paper • 2501.01423 • Published • 44 -
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
Paper • 2411.13552 • Published
-
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 20 -
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 47 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 46 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16