Collections
Collections including paper arxiv:2405.02246
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 46
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
  Paper • 2503.09641 • Published • 41

- Exploring the Potential of Encoder-free Architectures in 3D LMMs
  Paper • 2502.09620 • Published • 26
- The Evolution of Multimodal Model Architectures
  Paper • 2405.17927 • Published • 1
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- Efficient Architectures for High Resolution Vision-Language Models
  Paper • 2501.02584 • Published

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 55
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 101

- PaliGemma: A versatile 3B VLM for transfer
  Paper • 2407.07726 • Published • 72
- Vision language models are blind
  Paper • 2407.06581 • Published • 84
- PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
  Paper • 2404.16994 • Published • 36
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 48

- The Evolution of Multimodal Model Architectures
  Paper • 2405.17927 • Published • 1
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- Efficient Architectures for High Resolution Vision-Language Models
  Paper • 2501.02584 • Published
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133

- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
  Paper • 2407.03320 • Published • 95
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133