-
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published • 30
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
Quantized Visual Geometry Grounded Transformer
Paper • 2509.21302 • Published • 8
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Paper • 2509.24335 • Published • 8
Collections
Collections including paper arxiv:2404.06773
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
-
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Paper • 2404.05961 • Published • 66
OmniFusion Technical Report
Paper • 2404.06212 • Published • 77
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
BRAVE: Broadening the visual encoding of vision-language models
Paper • 2404.07204 • Published • 19
-
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Paper • 2401.03302 • Published • 1
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Paper • 2401.12513 • Published • 1
DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Paper • 2404.02900 • Published • 1
-
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 24
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29