Collections

Collections including paper arxiv:2404.06773

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9, 2024 • 30
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18
Quantized Visual Geometry Grounded Transformer

Paper • 2509.21302 • Published Sep 25 • 8
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation

Paper • 2509.24335 • Published Sep 29 • 8

Papers - Training - Detailed Appendices

Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

Papers - Training - Image - Causal Self Attention

Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

Papers - Image - Decoders

Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

Papers - University - Hong Kong University of Science and Te

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Paper • 2404.02731 • Published Apr 3, 2024 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 18
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4, 2024 • 10
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9, 2024 • 66
OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9, 2024 • 77
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10, 2024 • 19

Papers - Image - Training - AS2D RoPE and SwiGLU

Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

Papers - Image - Decoders - ViT

Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 43

Papers - Image - DeiT

Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT

Paper • 2401.03302 • Published Jan 6, 2024 • 1
MLP Can Be A Good Transformer Learner

Paper • 2404.05657 • Published Apr 8, 2024 • 1
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR

Paper • 2401.12513 • Published Jan 23, 2024 • 1
DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Paper • 2404.02900 • Published Apr 3, 2024 • 1

Papers - Shanghai AI Laboratory

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Paper • 2404.02101 • Published Apr 2, 2024 • 24
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18
Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published Apr 25, 2024 • 21
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published Jun 11, 2024 • 29
