Collection of Mixed Precision LLaMA and Qwen Models
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-out_proj-all (5B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-qkv_proj-all (5B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-down_proj-all (6B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-gate_up_proj-all (7B)