is this W8A16 or W8A8?
#3
by ehartford - opened
W8A16 is compatible with Ampere via the Marlin kernel.
W8A8 is only compatible with Hopper.
Which is this?
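For anyone checking which path their GPU falls under, a minimal sketch using PyTorch's reported compute capability (Hopper is sm_90, Ampere is sm_80/sm_86; the thresholds follow the claim above):

```python
import torch

# Compute capability per the compatibility claim above:
#   (9, 0)+  Hopper -> native FP8 compute, W8A8
#   (8, x)   Ampere -> weight-only W8A16 via the Marlin kernel
major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (9, 0):
    print("Hopper or newer: W8A8 FP8 should run natively")
else:
    print("Pre-Hopper: expect the weight-only W8A16 path instead")
```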
The quantization scheme is compatible with finegrained_fp8 in Transformers.
You should be able to run it with W8A8.
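For reference, a minimal loading sketch with Transformers; the finegrained_fp8 quantization_config stored in the checkpoint should be applied automatically, and the model id here is a placeholder for this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id -- substitute this repo's actual model id.
model_id = "org/model-fp8"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The finegrained_fp8 quantization_config saved in the checkpoint is
# picked up automatically; no extra quantization arguments are needed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # dispatch across available GPUs
    torch_dtype="auto",    # keep the stored FP8 weights
)
```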
Ok. Thank you
Ampere (A100) can't run it, so I'll work on a W8A16 version.
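A possible starting point for that conversion, assuming a recent llm-compressor; this is a data-free round-to-nearest sketch (the more common recipe uses GPTQ with calibration data), and the model paths are placeholders:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# W8A16: INT8 weights, 16-bit activations -- servable on Ampere
# through vLLM's Marlin kernel.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head"],   # keep the output head in higher precision
)

oneshot(
    model="org/model",          # placeholder source checkpoint
    recipe=recipe,
    output_dir="model-W8A16",
)
```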
ehartford changed discussion status to closed
I can run FP8 models on 4x RTX 3090 cards:

```
uv run vllm serve Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 --reasoning-parser deepseek_r1 --tensor-parallel-size 4 --enable-expert-parallel --async-scheduling
uv run vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 --reasoning-parser deepseek_r1 --tensor-parallel-size 4 --enable-expert-parallel --async-scheduling
```
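This presumably works because vLLM falls back to its FP8 Marlin kernel on pre-Hopper GPUs: the FP8 weights are dequantized on the fly and activations stay in 16-bit, so the model effectively runs as W8A16 rather than native W8A8.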