Spaces: Running on Zero
William Mattingly committed
Commit e2c034d · 1 Parent(s): 1cb798b

trying to fix flash attention

Files changed:
- app.py (+1, -0)
- requirements.txt (+2, -1)
app.py CHANGED

@@ -17,6 +17,7 @@ processor = AutoProcessor.from_pretrained(
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
     model_id,
     torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
     device_map="auto",
     trust_remote_code=True,
 )
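The commit passes attn_implementation="flash_attention_2" to from_pretrained so the model runs on FlashAttention 2 kernels instead of the default attention backend. Below is a minimal sketch of the resulting loading pattern; the fallback to PyTorch SDPA when flash_attn is not importable is an illustrative addition, not part of the Space's app.py, and model_id is a placeholder for whatever checkpoint the Space actually loads.

    # Sketch only: fallback logic and model_id are assumptions, not taken from app.py.
    import importlib.util

    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # placeholder checkpoint

    # Use FlashAttention 2 only if the flash_attn package is importable;
    # otherwise fall back to PyTorch's built-in SDPA attention.
    attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn_impl,
        device_map="auto",
        trust_remote_code=True,
    )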
requirements.txt CHANGED

@@ -6,4 +6,5 @@ accelerate
 pillow
 safetensors
 huggingface-hub
-pydantic==2.10.6
+pydantic==2.10.6
+https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.7-cp310-cp310-linux_x86_64.whl
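Pinning a prebuilt wheel avoids compiling flash-attn from source during the Space build, but the wheel filename encodes the exact build it targets: CUDA 12.6, torch 2.7, CPython 3.10, linux x86_64. pip only validates the cp310/linux_x86_64 tags; the cu126/torch2.7 part lives in the local version string, so a mismatch there installs cleanly and then fails at import time. A quick environment check (illustrative sketch, not part of this commit):

    # Print the runtime facts the pinned wheel depends on.
    import platform
    import sys

    import torch

    print("python  :", f"cp{sys.version_info.major}{sys.version_info.minor}")  # wheel expects cp310
    print("torch   :", torch.__version__)                                      # wheel expects 2.7.x
    print("cuda    :", torch.version.cuda)                                      # wheel expects 12.6
    print("platform:", platform.system(), platform.machine())                   # wheel expects Linux x86_64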