ubergarm/GigaChat3-10B-A1.8B-GGUF


ubergarm committed 20 days ago

Commit 32dca8d · 1 parent: 6893d74

add usage tips

Files changed (1): README.md (+7 -4)

@@ -255,11 +255,14 @@ custom=$(
     --threads 8 \
     --host 127.0.0.1 \
     --port 8080 \
-    --no-mmap
-# for full offload onto GPU just add -ngl 99 and set threads to 1
+    --no-mmap \
+    --jinja
 ```
-If you have a properly fixed chat template, you can use it like this `--jinja --chat-template-file ./myFixedTemplate.jinja`.
+Tips:
+* for full offload onto GPU just add `-ngl 99` and use one thread with `--threads 1`
+* to save space on kv-cache use `-ctk q8_0` which is all you need given this is MLA
+* bring your own jinja chat template with `--jinja --chat-template-file ./myFixedTemplate.jinja`
 ## References
 * [ik_llama.cpp PR#995](https://github.com/ikawrakow/ik_llama.cpp/pull/995)
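The tips above can be combined into one launch command. What follows is a minimal sketch, not the README's own script. It assumes `llama-server` was built to `./build/bin/llama-server` and that `MODEL` points at one of the repo's GGUF files. The default path below is a hypothetical placeholder, and the function name `build_server_args` and the array `SERVER_ARGS` are also illustrative. The sketch takes the flags from the diff's context lines plus the three tips and prints the resulting command so it can be checked before running.

```bash
#!/usr/bin/env bash
# Sketch: assemble llama-server arguments following the README usage tips.
# MODEL default and the ./build/bin/llama-server path are hypothetical placeholders.
set -euo pipefail

# Fills the global array SERVER_ARGS.
#   $1: "gpu" for full GPU offload, anything else for the CPU example
#   $2: optional path to a fixed jinja chat template
build_server_args() {
  local mode="$1"
  local template="${2:-}"
  local model="${MODEL:-/path/to/GigaChat3-10B-A1.8B.gguf}"  # hypothetical path

  # Flags taken from the README example shown in the diff context
  SERVER_ARGS=(--model "$model" --host 127.0.0.1 --port 8080 --no-mmap)

  if [[ "$mode" == "gpu" ]]; then
    # Tip 1: offload all layers to the GPU and use a single CPU thread
    SERVER_ARGS+=(-ngl 99 --threads 1)
  else
    SERVER_ARGS+=(--threads 8)
  fi

  # Tip 2: q8_0 K cache; per the README, -ctk alone is enough because the model uses MLA
  SERVER_ARGS+=(-ctk q8_0)

  # Tip 3: enable jinja templating, optionally with your own template file
  SERVER_ARGS+=(--jinja)
  if [[ -n "$template" ]]; then
    SERVER_ARGS+=(--chat-template-file "$template")
  fi
}

build_server_args gpu ./myFixedTemplate.jinja
# Print the shell-quoted command instead of launching the server.
printf '%q ' ./build/bin/llama-server "${SERVER_ARGS[@]}"
echo
```

To actually start the server, replace the `printf` and `echo` lines with `./build/bin/llama-server "${SERVER_ARGS[@]}"`. A bash array keeps paths that contain spaces intact as single arguments.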