ubergarm committed on
Commit 32dca8d · 1 Parent(s): 6893d74

add usage tips

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -255,11 +255,14 @@ custom=$(
  --threads 8 \
  --host 127.0.0.1 \
  --port 8080 \
- --no-mmap
-
- # for full offload onto GPU just add -ngl 99 and set threads to 1
+ --no-mmap \
+ --jinja
  ```
- If you have a properly fixed chat template, you can use it like this `--jinja --chat-template-file ./myFixedTemplate.jinja`.
+
+ Tips:
+ * for full offload onto GPU just add `-ngl 99` and use one thread with `--threads 1`
+ * to save space on kv-cache use `-ctk q8_0` which is all you need given this is MLA
+ * bring your own jinja chat template with `--jinja --chat-template-file ./myFixedTemplate.jinja`
 
  ## References
  * [ik_llama.cpp PR#995](https://github.com/ikawrakow/ik_llama.cpp/pull/995)