add usage tips
README.md
CHANGED
@@ -255,11 +255,14 @@ custom=$(
 --threads 8 \
 --host 127.0.0.1 \
 --port 8080 \
---no-mmap
-
-# for full offload onto GPU just add -ngl 99 and set threads to 1
+--no-mmap \
+--jinja
 ```
-
+
+Tips:
+* for full offload onto GPU just add `-ngl 99` and use one thread with `--threads 1`
+* to save space on kv-cache use `-ctk q8_0` which is all you need given this is MLA
+* bring your own jinja chat template with `--jinja --chat-template-file ./myFixedTemplate.jinja`

 ## References
 * [ik_llama.cpp PR#995](https://github.com/ikawrakow/ik_llama.cpp/pull/995)
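As a rough illustration of how the three tips added in this change might be combined, the sketch below assembles one command line and prints it for review rather than running it. The binary name `llama-server`, its path, and the model path are assumptions (the hunk does not show them); only the flags themselves come from the diff.

```shell
# Hypothetical placeholders: neither path appears in the diff; adjust to your setup.
SERVER_BIN="${SERVER_BIN:-./build/bin/llama-server}"
MODEL_PATH="${MODEL_PATH:-/path/to/model.gguf}"

# Collect the flags as positional parameters (POSIX sh, no arrays needed).
# -ngl 99 / --threads 1 : full GPU offload with a single CPU thread (first tip)
# -ctk q8_0             : q8_0 K-cache to save kv-cache space (second tip)
# --jinja + template    : bring-your-own chat template (third tip)
set -- \
  --model "$MODEL_PATH" \
  -ngl 99 \
  --threads 1 \
  -ctk q8_0 \
  --jinja \
  --chat-template-file ./myFixedTemplate.jinja \
  --host 127.0.0.1 \
  --port 8080 \
  --no-mmap

# Print the assembled command instead of executing it.
cmd="$SERVER_BIN $*"
echo "$cmd"
```

Once the printed line looks right, running it directly (or replacing `echo "$cmd"` with `"$SERVER_BIN" "$@"`) would start the server with those settings, assuming the binary and model paths are valid.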