Commit 22cdab3

llama-bench : accept ranges for integer parameters (#13410)
1 parent a71a407 commit 22cdab3

2 files changed: +404 -345 lines changed

tools/llama-bench/README.md

Lines changed: 19 additions & 11 deletions
@@ -20,10 +20,20 @@ Performance testing tool for llama.cpp.
 ## Syntax
 
 ```
-usage: ./llama-bench [options]
+usage: llama-bench [options]
 
 options:
   -h, --help
+  --numa <distribute|isolate|numactl>        numa mode (default: disabled)
+  -r, --repetitions <n>                      number of times to repeat each test (default: 5)
+  --prio <0|1|2|3>                           process/thread priority (default: 0)
+  --delay <0...N> (seconds)                  delay between each test (default: 0)
+  -o, --output <csv|json|jsonl|md|sql>       output format printed to stdout (default: md)
+  -oe, --output-err <csv|json|jsonl|md|sql>  output format printed to stderr (default: none)
+  -v, --verbose                              verbose output
+  --progress                                 print test progress indicators
+
+test parameters:
   -m, --model <filename>                     (default: models/7B/ggml-model-q4_0.gguf)
   -p, --n-prompt <n>                         (default: 512)
   -n, --n-gen <n>                            (default: 128)
@@ -33,7 +43,7 @@ options:
   -ub, --ubatch-size <n>                     (default: 512)
   -ctk, --cache-type-k <t>                   (default: f16)
   -ctv, --cache-type-v <t>                   (default: f16)
-  -t, --threads <n>                          (default: 8)
+  -t, --threads <n>                          (default: 16)
   -C, --cpu-mask <hex,hex>                   (default: 0x0)
   --cpu-strict <0|1>                         (default: 0)
   --poll <0...100>                           (default: 50)
@@ -44,17 +54,15 @@ options:
   -nkvo, --no-kv-offload <0|1>               (default: 0)
   -fa, --flash-attn <0|1>                    (default: 0)
   -mmp, --mmap <0|1>                         (default: 1)
-  --numa <distribute|isolate|numactl>        (default: disabled)
   -embd, --embeddings <0|1>                  (default: 0)
   -ts, --tensor-split <ts0/ts1/..>           (default: 0)
-  -r, --repetitions <n>                      (default: 5)
-  --prio <0|1|2|3>                           (default: 0)
-  --delay <0...N> (seconds)                  (default: 0)
-  -o, --output <csv|json|jsonl|md|sql>       (default: md)
-  -oe, --output-err <csv|json|jsonl|md|sql>  (default: none)
-  -v, --verbose                              (default: 0)
-
-Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
+  -ot --override-tensors <tensor name pattern>=<buffer type>;...
+                                             (default: disabled)
+  -nopo, --no-op-offload <0|1>               (default: 0)
+
+Multiple values can be given for each parameter by separating them with ','
+or by specifying the parameter multiple times. Ranges can be given as
+'start-end' or 'start-end+step' or 'start-end*mult'.
 ```
 
 llama-bench can perform three types of tests:
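
Note on the range syntax added to the help text ('start-end', 'start-end+step', 'start-end*mult'): the Python sketch below shows one way such a specification could expand into a list of integers. It is an illustration only, not the parser from this commit. The name `expand_int_list` is hypothetical, and the sketch assumes that both end points are inclusive, that a bare 'start-end' steps by 1, and that a step or multiplier that does not advance the value is rejected.

```python
import re

# Hypothetical helper for illustration; not llama-bench's actual parser.
# Accepts "N", "start-end", "start-end+step" or "start-end*mult".
_RANGE = re.compile(r"(\d+)(?:-(\d+)(?:([+*])(\d+))?)?")


def expand_int_list(spec: str) -> list[int]:
    """Expand a comma-separated list of integers and ranges."""
    values: list[int] = []
    for item in spec.split(","):
        match = _RANGE.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"invalid value or range: {item!r}")
        first = int(match.group(1))
        # A single value is treated as a range of length one.
        last = int(match.group(2)) if match.group(2) else first
        op = match.group(3) or "+"  # assumed default: additive
        step = int(match.group(4)) if match.group(4) else 1  # assumed default: 1
        current = first
        while current <= last:  # assumed: the end point is inclusive
            values.append(current)
            following = current + step if op == "+" else current * step
            if following <= current:
                # e.g. "+0", "*1", or a multiplicative range starting at 0
                raise ValueError(f"range never advances: {item!r}")
            current = following
    return values
```

Under these assumptions, `expand_int_list("128-512+128")` yields 128, 256, 384, 512, and `expand_int_list("1-16*2")` yields 1, 2, 4, 8, 16. Ranges and plain values can be mixed in one comma-separated list, as the help text describes for multiple values.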
