@@ -20,10 +20,20 @@ Performance testing tool for llama.cpp.
 ## Syntax

 ```
-usage: ./llama-bench [options]
+usage: llama-bench [options]

 options:
   -h, --help
+  --numa <distribute|isolate|numactl>       numa mode (default: disabled)
+  -r, --repetitions <n>                     number of times to repeat each test (default: 5)
+  --prio <0|1|2|3>                          process/thread priority (default: 0)
+  --delay <0...N> (seconds)                 delay between each test (default: 0)
+  -o, --output <csv|json|jsonl|md|sql>      output format printed to stdout (default: md)
+  -oe, --output-err <csv|json|jsonl|md|sql> output format printed to stderr (default: none)
+  -v, --verbose                             verbose output
+  --progress                                print test progress indicators
+
+test parameters:
   -m, --model <filename>                    (default: models/7B/ggml-model-q4_0.gguf)
   -p, --n-prompt <n>                        (default: 512)
   -n, --n-gen <n>                           (default: 128)
@@ -33,7 +43,7 @@ options:
   -ub, --ubatch-size <n>                    (default: 512)
   -ctk, --cache-type-k <t>                  (default: f16)
   -ctv, --cache-type-v <t>                  (default: f16)
-  -t, --threads <n>                         (default: 8)
+  -t, --threads <n>                         (default: 16)
   -C, --cpu-mask <hex,hex>                  (default: 0x0)
   --cpu-strict <0|1>                        (default: 0)
   --poll <0...100>                          (default: 50)
@@ -44,17 +54,15 @@ options:
   -nkvo, --no-kv-offload <0|1>              (default: 0)
   -fa, --flash-attn <0|1>                   (default: 0)
   -mmp, --mmap <0|1>                        (default: 1)
-  --numa <distribute|isolate|numactl>       (default: disabled)
   -embd, --embeddings <0|1>                 (default: 0)
   -ts, --tensor-split <ts0/ts1/..>          (default: 0)
-  -r, --repetitions <n>                     (default: 5)
-  --prio <0|1|2|3>                          (default: 0)
-  --delay <0...N> (seconds)                 (default: 0)
-  -o, --output <csv|json|jsonl|md|sql>      (default: md)
-  -oe, --output-err <csv|json|jsonl|md|sql> (default: none)
-  -v, --verbose                             (default: 0)
-
-Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
+  -ot --override-tensors <tensor name pattern>=<buffer type>;...
+                                            (default: disabled)
+  -nopo, --no-op-offload <0|1>              (default: 0)
+
+Multiple values can be given for each parameter by separating them with ','
+or by specifying the parameter multiple times. Ranges can be given as
+'start-end' or 'start-end+step' or 'start-end*mult'.
 ```

 llama-bench can perform three types of tests:
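
The help text in this diff introduces range values ('start-end', 'start-end+step', 'start-end*mult') alongside comma-separated lists. The sketch below shows one plausible way such a value could expand into individual test values. It is not taken from the llama-bench source: the function names `expand_range` and `expand_values` are hypothetical, and the inclusive end bound and the default step of 1 are assumptions made for illustration.

```python
def expand_range(spec: str) -> list[int]:
    """Expand 'start-end', 'start-end+step' or 'start-end*mult' into integers.

    Assumptions (not confirmed by the help text): the end bound is inclusive,
    the default is an additive step of 1, and values are non-negative.
    """
    op, amount = "+", 1
    for candidate in ("+", "*"):
        if candidate in spec:
            # Split off the trailing '+step' or '*mult' suffix.
            spec, raw_amount = spec.split(candidate, 1)
            op, amount = candidate, int(raw_amount)
            break

    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(spec)  # a plain single value

    # Reject steps that would never reach the end bound.
    if amount <= 0 or (op == "*" and (amount == 1 or start <= 0)):
        raise ValueError(f"range would not terminate: {spec!r}")

    values = []
    current = start
    while current <= end:
        values.append(current)
        current = current + amount if op == "+" else current * amount
    return values


def expand_values(arg: str) -> list[int]:
    """Expand a comma-separated argument whose items may be ranges."""
    values = []
    for item in arg.split(","):
        values.extend(expand_range(item.strip()))
    return values
```

Under these assumptions, `expand_values("128-512*2")` returns `[128, 256, 512]` and `expand_values("16,32-64+16")` returns `[16, 32, 48, 64]`.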