Eval bug: llama-bench seems to be broken #13169

Closed
electroglyph opened this issue Apr 29, 2025 · 4 comments · Fixed by #13183

Comments


electroglyph commented Apr 29, 2025

Name and Version

version: 5215 (5f5e39e)
built with MSVC 19.43.34808.0 for x64

I've tested CPU, Vulkan, and SYCL; llama-bench either crashes outright or outputs the following two lines and then exits:

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |

Operating systems

Windows

GGML backends

CPU

Hardware

Ryzen 7900X + Intel A770

Models

I tried several models that currently work with llama-server.

Problem description & steps to reproduce

.\llama-bench.exe -m model

First Bad Commit

No response

Relevant log output

PS C:\Users\ANON\repos\AI_Grotto\llama.cpp\Windows\CPU\AVX512> .\llama-bench.exe -m C:\LLM\google-gemma-3-12b-it-qat-q4_0-gguf-small\gemma-3-12b-it-q4_0_s.gguf
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
PS C:\Users\ANON\repos\AI_Grotto\llama.cpp\Windows\CPU\AVX512>
@gnusupport

llama-bench -m Qwen3-0.6B-Q8_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

@pt13762104

./bin/llama-bench -ot ".ffn_.*_exps.=CPU" -m ~/Qwen_Qwen3-30B-A3B-Q4_K_L.gguf -ngl 99                                                                                                           
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
| model                          |       size |     params | backend    | ngl | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------------- | --------------: | -------------------: |
[1]    28437 segmentation fault (core dumped)  ./bin/llama-bench -ot ".ffn_.*_exps.=CPU" -m ~/Qwen_Qwen3-30B-A3B-Q4_K_L.gguf

Alcpz (Collaborator) commented Apr 29, 2025

I've had problems as well since the merge of #13096. I found an inconsistency between the values and the fields that caused llama-bench to crash when printing the output of the tests. I've opened a PR; I'm not sure it's the correct fix, but I hope it at least helps bring visibility to the issue.
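
For illustration, here is a minimal, self-contained C++ sketch of that kind of failure mode; the column names and data are hypothetical, and this is not the actual llama-bench code. If the list of table fields and the list of per-test values fall out of sync, the header still prints, but the row loop reads past the end of the shorter vector, which matches the logs above where the header lines appear and the program then aborts.

```cpp
// Illustrative sketch only (hypothetical names, not the actual llama-bench
// code): the header is built from one list of fields while each row is
// built from a separate list of values. If a new column is added to one
// list but not the other, the header prints and the crash happens while
// printing the first row.
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Header fields, e.g. after a new "ot" column was introduced here...
    std::vector<std::string> fields = {
        "model", "size", "params", "backend", "ngl", "ot", "test", "t/s"
    };
    // ...but the per-test values were not updated to match (too few entries).
    std::vector<std::string> values = {
        "qwen3moe 30B.A3B Q4_K", "16.49 GiB", "30.53 B", "CUDA", "99", "pp512"
    };

    // The header prints fine, which is why the table header still appears
    // in the logs above before the crash.
    for (const auto & f : fields) printf("| %s ", f.c_str());
    printf("|\n");

    // Printing the row iterates over the *fields*: values.at(i) throws
    // std::out_of_range once i reaches values.size(); unchecked indexing
    // (values[i]) would be undefined behavior instead and can surface as a
    // segmentation fault or a std::bad_alloc like the crashes reported above.
    for (size_t i = 0; i < fields.size(); i++) {
        printf("| %s ", values.at(i).c_str());
    }
    printf("|\n");
    return 0;
}
```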

CISC linked a pull request Apr 29, 2025 that will close this issue
@pt13762104

It worked:

| model                          |       size |     params | backend    | ngl | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------------- | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  16.49 GiB |    30.53 B | CUDA       |  99 | .ffn_.*_exps.=CPU     |           pp512 |        107.80 ± 6.18 |
| qwen3moe 30B.A3B Q4_K - Medium |  16.49 GiB |    30.53 B | CUDA       |  99 | .ffn_.*_exps.=CPU     |           tg128 |         15.05 ± 0.41 |
