Add llama.cpp backend #231
Conversation
added llama.cpp backend
Thanks a lot! Very awesome work!
Only missing test configs and GitHub workflows 🤗
Hopefully I'll have fixed the process launcher with MPS by the time the MPS workflows start running.
Very clean PR!
I left a couple of comments.
Additionally, since benchmarking for llama.cpp is limited to a batch size of 1, it may be good to add a comment right above the batch size field in the three example configs, unless an error or warning is already raised somewhere in the code and I missed it.
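For example, something along these lines could fail fast (just a hypothetical sketch, not code from this PR; the function name and the `input_shapes` dict are made up):

```python
# Hypothetical sketch, not the PR's actual code: fail fast on unsupported batch sizes
# before a llama.cpp benchmark run is launched.

def validate_llama_cpp_input_shapes(input_shapes: dict) -> None:
    """Raise if the requested batch size cannot be benchmarked with llama.cpp."""
    batch_size = input_shapes.get("batch_size", 1)
    if batch_size != 1:
        raise ValueError(
            f"The llama.cpp backend only supports a batch size of 1, got {batch_size}."
        )


# This passes silently; a batch_size of, say, 4 would raise before any model is loaded.
validate_llama_cpp_input_shapes({"batch_size": 1, "sequence_length": 256})
```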
It is done here.
Thanks for the review. I implemented the necessary changes.
(Also, two of the runners are currently offline, so I am unable to run the CI on them.)
Let me know if you have more remarks.
One or two examples are enough; there's a lot of repetition there.
Indeed, I created multiple configs during development and forgot to remove them from the PR. I fixed it now.
Great PR @baptistecolle 🤗
Add llama.cpp as Backend for Optimum Benchmark
Overview
This PR introduces `llama.cpp` as a backend for the Optimum benchmark (see issue #117).

Changes

- Added example configs in the `examples` folder demonstrating how to run the `llama.cpp` backend (a rough sketch of the equivalent Python usage is included below).

Current limitations:

- Benchmarking with the `llama.cpp` backend is limited to a batch size of 1, due to the `llama-cpp-python` binding.
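As a rough illustration only (not copied from this PR), this is how the new backend might be invoked from optimum-benchmark's Python API. It assumes the backend config is exported as `LlamaCppConfig` and accepts a GGUF repo id plus `filename`; the model repo, file name, and field names below are illustrative and may differ from the merged code.

```python
# Minimal sketch, assuming the backend config is exposed as `LlamaCppConfig`;
# the GGUF repo id, filename, and field names are illustrative assumptions.
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    LlamaCppConfig,
    ProcessConfig,
)
from optimum_benchmark.logging_utils import setup_logging

if __name__ == "__main__":
    setup_logging(level="INFO")

    benchmark_config = BenchmarkConfig(
        name="llama_cpp_text_generation",
        launcher=ProcessConfig(),  # run the benchmark in an isolated process
        scenario=InferenceConfig(latency=True, memory=True),
        backend=LlamaCppConfig(
            device="cpu",
            task="text-generation",
            model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",  # illustrative GGUF repo
            filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # illustrative quantized file
        ),
    )

    # Launch the benchmark and log the collected latency/memory metrics.
    benchmark_report = Benchmark.launch(benchmark_config)
    benchmark_report.log()
```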
Performance
The metrics were validated by comparing the benchmark results of the PyTorch backend with those of the llama.cpp backend. The results are close; I can provide the full .json files if needed, but they are not included here directly because they are hard to read.
CLI output (tested on an M3 Pro CPU):
Performance with the llama.cpp backend
Performance with the PyTorch backend
The performance difference might be due to the significant amount of copying between devices with PyTorch.
Furthermore, llama.cpp is optimized for Mac, which could explain its higher performance. Let me know if you want me to investigate the performance difference further.