Refactor llm perf backend handling #258

Open · wants to merge 79 commits into main

Conversation

baptistecolle (Collaborator) commented Sep 10, 2024:

This PR refactors the llm-perf leaderboard logic to be more extensible and allows adding more hardware benchmarks without code duplication:

Main changes:

  1. Creation of a BenchmarkRunner base class, which new hardware can inherit from to reuse the global benchmark logic:
class NewHardwareBenchmarkRunner(BenchmarkRunner):

    def is_benchmark_supported(self, weights_config: str, attn_implementation: str) -> bool:
        # Check if a certain config is supported
        pass

    def get_benchmark_config(self, model: str, attn_implementation: str, weights_config: str) -> BenchmarkConfig:
        # Return an optimum-benchmark BenchmarkConfig to run
        pass

    def get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
        # Return the different weights configs for quantization
        pass

    def get_attention_configs(self) -> List[str]:
        return ["eager", "sdpa", "flash_attention_2"]
  2. Addition of a config file listing all the hardware and the different parameters. This removes the current hardcoding and CUDA-specific logic in the update_llm_perf_leaderboard.py file (a hypothetical consumption sketch follows the example below):
- machine: 1xA10
  hardware: cuda
  subsets:
    - unquantized
    - awq
    - bnb
    - gptq
  backends:
    - pytorch
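
For illustration only (not part of this PR's code), a minimal sketch of how such a config file could be consumed so that update_llm_perf_leaderboard.py needs no hardware-specific hardcoding; the file path llm_perf/hardware.yaml and the job-dict shape are assumptions:

# Hypothetical sketch: read the hardware config file and enumerate every
# (machine, subset, backend) combination instead of hardcoding CUDA-specific logic.
from typing import Any, Dict, List

import yaml


def load_hardware_configs(path: str = "llm_perf/hardware.yaml") -> List[Dict[str, Any]]:
    # The file path is an assumption for illustration.
    with open(path) as f:
        return yaml.safe_load(f)


def iter_benchmark_jobs(configs: List[Dict[str, Any]]):
    # Yield one job per (machine, hardware, subset, backend) combination.
    for entry in configs:
        for subset in entry["subsets"]:
            for backend in entry["backends"]:
                yield {
                    "machine": entry["machine"],
                    "hardware": entry["hardware"],
                    "subset": subset,
                    "backend": backend,
                }


if __name__ == "__main__":
    for job in iter_benchmark_jobs(load_hardware_configs()):
        print(job)  # e.g. {'machine': '1xA10', 'hardware': 'cuda', 'subset': 'awq', 'backend': 'pytorch'}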

baptistecolle marked this pull request as ready for review September 20, 2024 12:03
baptistecolle marked this pull request as draft September 23, 2024 06:51
baptistecolle marked this pull request as ready for review September 23, 2024 06:57
baptistecolle marked this pull request as draft September 23, 2024 07:01
baptistecolle marked this pull request as ready for review September 23, 2024 07:10
baptistecolle added the leaderboard ([CI] Requires and enables running all llm-perf leaderboard workflows) label Sep 23, 2024
baptistecolle (Collaborator, Author) commented:

I also added the new label system to the leaderboard; you can trigger it via the leaderboard label.

Comment on lines +29 to +32
if: ${{
    (github.event_name == 'push') ||
    (github.event_name == 'workflow_dispatch') ||
    contains( github.event.pull_request.labels.*.name, 'leaderboard')}}

Member:

You can probably add more specifications here to be able to run specific benchmarks, like cuda/cpu.
I didn't try it, but you might also be able to add conditions on matrix arguments, like || contains( github.event.pull_request.labels.*.name, matrix.subset)}}, to run specific subsets or specific machines.


class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", hardware="cuda")

Member:

I would prefer device="cuda", to not confuse terminologies; optimum-benchmark already has a terminology and it's better not to make it ambiguous.
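
For concreteness, the suggested rename would look like this (a sketch, assuming only the constructor keyword changes):

# Sketch of the suggested terminology: reuse optimum-benchmark's "device" keyword
# instead of "hardware" (assumes the BenchmarkRunner base class from this PR and
# that only the constructor argument name changes).
class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", device="cuda")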

from optimum_benchmark.logging_utils import setup_logging


class BenchmarkRunner(ABC):

Member:

Not a fan of using the keyword "runner" here; maybe an LLMPerfBenchmarkManager. No strong opinion on this, though.

Comment on lines 36 to 37
self.attention_configs = self._get_attention_configs()
self.weights_configs = self._get_weights_configs(self.subset)

IlyasMoutawwakil (Member) commented Sep 23, 2024:

This is very specific to the PyTorch backend (the only backend with attention-implementation control); I don't see it holding once more backends are introduced.
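
To illustrate the concern, a purely hypothetical sketch (names are illustrative, not from the PR) in which each backend defines its own benchmark variants instead of the base class assuming attention-implementation control:

# Hypothetical illustration of the concern: only the PyTorch runner knows about
# attention implementations; other backends would describe their variants in
# terms of their own optimizations.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BenchmarkRunner(ABC):
    @abstractmethod
    def get_benchmark_variants(self) -> List[Dict[str, Any]]:
        # Each backend decides what its benchmark variants look like.
        ...


class PyTorchBenchmarkRunner(BenchmarkRunner):
    def get_benchmark_variants(self) -> List[Dict[str, Any]]:
        # Only PyTorch exposes an attention-implementation knob.
        return [{"attn_implementation": attn} for attn in ("eager", "sdpa", "flash_attention_2")]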

Comment on lines 83 to 89
def run_benchmark(self, model: str, attn_implementation: str, weights_config: str):
    benchmark_name = f"{weights_config}-{attn_implementation}"
    subfolder = f"{benchmark_name}/{model.replace('/', '--')}"

    if not self.is_benchmark_supported(weights_config, attn_implementation):
        self.logger.info(f"Skipping benchmark {benchmark_name} with model {model} since it is not supported")
        return

Member:

This won't hold for other backends, as their configuration will be in terms of other optimizations.

IlyasMoutawwakil (Member) left a comment:

I like the hardware class, which registers info about the hardware configs we deal with.
The BenchmarkRunner class seems to me like it'll make things a bit convoluted, with less readability. The benefit of the one-file-script design is that it showcases how simple and linear it is to conduct benchmarks (like a model fine-tuning recipe), similar to transformers examples. One might argue that we could group the examples in a class called ExamplesRunner, but for a user who wants to reproduce results by reading and running a simple script, it complicates things.
But if this is truly what will make it simpler for you to add more benchmarks to the leaderboard, I have no issue with it.
@regisss

baptistecolle (Collaborator, Author) commented:

The main idea of abstracting some of the logic of the llm-perf leaderboard code is to prevent code duplication across the different benchmarking scripts. This should speed up development and also improve maintainability.

I agree it would be nice to run a simple script to get the results from the leaderboard. It is still possible to run the benchmarks after cloning the repo and doing python llm_perf/benchmarks_runners/update_llm_perf_cpu_pytorch.py. A later version of the llm-perf code could include a CLI like optimum-benchmark's for the best user experience. The code could include a list of .yaml examples for llm-perf to define a way to run the leaderboard tests on your own hardware, e.g. llm_perf --config-dir examples/ --config-name cuda_pytorch. Lastly, the current scripts are only used for internal purposes (to update the leaderboard), so I think making the current setup a bit more complex is fine.
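
For illustration, a purely hypothetical sketch of what such a CLI entry point could look like (nothing in this PR implements it; the argument names simply mirror the example invocation above):

# Purely hypothetical sketch of the future CLI idea mentioned above;
# nothing in this PR implements it.
import argparse
from pathlib import Path

import yaml


def main():
    parser = argparse.ArgumentParser(prog="llm_perf")
    parser.add_argument("--config-dir", type=Path, default=Path("examples"))
    parser.add_argument("--config-name", type=str, required=True)
    args = parser.parse_args()

    # Load e.g. examples/cuda_pytorch.yaml and hand it to the benchmark logic.
    config = yaml.safe_load((args.config_dir / f"{args.config_name}.yaml").read_text())
    print(f"Would run leaderboard benchmarks with config: {config}")


if __name__ == "__main__":
    main()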

IlyasMoutawwakil (Member) commented Sep 23, 2024:

Yeah, I see your point. It's just that I moved llm-perf into a folder in optimum-benchmark based on the idea that it's just a couple of scripts that don't require a lot of maintenance and work as an example of usage of optimum-benchmark. But if the plan is to make it a more complex project, then maybe it's time for it to have its own repo again, to ease its development for you. What do you think?
Old repo: https://github.com/IlyasMoutawwakil/llm-perf-backend/tree/00340abbdcd3ba96c07e2ac3c615f00cb6053c53

IlyasMoutawwakil (Member) commented Sep 23, 2024:

There's also the fact that its compute is scaling, and I think it'd be better if it were contained so that it doesn't slow down or throttle the development CI of optimum-benchmark.

baptistecolle (Collaborator, Author) commented Sep 23, 2024:

Good question...
From a repo perspective, if the llm-perf leaderboard becomes bigger and more complex, it adds a lot of unnecessary complexity inside optimum-benchmark. On the other hand, you can also view the leaderboard as an extension of optimum-benchmark (an optimum-benchmark-leaderboard of sorts), so having both projects in the same repo could make sense to keep them aligned.
What are the main reasons you stopped having a separate repo and merged it into this one? @IlyasMoutawwakil, as the main maintainer of optimum-benchmark, do you have a preference?

(I like working in this repo as I learn from your code and reviews quite a bit 😄 but I don't mind changing.)

Also @regisss do you have an opinion on the matter?

IlyasMoutawwakil (Member) commented:

> What are the main reasons you stopped having a separate repo and merged it into this one?

> I moved llm-perf into a folder in optimum-benchmark based on the idea that it's just a couple of scripts that don't require a lot of maintenance and work as an example of usage of optimum-benchmark

It was for the reason above, and for faster development, as optimum-benchmark was changing at a higher rate (before the PyPI release and transformers adoption).
I also wanted to have access to the runners in the optimum-benchmark repo, which were not accessible from my personal profile.

> You can also view the leaderboard as an extension of optimum-benchmark (an optimum-benchmark-leaderboard of sorts), so having both projects in the same repo could make sense to keep them aligned.

It's mostly using the CLI/API interface, which we're keeping consistent across releases, so I don't see any alignment issues.

> @IlyasMoutawwakil, as the main maintainer of optimum-benchmark, do you have a preference?

I think a separation of package and application makes more sense, like lighteval and llm-leaderboard-backend.
