Refactor llm perf backend handling #258

Open · wants to merge 79 commits into main

Conversation

baptistecolle (Collaborator) commented Sep 10, 2024:

This PR refactors the llm-perf leaderboard logic to be more extensible and allows adding more hardware benchmarks without code duplication:

Main changes:

  1. Creation of a BenchmarkRunner base class, which new hardware can inherit from to reuse the global benchmark logic:
class NewHardwareBenchmarkRunner(BenchmarkRunner):

    def is_benchmark_supported(self, weights_config: str, attn_implementation: str) -> bool:
        # Check if a certain config is supported
        pass

    def get_benchmark_config(self, model: str, attn_implementation: str, weights_config: str) -> BenchmarkConfig:
        # Return an optimum-benchmark BenchmarkConfig to run
        pass

    def get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
        # Return the different weights configs for quantization
        pass

    def get_attention_configs(self) -> List[str]:
        return ["eager", "sdpa", "flash_attention_2"]
  2. Addition of a config file listing all the hardware and the different parameters. This removes the current hardcoding and CUDA-specific logic in the update_llm_perf_leaderboard.py file (a hypothetical consumption sketch follows the example below):
- machine: 1xA10
  hardware: cuda
  subsets:
    - unquantized
    - awq
    - bnb
    - gptq
  backends:
    - pytorch
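
For illustration only (not part of this PR's code), a minimal sketch of how such a config file could be consumed so that update_llm_perf_leaderboard.py needs no hardware-specific hardcoding; the file path llm_perf/hardware.yaml and the job-dict shape are assumptions:

# Hypothetical sketch: read the hardware config file and enumerate every
# (machine, subset, backend) combination instead of hardcoding CUDA-specific logic.
from typing import Any, Dict, List

import yaml


def load_hardware_configs(path: str = "llm_perf/hardware.yaml") -> List[Dict[str, Any]]:
    # The file path is an assumption for illustration.
    with open(path) as f:
        return yaml.safe_load(f)


def iter_benchmark_jobs(configs: List[Dict[str, Any]]):
    # Yield one job per (machine, hardware, subset, backend) combination.
    for entry in configs:
        for subset in entry["subsets"]:
            for backend in entry["backends"]:
                yield {
                    "machine": entry["machine"],
                    "hardware": entry["hardware"],
                    "subset": subset,
                    "backend": backend,
                }


if __name__ == "__main__":
    for job in iter_benchmark_jobs(load_hardware_configs()):
        print(job)  # e.g. {'machine': '1xA10', 'hardware': 'cuda', 'subset': 'awq', 'backend': 'pytorch'}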

baptistecolle marked this pull request as ready for review September 20, 2024 12:03
baptistecolle marked this pull request as draft September 23, 2024 06:51
baptistecolle marked this pull request as ready for review September 23, 2024 06:57
baptistecolle marked this pull request as draft September 23, 2024 07:01
baptistecolle marked this pull request as ready for review September 23, 2024 07:10
baptistecolle added the leaderboard ([CI] Requires and enables running all llm-perf leaderboard workflows) label Sep 23, 2024
baptistecolle (Collaborator, Author) commented:

I also added the new label system to the leaderboard; you can trigger it via the leaderboard label.

Comment on lines +29 to +32
if: ${{
    (github.event_name == 'push') ||
    (github.event_name == 'workflow_dispatch') ||
    contains( github.event.pull_request.labels.*.name, 'leaderboard')}}

Member:

You can probably add more specifications here to be able to run specific benchmarks, like cuda/cpu.
I didn't try it, but you might also be able to add conditions on matrix arguments, like || contains( github.event.pull_request.labels.*.name, matrix.subset)}}, to run specific subsets or specific machines.


class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", hardware="cuda")

Member:

I would prefer device="cuda", to not confuse terminologies; optimum-benchmark already has a terminology and it's better not to make it ambiguous.
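
For concreteness, the suggested rename would look like this (a sketch, assuming only the constructor keyword changes):

# Sketch of the suggested terminology: reuse optimum-benchmark's "device" keyword
# instead of "hardware" (assumes the BenchmarkRunner base class from this PR and
# that only the constructor argument name changes).
class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", device="cuda")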

from optimum_benchmark.logging_utils import setup_logging


class BenchmarkRunner(ABC):

Member:

Not a fan of using the keyword "runner" here; maybe an LLMPerfBenchmarkManager. No strong opinion on this, though.

Comment on lines 36 to 37
self.attention_configs = self._get_attention_configs()
self.weights_configs = self._get_weights_configs(self.subset)

IlyasMoutawwakil (Member) commented Sep 23, 2024:

This is very specific to the PyTorch backend (the only backend with attention-implementation control); I don't see it holding once more backends are introduced.
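
To illustrate the concern, a purely hypothetical sketch (names are illustrative, not from the PR) in which each backend defines its own benchmark variants instead of the base class assuming attention-implementation control:

# Hypothetical illustration of the concern: only the PyTorch runner knows about
# attention implementations; other backends would describe their variants in
# terms of their own optimizations.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BenchmarkRunner(ABC):
    @abstractmethod
    def get_benchmark_variants(self) -> List[Dict[str, Any]]:
        # Each backend decides what its benchmark variants look like.
        ...


class PyTorchBenchmarkRunner(BenchmarkRunner):
    def get_benchmark_variants(self) -> List[Dict[str, Any]]:
        # Only PyTorch exposes an attention-implementation knob.
        return [{"attn_implementation": attn} for attn in ("eager", "sdpa", "flash_attention_2")]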

Comment on lines 83 to 89
def run_benchmark(self, model: str, attn_implementation: str, weights_config: str):
    benchmark_name = f"{weights_config}-{attn_implementation}"
    subfolder = f"{benchmark_name}/{model.replace('/', '--')}"

    if not self.is_benchmark_supported(weights_config, attn_implementation):
        self.logger.info(f"Skipping benchmark {benchmark_name} with model {model} since it is not supported")
        return

Member:

This won't hold for other backends, as their configuration will be in terms of other optimizations.

IlyasMoutawwakil (Member) left a comment:

I like the hardware class, which registers info about the hardware configs we deal with.
The BenchmarkRunner class seems to me like it'll make things a bit convoluted, with less readability. The benefit of the one-file-script design is that it showcases how simple and linear it is to conduct benchmarks (like a model fine-tuning recipe), similar to transformers examples. One might argue that we could group the examples in a class called ExamplesRunner, but for a user who wants to reproduce results by reading and running a simple script, it complicates things.
But if this is truly what will make it simpler for you to add more benchmarks to the leaderboard, I have no issue with it.
@regisss

baptistecolle (Collaborator, Author) commented:

The main idea of abstracting some of the logic of the llm-perf leaderboard code is to prevent code duplication across the different benchmarking scripts. This should speed up development and also improve maintainability.

I agree it would be nice to run a simple script to get the results from the leaderboard. It is still possible to run the benchmarks after cloning the repo and doing python llm_perf/benchmarks_runners/update_llm_perf_cpu_pytorch.py. A later version of the llm-perf code could include a CLI like optimum-benchmark's for the best user experience. The code could include a list of .yaml examples for llm-perf to define a way to run the leaderboard tests on your own hardware, e.g. llm_perf --config-dir examples/ --config-name cuda_pytorch. Lastly, the current scripts are only used for internal purposes (to update the leaderboard), so I think making the current setup a bit more complex is fine.
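
For illustration, a purely hypothetical sketch of what such a CLI entry point could look like (nothing in this PR implements it; the argument names simply mirror the example invocation above):

# Purely hypothetical sketch of the future CLI idea mentioned above;
# nothing in this PR implements it.
import argparse
from pathlib import Path

import yaml


def main():
    parser = argparse.ArgumentParser(prog="llm_perf")
    parser.add_argument("--config-dir", type=Path, default=Path("examples"))
    parser.add_argument("--config-name", type=str, required=True)
    args = parser.parse_args()

    # Load e.g. examples/cuda_pytorch.yaml and hand it to the benchmark logic.
    config = yaml.safe_load((args.config_dir / f"{args.config_name}.yaml").read_text())
    print(f"Would run leaderboard benchmarks with config: {config}")


if __name__ == "__main__":
    main()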

IlyasMoutawwakil (Member) commented Sep 23, 2024:

Yeah, I see your point. It's just that I moved llm-perf into a folder in optimum-benchmark based on the idea that it's just a couple of scripts that don't require a lot of maintenance and work as an example of usage of optimum-benchmark. But if the plan is to make it a more complex project, then maybe it's time for it to have its own repo again, to ease its development for you. What do you think?
Old repo: https://github.com/IlyasMoutawwakil/llm-perf-backend/tree/00340abbdcd3ba96c07e2ac3c615f00cb6053c53

IlyasMoutawwakil (Member) commented Sep 23, 2024:

There's also the fact that its compute is scaling, and I think it'd be better if it were contained so that it doesn't slow down or throttle the development CI of optimum-benchmark.

baptistecolle (Collaborator, Author) commented Sep 23, 2024:

Good question...
From a repo perspective, if the llm-perf leaderboard becomes bigger and more complex, it adds a lot of unnecessary complexity inside optimum-benchmark. On the other hand, you can also view the leaderboard as an extension of optimum-benchmark (an optimum-benchmark-leaderboard of sorts), so having both projects in the same repo could make sense to keep them aligned.
What are the main reasons you stopped having a separate repo and merged it into this one? @IlyasMoutawwakil, as the main maintainer of optimum-benchmark, do you have a preference?

(I like working in this repo as I learn from your code and reviews quite a bit 😄 but I don't mind changing.)

Also @regisss do you have an opinion on the matter?

IlyasMoutawwakil (Member) commented:

> What are the main reasons you stopped having a separate repo and merged it into this one?

> I moved llm-perf into a folder in optimum-benchmark based on the idea that it's just a couple of scripts that don't require a lot of maintenance and work as an example of usage of optimum-benchmark

It was for the reason above, and for faster development, as optimum-benchmark was changing at a higher rate (before the PyPI release and transformers adoption).
I also wanted to have access to the runners in the optimum-benchmark repo, which were not accessible from my personal profile.

> You can also view the leaderboard as an extension of optimum-benchmark (an optimum-benchmark-leaderboard of sorts), so having both projects in the same repo could make sense to keep them aligned.

It's mostly using the CLI/API interface, which we're keeping consistent across releases, so I don't see any alignment issues.

> @IlyasMoutawwakil, as the main maintainer of optimum-benchmark, do you have a preference?

I think a separation of package and application makes more sense, like lighteval and llm-leaderboard-backend.
