[tuner] Add unified benchmarking and compilation for models + dispatches (#704)

This PR adds `benchmark()` and `compile()` functions to the tuner that can be used for both model and dispatch tuning. The new functions will replace the split benchmark/compile_models and benchmark/compile_dispatches functions.

The new benchmarking and compilation functions now use the iree_runtime/iree_compiler python bindings which makes much of the code simpler. Particularly, benchmark results are now mostly parsed by the bindings, and the parse_*_benchmark_results functions are no longer needed.

The new compilation and benchmarking flows are described below.

### Compilation ###

1. Populate each CandidateTracker with the input and output filepaths. The input filepaths can be overridden by an optional function argument to the compile() function. This argument can be used for model tuning, passing the model filepath as the new input file.
2. For each candidate, strip the compilation info using iree-opt, and compile to a vmfb with the iree compiler python bindings. Set the candidate's TD spec file (generated during candidate generation), and add any additional iree-compile flags that came from the TuningClient. The extra flags are taken from a new abstract TuningClient function called get_iree_compile_flags.
3. For all successful compilations, save the vmfbs to the designated output path, and skip any failed compilation. For any failed compilation, a failure dump is saved instead of the vmfb.
4. Remove duplicate vmfbs, and return the ids of all unique candidates.

### Benchmarking ###

1. Create benchmark task structs for each candidate with its CandidateTracker and the TuningClient
2. Run the candidate benchmarks on the available devices. Each benchmark task will benchmark the vmfb from the CandidateTracker using the iree_runtime python bindings, and return a benchmark result containing the candidate_id, benchmark time, and device_id.
3. Then the same benchmarking is done on the untuned baseline configuration once for each available device.
4. The results from the candidate benchmarks are compared with the baseline benchmarks from the same device, and the fastest candidates are logged and returned. The number of candidates returned is determined by an optional argument to the benchmark function, and all candidates will be returned by default.

---------

Signed-off-by: Max Dawkins <[email protected]>
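
The last benchmarking step (comparing each candidate against the baseline from the same device and returning the fastest ones) can be sketched roughly as follows. This is an illustrative sketch only, not the tuner's actual code: the `BenchmarkResult` class and `select_best_candidates` function are hypothetical names, the fields mirror the ones listed in the commit message, and ranking by the ratio of candidate time to baseline time is an assumption about how "fastest" is measured.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    # Hypothetical container; field names follow the commit message.
    candidate_id: int
    time: float  # benchmark time (units are not specified in the source)
    device_id: str


def select_best_candidates(
    candidate_results: list[BenchmarkResult],
    baseline_results: list[BenchmarkResult],
    num_candidates: Optional[int] = None,
) -> list[int]:
    """Return candidate ids, fastest first, relative to the same-device baseline.

    With num_candidates=None, all candidates are returned (the default
    behavior described in the commit message).
    """
    # One baseline measurement per device.
    baseline_by_device = {r.device_id: r.time for r in baseline_results}

    scored: list[tuple[float, int]] = []
    for result in candidate_results:
        baseline_time = baseline_by_device.get(result.device_id)
        if baseline_time is None or baseline_time <= 0:
            # Assumption: candidates without a usable baseline are skipped.
            continue
        # Ratio below 1.0 means the candidate beat the baseline on that device.
        scored.append((result.time / baseline_time, result.candidate_id))

    scored.sort()
    ranked = [candidate_id for _, candidate_id in scored]
    return ranked if num_candidates is None else ranked[:num_candidates]
```

Normalizing by the per-device baseline keeps a candidate from looking fast merely because it ran on a faster device, which appears to be the motivation for benchmarking the baseline once per device.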
Showing 10 changed files with 646 additions and 28 deletions.
@@ -0,0 +1,39 @@
# Example Tuner Test

Example of tuning a dispatch and full model.

## Environments
Follow instructions in [`/tuner/README.md`](../README.md)

## Running the Tuner

### Choose a model to tune
This example uses the simple `double_mmt.mlir` file.

### Generate a benchmark file
Use the usual `iree-compile` command for your model, add
`--iree-hal-dump-executable-files-to=dump --iree-config-add-tuner-attributes`,
and get the dispatch benchmark that you want to tune. For example:
```shell
iree-compile double_mmt.mlir --iree-hal-target-backends=rocm \
    --iree-hip-target=gfx942 --iree-hal-dump-executable-files-to=dump \
    --iree-config-add-tuner-attributes -o /dev/null

cp dump/module_main_dispatch_0_rocm_hsaco_fb_benchmark.mlir mmt_benchmark.mlir
```
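
If you want to script this step, one possible approach is sketched below. It only assembles the same `iree-compile` command line shown above and lists the dumped benchmark files. The helper names `build_dump_command` and `find_dispatch_benchmarks` are hypothetical, and the `*_benchmark.mlir` glob pattern is an assumption inferred from the single file name in the example.

```python
from __future__ import annotations

from pathlib import Path


def build_dump_command(model: str, dump_dir: str, hip_target: str) -> list[str]:
    """Assemble the iree-compile invocation from the example as an argv list."""
    return [
        "iree-compile",
        model,
        "--iree-hal-target-backends=rocm",
        f"--iree-hip-target={hip_target}",
        f"--iree-hal-dump-executable-files-to={dump_dir}",
        "--iree-config-add-tuner-attributes",
        "-o",
        "/dev/null",
    ]


def find_dispatch_benchmarks(dump_dir: str) -> list[Path]:
    """List dumped dispatch benchmark files, sorted by name.

    Assumption: benchmark files end in `_benchmark.mlir`, as in the example.
    """
    return sorted(Path(dump_dir).glob("*_benchmark.mlir"))


# Possible usage (requires iree-compile on PATH):
#   import subprocess
#   subprocess.run(build_dump_command("double_mmt.mlir", "dump", "gfx942"), check=True)
#   print(find_dispatch_benchmarks("dump"))
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.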

### Recommended Trial Run
For an initial trial to test the tuning loop, use:
```shell
python -m examples.test double_mmt.mlir mmt_benchmark.mlir \
    --test_num_dispatch_candidates=5 --test_num_model_candidates=3 \
    --num-candidates=30
```

### Basic Usage
```shell
python -m examples.test <model_file_path> <benchmark_file_path> \
    --test_num_dispatch_candidates=<num_dispatch_candidates> \
    --test_num_model_candidates=<num_model_candidates> \
    --test_hip_target=<hip_target> \
    --num-candidates=<num_generated_candidates>
```
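
To make the shape of this command line concrete, here is a minimal `argparse` sketch that accepts the same positional arguments and flags. It is not the parser used by `examples.test`: the function name `build_parser`, the positional argument names, and the `None` defaults are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in Basic Usage."""
    parser = argparse.ArgumentParser(prog="examples.test")
    parser.add_argument("model_file", help="path to the model, e.g. double_mmt.mlir")
    parser.add_argument("benchmark_file", help="path to the dispatch benchmark file")
    # Defaults of None are placeholders; the real defaults are not given here.
    parser.add_argument("--test_num_dispatch_candidates", type=int, default=None)
    parser.add_argument("--test_num_model_candidates", type=int, default=None)
    parser.add_argument("--test_hip_target", type=str, default=None)
    # argparse exposes this flag as `args.num_candidates` (dash becomes underscore).
    parser.add_argument("--num-candidates", type=int, default=None)
    return parser


# Possible usage, parsing the trial-run arguments from the section above:
#   args = build_parser().parse_args([
#       "double_mmt.mlir", "mmt_benchmark.mlir",
#       "--test_num_dispatch_candidates=5",
#       "--test_num_model_candidates=3",
#       "--num-candidates=30",
#   ])
```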
@@ -0,0 +1,16 @@
!matA_0 = tensor<2048x2048xf16>
!matB_0 = tensor<2048x2048xf16>
!matC_0 = tensor<2048x2048xf32>

!matC_1 = tensor<2048x2048xf32>

func.func @main(%arg0: !matA_0, %arg1: !matB_0) -> !matC_1 {
  %cst = arith.constant 0.000000e+00 : f32
  %5 = tensor.empty() : !matC_0
  %6 = linalg.fill ins(%cst : f32) outs(%5 : !matC_0) -> !matC_0
  %7 = linalg.matmul_transpose_b ins(%arg0, %arg1 : !matA_0, !matB_0) outs(%6 : !matC_0) -> !matC_0
  %8 = tensor.empty() : !matC_1
  %9 = linalg.fill ins(%cst : f32) outs(%8 : !matC_1) -> !matC_1
  %10 = linalg.matmul_transpose_b ins(%7, %7 : !matC_0, !matC_0) outs(%9 : !matC_1) -> !matC_1
  return %10 : !matC_1
}
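
For readers less familiar with MLIR, the computation in this function can be mirrored in plain Python: the first `linalg.matmul_transpose_b` computes `A x B^T`, and the second multiplies that result by its own transpose. The sketch below is a small reference model only. The function names are hypothetical, it uses Python floats on tiny matrices, and it does not model the f16 inputs or f32 accumulation of the real kernel.

```python
from __future__ import annotations

Matrix = list[list[float]]


def matmul_transpose_b(a: Matrix, b: Matrix) -> Matrix:
    """Return a x b^T: entry (i, j) is the dot product of row i of a and row j of b."""
    return [
        [sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
        for row_a in a
    ]


def double_mmt(a: Matrix, b: Matrix) -> Matrix:
    """Mirror @main: c0 = a x b^T, then return c0 x c0^T."""
    c0 = matmul_transpose_b(a, b)
    return matmul_transpose_b(c0, c0)


# Possible usage with 2x2 inputs (the MLIR file uses 2048x2048):
#   double_mmt([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Because the second product multiplies a matrix by its own transpose, the result is always symmetric, which gives a quick sanity check on any output.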