Commit e38db3a

ianayl and sarnex authored
[CI] Tune nightly benchmarking job for better reliability (#17122)
This PR tunes the nightly benchmarking job to produce more consistent results:

- Lowers the tolerance threshold for accepted benchmarking results from 50% to 8%
  - Nightly was flaking before even with a 50% tolerance threshold
- Raises the iterations from 100 to 5000
  - Using 10,000 iterations did not result in significantly more stable performance, although this may change as we obtain more data
  - However, the PVC benchmarking job in the overall nightly workflow now takes ~47 minutes, whereas before it took ~14 minutes
    - This should not have a major impact on execution time, considering the E2E tests take ~42 minutes: since both jobs run in parallel on different machines, the theoretical effect on the overall workflow should only be about 5 minutes, although this depends on whether machines can be scheduled in time.
- Changes the benchmarking workflows in sycl-nightly.yml to use the tuned PVC_PERF runner
  - Untuned machines are exhibiting large variations when running compute-benchmarks (20-25%, up to 50% in the worst case): these variations are unacceptable and make the results not particularly useful.
- Disables nightly benchmarking on gen12:
  - Gen12 machines are currently untuned. As with untuned PVC machines, their results are not accurate and not worth serious nightly benchmarking.
- Adds guards for benchmarking jobs to prevent benchmark runs in forks #14454 (comment)

---------

Co-authored-by: Nick Sarnie <[email protected]>
1 parent e0eda7e commit e38db3a
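
The "about 5 minutes" estimate in the commit message can be checked with a small sketch. It assumes, as the message does, that the PVC benchmarking job and the E2E tests start together on separate machines with no scheduling delay, so the wall time is that of the slower job. The helper name `workflow_wall_time` and the job labels are hypothetical; the durations are the approximate figures quoted in the message.

```python
def workflow_wall_time(parallel_job_minutes):
    """Wall time of jobs that run in parallel on separate machines.

    ASSUMPTION: all jobs start at the same time with no queueing delay,
    so the slowest job determines the total.
    """
    return max(parallel_job_minutes.values())


# Approximate durations (in minutes) quoted in the commit message.
before = workflow_wall_time({"pvc_benchmark": 14, "e2e_tests": 42})
after = workflow_wall_time({"pvc_benchmark": 47, "e2e_tests": 42})

print(f"before: {before} min, after: {after} min, delta: {after - before} min")
```

Under these assumptions the benchmarking job becomes the slowest parallel job, and the workflow grows by its margin over the E2E tests rather than by the full increase in benchmarking time.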

4 files changed: +32 -18 lines changed

Diff for: .github/workflows/sycl-linux-run-tests.yml

+4 -6
@@ -126,6 +126,7 @@ on:
       - '["cts-cpu"]'
       - '["Linux", "build"]'
       - '["cuda"]'
+      - '["PVC_PERF"]'
   image:
     type: choice
     options:
@@ -170,17 +171,14 @@ on:
       Extra options to be added to LIT_OPTS.
     default: ''
 
-  install_igc_driver:
+  reset_intel_gpu:
+    description: |
+      Reset Intel GPUs
     type: choice
     options:
       - false
       - true
 
-  install_dev_igc_driver:
-    type: choice
-    options:
-      - false
-      - true
   e2e_testing_mode:
     type: choice
     options:

Diff for: .github/workflows/sycl-nightly.yml

+2 -7
@@ -247,7 +247,7 @@ jobs:
       sycl_cts_artifact: sycl_cts_bin
 
   aggregate_benchmark_results:
-    if: always() && !cancelled()
+    if: github.repository == 'intel/llvm' && !cancelled()
     name: Aggregate benchmark results and produce historical averages
     uses: ./.github/workflows/sycl-benchmark-aggregate.yml
     secrets:
@@ -262,13 +262,8 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - name: Run compute-benchmarks on L0 Gen12
-            runner: '["Linux", "gen12"]'
-            image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
-            target_devices: level_zero:gpu
-            reset_intel_gpu: true
           - name: Run compute-benchmarks on L0 PVC
-            runner: '["Linux", "pvc"]'
+            runner: '["PVC_PERF"]'
             image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
             target_devices: level_zero:gpu
             reset_intel_gpu: true

Diff for: devops/actions/run-tests/benchmark/action.yml

+22 -1
@@ -46,6 +46,27 @@ runs:
           echo "# This workflow is not guaranteed to work with other backends."
           echo "#" ;;
       esac
+  - name: Compute CPU core range to run benchmarks on
+    shell: bash
+    run: |
+      # Taken from ur-benchmark-reusable.yml:
+
+      # Compute the core range for the first NUMA node; second node is used by
+      # UMF. Skip the first 4 cores as the kernel is likely to schedule more
+      # work on these.
+      CORES="$(lscpu | awk '
+        /NUMA node0 CPU|On-line CPU/ {line=$0}
+        END {
+          split(line, a, " ")
+          split(a[4], b, ",")
+          sub(/^0/, "4", b[1])
+          print b[1]
+        }')"
+      echo "CPU core range to use: $CORES"
+      echo "CORES=$CORES" >> $GITHUB_ENV
+
+      ZE_AFFINITY_MASK=0
+      echo "ZE_AFFINITY_MASK=$ZE_AFFINITY_MASK" >> $GITHUB_ENV
   - name: Run compute-benchmarks
     shell: bash
     run: |
@@ -69,7 +90,7 @@ runs:
       echo "-----"
       sycl-ls
       echo "-----"
-      ./devops/scripts/benchmarking/benchmark.sh -n '${{ runner.name }}' -s || exit 1
+      taskset -c "$CORES" ./devops/scripts/benchmarking/benchmark.sh -n '${{ runner.name }}' -s || exit 1
   - name: Push compute-benchmarks results
     if: always()
     shell: bash
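
To make the awk one-liner in the new "Compute CPU core range" step easier to follow, here is a rough Python rendering of the same logic. It is a sketch for illustration, not part of the commit: the function name `first_numa_core_range` and the sample `lscpu` text are hypothetical, and real `lscpu` output varies between machines.

```python
import re

# HYPOTHETICAL lscpu excerpt, used only to exercise the parsing logic.
SAMPLE_LSCPU = """\
CPU(s):                  224
On-line CPU(s) list:     0-223
NUMA node0 CPU(s):       0-55,112-167
NUMA node1 CPU(s):       56-111,168-223
"""


def first_numa_core_range(lscpu_output, first_core=4):
    """Mirror the awk script: pick the first core range of NUMA node 0.

    The last line mentioning "NUMA node0 CPU" or "On-line CPU" wins, so
    the on-line list is only a fallback when no NUMA line is printed.
    """
    line = ""
    for candidate in lscpu_output.splitlines():
        if "NUMA node0 CPU" in candidate or "On-line CPU" in candidate:
            line = candidate
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("no usable CPU list found in lscpu output")
    # The 4th whitespace-separated field holds the list, e.g. "0-55,112-167".
    first_range = fields[3].split(",")[0]
    # Replace a leading 0 so the range starts at core 4, as the awk sub() does.
    return re.sub(r"^0", str(first_core), first_range, count=1)


print(first_numa_core_range(SAMPLE_LSCPU))
```

The resulting string is what the workflow passes to `taskset -c`, which pins `benchmark.sh` to those cores.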

Diff for: devops/benchmarking/config.ini

+4 -4
@@ -10,9 +10,9 @@
 ; Compute-benchmark compile/run options
 [compute_bench]
 ; Value for -j during compilation of compute-benchmarks
-compile_jobs = 2
+compile_jobs = 40
 ; Number of iterations to run compute-benchmark tests
-iterations = 100
+iterations = 5000
 
 ; Options for benchmark result metrics (to record/compare against)
 [metrics]
@@ -23,15 +23,15 @@ recorded = Median,StdDev
 ; the historical average. Metrics not included here are not compared against
 ; when passing/failing benchmark results.
 ; Format: comma-separated list of <metric>:<deviation percentage in decimals>
-tolerances = Median:0.5
+tolerances = Median:0.08
 
 ; Options for computing historical averages
 [average]
 ; Number of days (from today) to look back for results when computing historical
 ; average
 cutoff_range = 7
 ; Minimum number of samples required to compute a historical average
-min_threshold = 3
+min_threshold = 10
 
 ; ONEAPI_DEVICE_SELECTOR linting/options
 [device_selector]