[0.9] Forbid divergent execution of work-group barriers #564

Merged
merged 1 commit into from
Feb 11, 2025


Member

@vchuravy vchuravy commented Feb 11, 2025

This is #558 without the POCL backend, so hopefully it can land as a non-breaking change.


@vchuravy vchuravy changed the title Forbid divergent execution of work-group barriers [0.9] Forbid divergent execution of work-group barriers Feb 11, 2025
@vchuravy vchuravy marked this pull request as ready for review February 11, 2025 10:30
Contributor

github-actions bot commented Feb 11, 2025

Benchmark Results

| Benchmark | main | 993ef0b... | main/993ef0bf315b2a... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.737 ± 0.0067 μs | 0.733 ± 0.0077 μs | 1.01 |
| saxpy/default/Float16/1048576 | 0.177 ± 0.0083 ms | 0.173 ± 0.0063 ms | 1.02 |
| saxpy/default/Float16/16384 | 3.33 ± 0.025 μs | 3.32 ± 0.031 μs | 1 |
| saxpy/default/Float16/2048 | 0.916 ± 0.011 μs | 0.908 ± 0.012 μs | 1.01 |
| saxpy/default/Float16/256 | 0.597 ± 0.0055 μs | 0.588 ± 0.0072 μs | 1.01 |
| saxpy/default/Float16/262144 | 0.0439 ± 0.0004 ms | 0.0438 ± 0.0003 ms | 1 |
| saxpy/default/Float16/32768 | 6.01 ± 0.052 μs | 6.03 ± 0.095 μs | 0.997 |
| saxpy/default/Float16/4096 | 1.31 ± 0.023 μs | 1.3 ± 0.025 μs | 1.01 |
| saxpy/default/Float16/512 | 0.651 ± 0.0063 μs | 0.649 ± 0.0065 μs | 1 |
| saxpy/default/Float16/64 | 0.561 ± 0.0048 μs | 0.559 ± 0.0058 μs | 1 |
| saxpy/default/Float16/65536 | 11.6 ± 0.1 μs | 11.6 ± 0.15 μs | 1 |
| saxpy/default/Float32/1024 | 0.647 ± 0.011 μs | 0.634 ± 0.01 μs | 1.02 |
| saxpy/default/Float32/1048576 | 0.195 ± 0.035 ms | 0.227 ± 0.026 ms | 0.86 |
| saxpy/default/Float32/16384 | 2.76 ± 0.14 μs | 2.81 ± 0.6 μs | 0.981 |
| saxpy/default/Float32/2048 | 0.76 ± 0.065 μs | 0.756 ± 0.026 μs | 1.01 |
| saxpy/default/Float32/256 | 0.583 ± 0.0065 μs | 0.562 ± 0.017 μs | 1.04 |
| saxpy/default/Float32/262144 | 0.0559 ± 0.0042 ms | 0.0574 ± 0.0047 ms | 0.974 |
| saxpy/default/Float32/32768 | 5.28 ± 0.26 μs | 5.39 ± 1 μs | 0.98 |
| saxpy/default/Float32/4096 | 1.13 ± 0.093 μs | 1.11 ± 0.042 μs | 1.01 |
| saxpy/default/Float32/512 | 0.614 ± 0.0077 μs | 0.596 ± 0.0092 μs | 1.03 |
| saxpy/default/Float32/64 | 0.568 ± 0.0046 μs | 0.547 ± 0.0065 μs | 1.04 |
| saxpy/default/Float32/65536 | 12.5 ± 0.72 μs | 12.4 ± 0.96 μs | 1.01 |
| saxpy/default/Float64/1024 | 0.747 ± 0.04 μs | 0.743 ± 0.031 μs | 1.01 |
| saxpy/default/Float64/1048576 | 0.474 ± 0.052 ms | 0.493 ± 0.026 ms | 0.962 |
| saxpy/default/Float64/16384 | 5.26 ± 0.31 μs | 5.37 ± 0.6 μs | 0.98 |
| saxpy/default/Float64/2048 | 1.15 ± 0.097 μs | 1.12 ± 0.084 μs | 1.03 |
| saxpy/default/Float64/256 | 0.583 ± 0.0066 μs | 0.564 ± 0.011 μs | 1.03 |
| saxpy/default/Float64/262144 | 0.0931 ± 0.0098 ms | 0.115 ± 0.0079 ms | 0.812 |
| saxpy/default/Float64/32768 | 11.9 ± 0.91 μs | 12.8 ± 1.3 μs | 0.931 |
| saxpy/default/Float64/4096 | 1.69 ± 0.19 μs | 1.68 ± 0.13 μs | 1.01 |
| saxpy/default/Float64/512 | 0.634 ± 0.011 μs | 0.614 ± 0.013 μs | 1.03 |
| saxpy/default/Float64/64 | 0.559 ± 0.0059 μs | 0.546 ± 0.0096 μs | 1.02 |
| saxpy/default/Float64/65536 | 24 ± 2.1 μs | 28.7 ± 1.8 μs | 0.838 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.19 ± 0.025 μs | 2.17 ± 0.027 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.157 ± 0.0073 ms | 0.158 ± 0.0088 ms | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.43 ± 0.083 μs | 4.41 ± 0.07 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.36 ± 0.026 μs | 2.35 ± 0.031 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.85 ± 0.032 μs | 2.82 ± 0.033 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0418 ± 0.00094 ms | 0.0419 ± 0.00082 ms | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.85 ± 0.18 μs | 6.84 ± 0.17 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.68 ± 0.039 μs | 2.67 ± 0.039 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.29 ± 0.033 μs | 3.26 ± 0.035 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.54 ± 0.21 μs | 2.51 ± 0.21 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.5 ± 0.39 μs | 12.4 ± 0.25 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.23 ± 0.029 μs | 2.21 ± 0.036 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.239 ± 0.019 ms | 0.237 ± 0.02 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.34 ± 0.23 μs | 4.43 ± 0.32 μs | 0.978 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.37 ± 0.059 μs | 2.38 ± 0.062 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.69 ± 0.053 μs | 2.68 ± 0.053 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0605 ± 0.0035 ms | 0.0603 ± 0.0038 ms | 1 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.39 ± 0.38 μs | 7.49 ± 0.49 μs | 0.987 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.66 ± 0.079 μs | 2.65 ± 0.069 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.7 ± 0.032 μs | 2.69 ± 0.034 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.73 ± 5.1 μs | 2.71 ± 4.5 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.3 ± 0.98 μs | 15.4 ± 1.2 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.33 ± 0.059 μs | 2.32 ± 0.078 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.518 ± 0.029 ms | 0.504 ± 0.025 ms | 1.03 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.31 ± 0.39 μs | 7.31 ± 0.35 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.62 ± 0.09 μs | 2.6 ± 0.078 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.66 ± 0.058 μs | 2.65 ± 0.061 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.0985 ± 0.0094 ms | 0.0948 ± 0.0098 ms | 1.04 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.3 ± 1 μs | 14.6 ± 1.4 μs | 1.05 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.17 ± 0.19 μs | 3.16 ± 0.22 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.67 ± 0.065 μs | 2.66 ± 0.071 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.62 ± 0.065 μs | 2.6 ± 0.06 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 29.2 ± 4.8 μs | 26.7 ± 2.6 μs | 1.1 |
| time_to_load | 0.316 ± 0.0061 s | 0.315 ± 0.0059 s | 1 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy vchuravy merged commit 31d5b44 into main Feb 11, 2025
35 of 38 checks passed
@vchuravy vchuravy deleted the vc/barries_0.9 branch February 11, 2025 11:51