Allow opt-out of implicit bounds-checking #563

Merged
vchuravy merged 1 commit into main from 02-07-allow_opt-out_of_implicit_bounds-checking on Feb 13, 2025

Member

@vchuravy vchuravy commented Feb 7, 2025

KernelAbstractions currently creates kernels that look like:

if __validindex(ctx)
   # Body
end

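This is problematic because `@synchronize` requires convergence: every work-item in a workgroup must reach the barrier, but work-items that fail the `__validindex(ctx)` check skip the body, including any `@synchronize` inside it.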
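
To make the convergence problem concrete, here is a minimal sketch in plain Julia. It is a toy model, not KernelAbstractions code: `global_index`, `is_valid`, and `barrier_arrivals` are hypothetical helpers invented for illustration, and `is_valid` only stands in for the implicit `__validindex(ctx)` check. The sketch counts how many work-items in each workgroup would reach a barrier placed in the kernel body, with and without the guard wrapping the whole body.

```julia
# Toy model of one launch: `ndrange` valid indices split into workgroups of
# `groupsize` work-items. None of these names come from KernelAbstractions.

# Linear global index of a work-item, 1-based like Julia arrays.
global_index(group, lane, groupsize) = (group - 1) * groupsize + lane

# Stand-in (assumption) for the implicit `__validindex(ctx)` check.
is_valid(idx, ndrange) = idx <= ndrange

# Number of work-items per workgroup that reach a barrier placed in the body.
# With `guard_whole_body = true`, out-of-range work-items skip the body and
# therefore the barrier; with `false`, every work-item reaches it.
function barrier_arrivals(ndrange, groupsize; guard_whole_body)
    ngroups = cld(ndrange, groupsize)
    return map(1:ngroups) do group
        count(1:groupsize) do lane
            idx = global_index(group, lane, groupsize)
            !guard_whole_body || is_valid(idx, ndrange)
        end
    end
end

guarded = barrier_arrivals(10, 4; guard_whole_body = true)
unguarded = barrier_arrivals(10, 4; guard_whole_body = false)

println(guarded)    # arrivals per workgroup with the guard around the body
println(unguarded)  # arrivals per workgroup without it
```

With 10 indices and workgroups of 4, the last workgroup has only two valid work-items, so under the guarded form only part of that workgroup arrives at the barrier. Without the guard, all work-items arrive, and the bounds check would have to be applied by hand around the memory accesses instead.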

Member Author

vchuravy commented Feb 7, 2025

This stack of pull requests is managed by Graphite.

@vchuravy vchuravy marked this pull request as ready for review February 7, 2025 11:31

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 0% with 888 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (b435bb2) to head (4dd0acc).
Report is 2 commits behind head on vc/pocl.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pocl/nanoOpenCL.jl | 0.00% | 520 Missing ⚠️ |
| src/pocl/device/array.jl | 0.00% | 101 Missing ⚠️ |
| src/pocl/backend.jl | 0.00% | 93 Missing ⚠️ |
| src/pocl/compiler/execution.jl | 0.00% | 43 Missing ⚠️ |
| src/pocl/compiler/compilation.jl | 0.00% | 32 Missing ⚠️ |
| src/pocl/device/quirks.jl | 0.00% | 24 Missing ⚠️ |
| src/pocl/compiler/reflection.jl | 0.00% | 23 Missing ⚠️ |
| src/pocl/pocl.jl | 0.00% | 20 Missing ⚠️ |
| src/pocl/device/runtime.jl | 0.00% | 13 Missing ⚠️ |
| src/macros.jl | 0.00% | 12 Missing ⚠️ |

... and 2 more
Additional details and impacted files
@@           Coverage Diff            @@
##           vc/pocl    #563    +/-   ##
========================================
  Coverage     0.00%   0.00%            
========================================
  Files           12      21     +9     
  Lines          777    1513   +736     
========================================
- Misses         777    1513   +736     

Contributor

github-actions bot commented Feb 7, 2025

Benchmark Results

| | main | 0aa6a5b... | main/0aa6a5b19490b4... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.74 ± 0.0071 μs | 0.779 ± 0.0067 μs | 0.949 |
| saxpy/default/Float16/1048576 | 0.175 ± 0.0086 ms | 0.174 ± 0.0087 ms | 1 |
| saxpy/default/Float16/16384 | 3.34 ± 0.02 μs | 3.38 ± 0.031 μs | 0.99 |
| saxpy/default/Float16/2048 | 0.918 ± 0.012 μs | 0.961 ± 0.015 μs | 0.955 |
| saxpy/default/Float16/256 | 0.592 ± 0.0057 μs | 0.63 ± 0.0057 μs | 0.94 |
| saxpy/default/Float16/262144 | 0.0442 ± 0.00063 ms | 0.0442 ± 0.00074 ms | 1 |
| saxpy/default/Float16/32768 | 6.01 ± 0.042 μs | 6.06 ± 0.062 μs | 0.992 |
| saxpy/default/Float16/4096 | 1.32 ± 0.027 μs | 1.35 ± 0.027 μs | 0.977 |
| saxpy/default/Float16/512 | 0.653 ± 0.0068 μs | 0.694 ± 0.0055 μs | 0.94 |
| saxpy/default/Float16/64 | 0.562 ± 0.0053 μs | 0.601 ± 0.0052 μs | 0.935 |
| saxpy/default/Float16/65536 | 11.6 ± 0.13 μs | 11.8 ± 0.18 μs | 0.986 |
| saxpy/default/Float32/1024 | 0.658 ± 0.013 μs | 0.648 ± 0.01 μs | 1.01 |
| saxpy/default/Float32/1048576 | 0.235 ± 0.029 ms | 0.21 ± 0.034 ms | 1.12 |
| saxpy/default/Float32/16384 | 2.78 ± 0.15 μs | 2.78 ± 0.17 μs | 1 |
| saxpy/default/Float32/2048 | 0.756 ± 0.057 μs | 0.775 ± 0.038 μs | 0.976 |
| saxpy/default/Float32/256 | 0.593 ± 0.0086 μs | 0.58 ± 0.0056 μs | 1.02 |
| saxpy/default/Float32/262144 | 0.0569 ± 0.005 ms | 0.0452 ± 0.0042 ms | 1.26 |
| saxpy/default/Float32/32768 | 5.36 ± 0.35 μs | 5.31 ± 0.49 μs | 1.01 |
| saxpy/default/Float32/4096 | 1.13 ± 0.082 μs | 1.13 ± 0.079 μs | 0.996 |
| saxpy/default/Float32/512 | 0.625 ± 0.011 μs | 0.615 ± 0.0068 μs | 1.02 |
| saxpy/default/Float32/64 | 0.585 ± 0.0059 μs | 0.569 ± 0.006 μs | 1.03 |
| saxpy/default/Float32/65536 | 12.6 ± 1 μs | 12 ± 1.3 μs | 1.05 |
| saxpy/default/Float64/1024 | 0.76 ± 0.052 μs | 0.77 ± 0.059 μs | 0.987 |
| saxpy/default/Float64/1048576 | 0.513 ± 0.048 ms | 0.5 ± 0.047 ms | 1.03 |
| saxpy/default/Float64/16384 | 5.27 ± 0.33 μs | 5.36 ± 0.39 μs | 0.983 |
| saxpy/default/Float64/2048 | 1.16 ± 0.091 μs | 1.15 ± 0.097 μs | 1.01 |
| saxpy/default/Float64/256 | 0.588 ± 0.0082 μs | 0.585 ± 0.0056 μs | 1.01 |
| saxpy/default/Float64/262144 | 0.0919 ± 0.01 ms | 0.0962 ± 0.014 ms | 0.956 |
| saxpy/default/Float64/32768 | 12 ± 1.3 μs | 12.2 ± 1.2 μs | 0.983 |
| saxpy/default/Float64/4096 | 1.7 ± 0.14 μs | 1.7 ± 0.26 μs | 0.999 |
| saxpy/default/Float64/512 | 0.643 ± 0.014 μs | 0.64 ± 0.012 μs | 1 |
| saxpy/default/Float64/64 | 0.558 ± 0.011 μs | 0.562 ± 0.0054 μs | 0.993 |
| saxpy/default/Float64/65536 | 24.1 ± 2.8 μs | 24.3 ± 3.4 μs | 0.989 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.21 ± 0.03 μs | 2.2 ± 0.027 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.164 ± 0.01 ms | 0.161 ± 0.0095 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.47 ± 0.089 μs | 4.45 ± 0.13 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.39 ± 0.029 μs | 2.38 ± 0.028 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.8 ± 0.038 μs | 2.81 ± 0.033 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0432 ± 0.0025 ms | 0.0423 ± 0.0017 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.87 ± 0.21 μs | 6.86 ± 0.22 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.72 ± 0.039 μs | 2.69 ± 0.036 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.25 ± 0.041 μs | 3.26 ± 0.034 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.52 ± 0.25 μs | 2.51 ± 0.21 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.5 ± 0.35 μs | 12.5 ± 0.26 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.21 ± 0.034 μs | 2.22 ± 0.032 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.209 ± 0.027 ms | 0.203 ± 0.026 ms | 1.03 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.34 ± 0.2 μs | 4.37 ± 0.24 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.37 ± 0.068 μs | 2.37 ± 0.053 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.68 ± 0.041 μs | 2.68 ± 0.07 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0606 ± 0.0049 ms | 0.0484 ± 0.0043 ms | 1.25 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.52 ± 0.38 μs | 7.55 ± 0.46 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.67 ± 0.087 μs | 2.66 ± 0.07 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.7 ± 0.034 μs | 2.69 ± 0.036 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.71 ± 5.3 μs | 2.72 ± 5.7 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.6 ± 1.3 μs | 14.9 ± 1.6 μs | 1.05 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.31 ± 0.068 μs | 2.35 ± 0.057 μs | 0.982 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.536 ± 0.061 ms | 0.519 ± 0.047 ms | 1.03 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.2 ± 0.45 μs | 7.42 ± 0.4 μs | 0.97 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.59 ± 0.073 μs | 2.63 ± 0.066 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.64 ± 0.058 μs | 2.65 ± 0.056 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.113 ± 0.022 ms | 0.0994 ± 0.015 ms | 1.14 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.5 ± 1.3 μs | 15 ± 1.6 μs | 1.03 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.12 ± 0.13 μs | 3.18 ± 0.15 μs | 0.981 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.67 ± 0.077 μs | 2.67 ± 0.071 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.6 ± 0.069 μs | 2.62 ± 0.065 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.3 ± 2.3 μs | 26.7 ± 3.4 μs | 1.17 |
| time_to_load | 0.319 ± 0.0035 s | 0.317 ± 0.0019 s | 1.01 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from 48e3752 to e565304 Compare February 7, 2025 13:51
@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from e565304 to 4dd0acc Compare February 10, 2025 15:08
@vchuravy vchuravy force-pushed the vc/pocl branch 2 times, most recently from 777c099 to 3bb80ac Compare February 12, 2025 15:23
@vchuravy vchuravy closed this Feb 12, 2025
@vchuravy vchuravy deleted the 02-07-allow_opt-out_of_implicit_bounds-checking branch February 12, 2025 15:24
@vchuravy vchuravy restored the 02-07-allow_opt-out_of_implicit_bounds-checking branch February 13, 2025 07:27
@vchuravy vchuravy reopened this Feb 13, 2025
@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from 4dd0acc to 5e03ecf Compare February 13, 2025 08:07
@vchuravy vchuravy changed the base branch from vc/pocl to main February 13, 2025 08:07
Contributor

@github-actions github-actions bot left a comment

Some suggestions could not be made:

  • src/KernelAbstractions.jl
    • lines 97-98

@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from 5e03ecf to e0c44ee Compare February 13, 2025 10:05
Contributor

@github-actions github-actions bot left a comment

Some suggestions could not be made:

  • src/KernelAbstractions.jl
    • lines 97-98

@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from e0c44ee to a73ea1f Compare February 13, 2025 10:30
@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from a73ea1f to ed45e9b Compare February 13, 2025 16:02
@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from ed45e9b to 0aa6a5b Compare February 13, 2025 16:25
Member Author

vchuravy commented Feb 13, 2025

Merge activity

  • Feb 13, 12:53 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Feb 13, 12:53 PM EST: A user merged this pull request with Graphite.

@vchuravy vchuravy merged commit 14eef78 into main Feb 13, 2025
35 of 38 checks passed
@vchuravy vchuravy deleted the 02-07-allow_opt-out_of_implicit_bounds-checking branch February 13, 2025 17:53