Build alpaka code in GPU-only mode [14.0.x] #9125

fwyzard
Contributor

@fwyzard fwyzard commented Apr 7, 2024

Build alpaka device code for the CUDA and ROCm back-ends in "GPU only" mode. In this mode, functions marked as ALPAKA_FN_ACC are marked as __device__ functions, and are compiled only for the corresponding GPU device back-ends.

Currently, functions marked as ALPAKA_FN_ACC are marked as __host__ __device__ functions, and may be compiled for both device and host back-ends. Compiling them for the host leads to linker errors in kernels that are compiled for the ROCm back-end and use device symbols such as threadIdx and blockIdx.
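
As a rough illustration of the two modes described above, the sketch below selects the function qualifiers with a preprocessor switch. The names SKETCH_GPU_ONLY_MODE, SKETCH_FN_ACC and wrap_index are hypothetical and chosen for this example only; this is not alpaka's actual definition of ALPAKA_FN_ACC. The only real macros used are __CUDACC__ and __HIPCC__, which nvcc and hipcc predefine. With a plain host compiler the qualifiers expand to nothing, so the snippet also builds without a GPU toolchain.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; NOT the alpaka macros.
// Defining SKETCH_GPU_ONLY_MODE plays the role of the "GPU only" build mode.
#if defined(__CUDACC__) || defined(__HIPCC__)
#  if defined(SKETCH_GPU_ONLY_MODE)
#    define SKETCH_FN_ACC __device__           // compiled for the device only
#  else
#    define SKETCH_FN_ACC __host__ __device__  // compiled for host and device
#  endif
#else
#  define SKETCH_FN_ACC                        // plain host compiler: no qualifiers
#endif

// Example function intended for device code: wraps an index into [0, n).
SKETCH_FN_ACC inline int wrap_index(int i, int n) {
    assert(n > 0);  // precondition: the range must not be empty
    return ((i % n) + n) % n;
}
```

In the device-only branch, calling wrap_index from host code is rejected at compile time and no host version of the function is generated, which is presumably what avoids the linker errors described above.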

No impact on the HLT performance, as expected.

CMSSW_14_0_4:

Running 4 times over all events with 8 jobs, each with 32 threads, 24 streams and 1 GPUs
   525.9 ±   0.6 ev/s (11800 events, 99.8% overlap)
   522.7 ±   0.5 ev/s (11800 events, 99.9% overlap)
   534.2 ±   0.8 ev/s (11800 events, 99.8% overlap)
   541.6 ±   0.6 ev/s (11800 events, 99.9% overlap)
 --------------------
   531.1 ±   8.5 ev/s

CMSSW_14_0_4 with cms-sw/cmssw#44650 and #9125:

Running 4 times over all events with 8 jobs, each with 32 threads, 24 streams and 1 GPUs
   536.3 ±   0.6 ev/s (11800 events, 99.8% overlap)
   531.2 ±   0.5 ev/s (11800 events, 99.7% overlap)
   558.7 ±   0.4 ev/s (11800 events, 99.5% overlap)
   520.9 ±   0.8 ev/s (11800 events, 99.8% overlap)
 --------------------
   536.8 ±  16.0 ev/s
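
The summary lines in both listings are consistent with the arithmetic mean and the sample standard deviation (n - 1 denominator) of the four runs. This is inferred from the numbers themselves, not from the benchmark tool's documentation, and the function and variable names below are made up for this check.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the samples.
inline double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / v.size();
}

// Sample standard deviation (n - 1 in the denominator).
inline double sample_stddev(const std::vector<double>& v) {
    assert(v.size() > 1);
    const double m = mean(v);
    double ss = 0.0;
    for (double x : v) ss += (x - m) * (x - m);
    return std::sqrt(ss / (v.size() - 1));
}

// Per-run throughput (ev/s) copied from the two listings above.
const std::vector<double> baseline{525.9, 522.7, 534.2, 541.6};
const std::vector<double> patched{536.3, 531.2, 558.7, 520.9};
```

Under this assumption, the means differ by about 5.7 ev/s, which is smaller than either reported spread (8.5 and 16.0 ev/s), in line with the "no impact" statement.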

Backport #9121 to CMSSW 14.0.x for data taking.

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2024

backport #9121

@cmsbuild
Contributor

cmsbuild commented Apr 7, 2024

A new Pull Request was created by @fwyzard for branch IB/CMSSW_14_0_X/master.

@aandvalenzuela, @smuzaffar, @cmsbuild, @iarspider can you please review it and eventually sign? Thanks.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Apr 7, 2024

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2024

enable gpu

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2024

please test with cms-sw/cmssw#44650

@cmsbuild
Contributor

cmsbuild commented Apr 7, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-70672a/38660/summary.html
COMMIT: efea939
CMSSW: CMSSW_14_0_X_2024-04-07-0000/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9125/38660/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

@smuzaffar
Contributor

+externals

@cms-sw/orp-l2 , feel free to include it for next 14.0.X IB/release

@cmsbuild
Contributor

cmsbuild commented Apr 8, 2024

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_14_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
Notice: This PR was tested with additional Pull Request(s); please also merge them if necessary: cms-sw/cmssw#44650

@fwyzard
Contributor Author

fwyzard commented Apr 8, 2024

@cms-sw/orp-l2 , feel free to include it for next 14.0.X IB/release

However, cms-sw/cmssw#44650 needs to be merged before (or at the same time as) this PR.

@antoniovilela

Will merge it with cms-sw/cmssw#44650, after passing an IB in master.

@antoniovilela

+1

@cmsbuild cmsbuild merged commit db6ad92 into cms-sw:IB/CMSSW_14_0_X/master Apr 9, 2024
13 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_14_0_X/master_alpaka_gpu_only_mode branch April 9, 2024 15:12