ECAL RecHit producer Alpaka migration #46453

Open: wants to merge 4 commits into master

thomreis
Contributor

PR description:

Migration of the ECAL RecHit producer from CUDA to Alpaka, including the required portable data and conditions formats and an extension of the DQM module to compare RecHits produced on the CPU or GPU.
While it is largely a drop-in replacement for the existing CUDA RecHit producer, the migrated Alpaka version adds the RecHit time variable, which the CUDA version did not calculate. In addition, the Alpaka version supports Phase 2, where there will no longer be any inputs from the endcaps.

Compared with the legacy CPU producer, the Alpaka algorithm still lacks the recovery of dead channels and therefore cannot yet replace the legacy producer in production.

PR validation:

Two comparisons were run on 9k TTbar events: legacy CPU vs. CUDA (workflow 12834.513) and legacy CPU vs. Alpaka on GPU (workflow 13834.413). They show almost identical results between CUDA and Alpaka (except for the time variables, as mentioned above) and very good agreement with the legacy CPU version for both implementations.
In addition, the Alpaka module running on the CPU gives almost identical results to the same module running on an NVIDIA GPU.
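The DQM extension described above compares RecHits reconstructed on the CPU with those reconstructed on the GPU. As a rough illustration of what such a per-channel comparison involves, here is a minimal Python sketch. It assumes both collections have already been reduced to dictionaries keyed by a detector id; `RecHit`, `compare_rechits` and the default tolerances are hypothetical names and values, not part of the DQM module or of CMSSW.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class RecHit:
    """Hypothetical, simplified RecHit: only energy and time are kept."""
    energy: float  # GeV
    time: float    # ns


def compare_rechits(
    cpu: Mapping[int, RecHit],
    gpu: Mapping[int, RecHit],
    energy_rel_tol: float = 1e-3,  # assumed relative tolerance on energy
    time_abs_tol: float = 0.01,    # assumed absolute tolerance on time (ns)
) -> Dict[str, int]:
    """Match hits by detector id and count channels that disagree."""
    cpu_ids, gpu_ids = set(cpu), set(gpu)
    matched = cpu_ids & gpu_ids
    counts = {
        "matched": len(matched),
        "only_cpu": len(cpu_ids - gpu_ids),  # hits missing on the GPU side
        "only_gpu": len(gpu_ids - cpu_ids),  # hits missing on the CPU side
        "energy_mismatch": 0,
        "time_mismatch": 0,
    }
    for detid in matched:
        a, b = cpu[detid], gpu[detid]
        scale = max(abs(a.energy), abs(b.energy))
        if scale > 0 and abs(a.energy - b.energy) / scale > energy_rel_tol:
            counts["energy_mismatch"] += 1
        if abs(a.time - b.time) > time_abs_tol:
            counts["time_mismatch"] += 1
    return counts


cpu_hits = {1: RecHit(1.00, 0.1), 2: RecHit(2.00, 0.2), 3: RecHit(0.5, 0.0)}
gpu_hits = {1: RecHit(1.00, 0.1), 2: RecHit(2.01, 0.2), 4: RecHit(0.3, 0.0)}
print(compare_rechits(cpu_hits, gpu_hits))
# {'matched': 2, 'only_cpu': 1, 'only_gpu': 1, 'energy_mismatch': 1, 'time_mismatch': 0}
```

A relative tolerance on the energy keeps the check meaningful across the wide range of hit energies; the actual tolerance values would need tuning, and a real DQM comparison would more likely fill histograms of the differences than count mismatches.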

@cmsbuild
Contributor

cmsbuild commented Oct 19, 2024

cms-bot internal usage

@thomreis
Contributor Author

type ecal

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46453/42315

@thomreis
Contributor Author

enable gpu

@cmsbuild
Contributor

A new Pull Request was created by @thomreis for master.

It involves the following packages:

  • CondFormats/DataRecord (db, alca)
  • CondFormats/EcalObjects (db, alca)
  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • DQM/EcalMonitorTasks (dqm)
  • DataFormats/EcalRecHit (reconstruction)
  • EventFilter/EcalRawToDigi (reconstruction)
  • RecoLocalCalo/EcalRecProducers (reconstruction)

@AdrianoDee, @Moanwar, @antoniovagnerini, @antoniovilela, @atpathak, @cmsbuild, @consuegs, @davidlange6, @fabiocos, @francescobrivio, @jfernan2, @kskovpen, @mandrenguyen, @miquork, @nothingface0, @perrotta, @rappoccio, @rvenditti, @srimanob, @subirsarkar, @sunilUIET, @syuvivida, @tjavaid can you please review it and eventually sign? Thanks.
@JanChyczynski, @Martin-Grunewald, @PonIlya, @ReyerBand, @apsallid, @argiro, @fabiocos, @makortel, @missirol, @mmusich, @rchatter, @rovere, @rsreds, @seemasharmafnal, @slomeo, @thomreis, @tocheng, @wang0jin, @youyingli, @yuanchao this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@thomreis
Contributor Author

please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 132KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-891ac5/42309/summary.html
COMMIT: b693b23
CMSSW: CMSSW_14_2_X_2024-10-19-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46453/42309/install.sh to create a dev area with all the needed externals and cmssw changes.

@mmusich
Contributor

mmusich commented Oct 21, 2024

Hello, just out of curiosity: I imagine this development will eventually enter the HLT menu for 2025.
Is the plan to introduce a customization function in this PR to make use of the Alpaka ECAL RecHit producer in the HLT menu, or will ECAL just open a ticket with the ConfDB configuration for integration?

@thomreis
Contributor Author

Hi @mmusich, the portable RecHit producer in this PR does not have the full functionality of the CPU producer currently used in the HLT menu. Just like the CUDA version, it lacks the algorithms for recovery of dead channels. However, the current pp HLT menu does not seem to do any energy recovery either, so it may already be possible to use this in 2025. This needs to be checked in more detail, however.

We can add a customization function to this PR, but I think we should only activate it once it is confirmed that the portable producer gives the same results as the one currently used at the HLT. Of course, we could also do this in a separate PR if that is preferred.

@missirol
Contributor

assign heterogeneous ?

@mmusich
Contributor

mmusich commented Oct 22, 2024

assign heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@thomreis
Contributor Author

Looking again in more detail, the removal of dead channels is actually done in the current HLT menu, so the Alpaka module is not yet able to produce the same results as the legacy module, since that part has not been implemented. In this case, it does not seem to make sense to provide a customisation function in this PR.
