[backport] Improve handling of TF CUDA tests for 14_0_X #44375
A new Pull Request was created by @valsdav for CMSSW_14_0_X. It involves the following packages:
@valsdav, @wpmccormack, @cmsbuild can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
cms-bot internal usage |
hold
|
Pull request has been put on hold by @antoniovilela |
Force-pushed from a8063f6 to fcc3e4f |
Pull request #44375 was updated. @valsdav, @cmsbuild, @wpmccormack can you please check and sign again. |
enable gpu |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b349d4/38238/summary.html
Comparison Summary:
GPU Comparison Summary:
|
+ml technical. Avoid running cuda tests if tensorflow is not compiled for Cuda. |
@antoniovilela I don't remember any more why this was put on hold. Do you think we can proceed with it ? |
REMINDER @antoniovilela, @rappoccio, @sextonkennedy: This PR was tested with #45143, please check if they should be merged together |
enable gpu |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b349d4/40076/summary.html
Comparison Summary:
GPU Comparison Summary:
|
unhold |
This pull request is fully signed and it will be integrated in one of the next CMSSW_14_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_14_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
This PR improves the handling of CUDA unit tests for the TensorFlow package by using a new tf_cuda_support tool from scram, which checks whether GPU support was enabled when TensorFlow was compiled. The PR also makes the TF CUDA tests stricter by checking explicitly that a CUDA device is visible to TF and not only to cmssw.
The test testTFVisibleDevicesCUDA is in fact run by the framework because a CUDA device is registered, but TF then does not recognize the device and the test fails. The other testTF*CUDA tests were passing silently, using the CPU to run the test. After this PR, all TF sessions that use tf::backend::cuda but do not find a GPU will fail explicitly.
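The fail-explicitly behaviour described above can be illustrated with a small standalone sketch. This is not the code from the PR: the Backend enum, hasGpuDevice, and requireBackend are hypothetical names, the device list is passed in by the caller instead of being queried from TensorFlow, and the "GPU" substring match simply assumes TensorFlow-style device names such as "/device:GPU:0".

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the backend selector; the PR text only names
// tf::backend::cuda, so this enum is an assumption made for illustration.
enum class Backend { cpu, cuda };

// Returns true if any of the given device names looks like a GPU device.
// Assumes TensorFlow-style names such as "/device:GPU:0".
bool hasGpuDevice(const std::vector<std::string>& deviceNames) {
  for (const auto& name : deviceNames) {
    if (name.find("GPU") != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Throws when the CUDA backend is requested but no GPU is in the device list,
// instead of silently continuing on the CPU.
void requireBackend(Backend requested, const std::vector<std::string>& deviceNames) {
  if (requested == Backend::cuda && !hasGpuDevice(deviceNames)) {
    throw std::runtime_error("CUDA backend requested but no GPU device is visible to TensorFlow");
  }
}
```

In a test, calling requireBackend(Backend::cuda, devices) before creating the session would turn a silent CPU fallback into a visible failure, which is the stricter behaviour the description asks for.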