Fix CI offload build failure and add checks upon CMake return failure #4839
Test this please
AFQMC shows an issue when using CUDA 12.1 (see https://cdash.qmcpack.org/CDash/testDetails.php?test=28616963&build=447804). Reverting to CUDA 11.2.
We should open issues to track the problems found here. For example, we could easily add a couple of differently labeled CUDA versions to the nightlies, label the CI builds, etc.
Thanks for spending all the time investigating this.
I am compiling with the following compilers and flags. Would it be a meaningful test if I ran it?
$ mpicc --version
gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0
correaa@cuk:~/qmcpack/build.cuda12.0$ mpicxx --version
g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0
correaa@cuk:~/qmcpack/build.cuda12.0$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
$ cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DMPIEXEC_EXECUTABLE=mpirun -DBUILD_AFQMC=ON -DENABLE_CUDA=ON -DQMC_GPU_ARCHS=sm_75 -DENABLE_OFFLOAD=ON -DQMC_COMPLEX=0 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS="-fno-lto" .. --fresh
I don't understand what your question is, or why you are asking it in this PR.
@prckent asked me to look into this PR. My interpretation is that it is a regression.
Still not understanding... What regression?
Do you intend to use GCC for offload?
Not sure what the problem is. I think it is about compiling with CUDA 12.x.
Also, for offload it seems I need GCC 12, right?
On Tue, Dec 5, 2023 at 6:55 PM Ye Luo wrote:
> Do you intend to use GCC for offload?
The request is to investigate and make the AFQMC tests work: https://cdash.qmcpack.org/CDash/testSummary.php?project=1&name=deterministic-unit_test_afqmc_numerics&date=2023-12-06. The AFQMC numerics check appears to be broken in some circumstances. It should work with any recent compiler and CUDA version; at worst, we should understand where the problems in the toolchain are. Real-space QMC now works with CUDA 12.3 thanks to bug fixes by NVIDIA. If this is a quirk of the nightly builds, we can tolerate that, but we do need to understand it.
I see:
Ye-EPYC-server  Clang17-Offload-Real-Mixed-Release  20231206-0300-Deterministic  Failed  2.70
To be systematic, where can I find the compilation options for "Clang17-Offload-Real-Mixed-Release", for example? (Searching the repository for this string doesn't give a result: https://github.com/search?q=repo%3AQMCPACK%2Fqmcpack%20%22Clang17-Offload-Real-Mixed-Release%22&type=code) We have seen compilation problems with CUDA 12.3 in other codes.
This is not obvious, but for anything reporting to CDash, you can look under the "nightly" test category and select the build name, or the configure and build entries in the same row, e.g. https://cdash.qmcpack.org/CDash/buildSummary.php?buildid=451448. This gives access to the CMake output.