Remove CUDA 11.0/11.1 support and upgrade to CCCL 2.2.0+ #1155
base: master
Using CCCL with CUDA 11.0 is DOA.
To be honest, I don't strictly mind dropping 11.0 support (or 11.1, so we can just rely on 11.2+'s stability promises). I could make 11.0 work by using old CUB/Thrust from their respective locations, but eventually that will break. Will discuss this before I do anything else.
This seems like the best plan, assuming you're not aware of any HPC systems with only super old CUDA available.
Bessemer's central install is 11.0 or 11.1 iirc, but the driver is 12.2/12.3 compatible, so we can always grab the toolkit from conda (assuming CMake agrees...) or open a ticket. Google Colab is 11.2+ as well (the reason we were producing 11.0 wheels previously). 11.2 was released December 2020, 11.1 September 2020, and 11.0 June 2020. Turns out 11.0 compiles fine on Linux too; it's just Windows where it doesn't, so even if we don't "support" it, it currently works. So it could be "11.2+ is supported; 11.0 & 11.1 may work under Linux, but are not supported".
Your choice tbh, you manage it. 11.2+ only feels simpler.
During the meeting we concluded that dropping CUDA 11.0 support in favour of future CCCL support is worthwhile, and that CUDA 11.2 is old enough to be a minimum. I'll adjust CI to reflect this and test Windows on 11.1+ at some point. We don't need to drop 11.1, but it's probably simpler to just say 11.2+, as then our Python wheels will be consistent with what we support.
Branch updated from df8f583 to e947497.
Looks like the previous Windows CUDA 11.0 CI errors were actually a GitHub Actions windows-2019 vs windows-2022 difference, i.e. a Visual Studio 2019 issue?
I have that on my home desktop, so I can try fighting it at some point. Or we just sack off VS2019, given it doesn't build the current jitify2-preprocess branch either.
It's still supported by CUDA, and libcu++/CCCL claim to support the same platforms as CUDA, so AFAIK it should work / be supported. I think I've got VS2019 installed too, I just need to spend some time in Windows at some point. Looking at the full draft-release CI log, there is a windows-2019 CUDA 11.8 job which did pass, so it's an older CUDA + VS2019 problem when using libcu++'s type_traits (included via CUB, which should currently be our only inclusion of libcu++ headers). It might be worth trying CCCL on its own to see if we can reproduce it on VS2019 as well when testing locally (if the main issue can be reproduced locally; otherwise it'll be CI fun).
I do have Visual Studio 2019 installed, but only CUDA 11.7 and 12.0 on Windows currently. The CUDA toolkit can be selected via:
$ cmake .. -A x64 -G "Visual Studio 16 2019" -T cuda=11.7 -DCMAKE_CUDA_ARCHITECTURES=86 -DFLAMEGPU_BUILD_TESTS=ON
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.22621.
...
-- The CXX compiler identification is MSVC 19.29.30139.0
...
-- Looking for a CUDA compiler - C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/bin/nvcc.exe
...
And building with:
$ cmake --build . --target flamegpu tests -j 8 --config Release
This correctly configured and built the current state of the CCCL branch (though an environment variable may be needed for RTC if the CUDA on the PATH is older than the one selected). Installing 11.2 to try and reproduce the issue, but I'm unlikely to actually dig into any failures this evening if it does reproduce the error.
CUDA 11.2 + Visual Studio 2019 has reproduced the error locally. A quick attempt to configure CCCL standalone failed, as I'm missing a test suite dependency.
Reproduced the error locally using the CCCL example with Visual Studio 2019.
I'll set up a quick reproducer CI repo to pin down the affected CUDA versions and report it upstream.
CUDA 12.3 + VS2022 tests pass. A CI sweep is pinning down which VS2019 + CUDA 11.x versions exhibit the libcu++ compilation error; I will report upstream tomorrow once the versions are known. So far, 11.7 works and 11.2 doesn't. CCCL/libcu++ includes some MSVC conditions around using the offending symbol for older VS2019 version(s) and other combinations, so the more recent VS2019 sub-update might be relevant (locally I have 1929; 1924 was the previous version with a workaround). https://github.com/ptheywood/cccl-is-constant-evaluated-mwe/actions/runs/7105395494/job/19342497821
CUDA 11.3+ is fine with Visual Studio 2019, so it's just 11.2 (and 11.1) which breaks for us. This would prevent us from producing 11.2 wheels. We can't drop 11.2 yet, as it's the version installed on Google Colab iirc (and I'd rather not either). I've reported this upstream: NVIDIA/cccl#1179
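Until a fixed CCCL release is available, a configure-time check could turn the obscure libcu++ compile error into an early, readable message. The sketch below is hypothetical (not part of this PR) and assumes the CUDA language has already been enabled, so that `CMAKE_CUDA_COMPILER_VERSION` is defined; VS2019 corresponds to `MSVC_VERSION` 1920 to 1929.

```cmake
# Hypothetical guard, not part of this PR: flag the known-bad combination of
# Visual Studio 2019 (MSVC_VERSION 1920-1929) with CUDA older than 11.3, which
# fails to compile libcu++ headers pulled in via CUB (NVIDIA/cccl#1179).
# Assumes enable_language(CUDA) / project(... LANGUAGES CUDA) has already run.
if(MSVC
   AND MSVC_VERSION GREATER_EQUAL 1920
   AND MSVC_VERSION LESS 1930
   AND CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11.3)
  # Only warn, as a CCCL release containing the upstream fix would build fine.
  message(WARNING
    "Visual Studio 2019 with CUDA ${CMAKE_CUDA_COMPILER_VERSION} is known to "
    "fail when compiling libcu++ headers (NVIDIA/cccl#1179). "
    "Consider using CUDA 11.3 or newer with Visual Studio 2019.")
endif()
```

A warning rather than `FATAL_ERROR` keeps the door open for a patched CCCL, which is the route this thread eventually takes.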
Upstream has a PR in to fix this. The simplest way to incorporate this into the CMake logic would be to make our minimum the next release after it is merged, but I'm not sure when that would be. Additionally, if we depend on something newer than 2.2.0, CCCL treats itself as system headers even when not using -isystem, which will be good given our relatively strict warning levels.
CCCL 2.3.0 has been released on GitHub: https://github.com/NVIDIA/cccl/releases/tag/v2.3.0 This should include the fixes we require, so making our minimum CCCL 2.3.0 and fetching a newer version if not found should be OK, but it is worth checking that both required fixes made it into this release.
This is to support using newer CCCL (CUDA 11.0 is not supported) and to simplify the pyflamegpu distribution matrix (11.1). 11.0 currently builds and passes tests on Linux, but does not build on Windows. 11.1 currently builds and passes tests on both. Workarounds and warnings specific to these versions are not being removed just in case, and CMake will only warn, not error, if these versions are used (as they currently work, and 11.2+ may not be available somewhere). Also fixes some typos as and when encountered.
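The "warn but don't error" policy for CUDA 11.0/11.1 could look roughly like the sketch below. It is an illustration only: `FLAMEGPU_MIN_SUPPORTED_CUDA_VERSION` is a hypothetical variable name, and the check assumes the CUDA language has already been enabled so `CMAKE_CUDA_COMPILER_VERSION` is set.

```cmake
# Hypothetical variable name; 11.2 is the minimum agreed in this PR.
set(FLAMEGPU_MIN_SUPPORTED_CUDA_VERSION 11.2)

# Assumes the CUDA language is already enabled, so the compiler version is known.
if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "${FLAMEGPU_MIN_SUPPORTED_CUDA_VERSION}")
  if(WIN32 AND CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11.1)
    # CUDA 11.0 is known not to build on Windows, but this still only warns.
    message(WARNING
      "CUDA ${CMAKE_CUDA_COMPILER_VERSION} is not supported and is known not to build on Windows.")
  else()
    # 11.0 on Linux and 11.1 currently work, but are outside the supported range.
    message(WARNING
      "CUDA ${CMAKE_CUDA_COMPILER_VERSION} is not supported "
      "(${FLAMEGPU_MIN_SUPPORTED_CUDA_VERSION}+ is required). It may work, but is untested.")
  endif()
endif()
```

Keeping this as `WARNING` rather than `FATAL_ERROR` matches the intent above: existing 11.0/11.1 workarounds stay in place, so builds that currently succeed are not broken outright.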
… we need. Waiting for 2.3.2 or 2.4.0
The v2.3.0 tagged commit does not include the CMake fix or the MSVC fixes, although they were backported to the branch/2.3.x branch. There's a v2.3.1 tagged commit which also does not include these fixes, so presumably we need to wait for 2.3.2 or 2.4.0. We'll probably just need to keep the first
CUDA 12.4 has been released, which includes CCCL 2.3.1 according to the release notes.
CCCL v2.3.2 has just been tagged / released, and it does include the 2 fixes we need, so it should now be possible to switch to it; this shouldn't be blocked any more.
CCCL 2.5.0 has been released on GitHub: mostly fixes, but also some potentially interesting additions (which are not yet safe to use). There shouldn't be a need to bump our minimum/fetched version to this in this PR though; 2.3.2 should still be fine (unless I've missed something). This PR is more or less good to go, I just want to re-run Windows testing with it requiring 2.3.2 just in case (though I believe it would be fine). A rebase would probably be worthwhile too. Merging this prior to a non-pre-release would be best, due to dropping CUDA 11.0/11.1 support.
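Requiring 2.3.2 while fetching it when no suitable install is found could be sketched as below. This is an assumption-laden illustration rather than the PR's actual CMake: `MIN_CCCL_VERSION` is a hypothetical variable, it assumes CCCL's CMake package config accepts newer installed versions for a `2.3.2` request, and it assumes that adding the fetched source as a sub-project exposes the same `CCCL::CCCL` target (as CCCL's README describes for FetchContent/CPM users). It also assumes the `flamegpu` library target has already been created.

```cmake
# Hypothetical variable: first tagged CCCL release with both required fixes.
set(MIN_CCCL_VERSION 2.3.2)

# Prefer an installed CCCL (e.g. from a CUDA toolkit or a system install).
find_package(CCCL ${MIN_CCCL_VERSION} CONFIG QUIET)

if(NOT CCCL_FOUND)
  # Fall back to fetching the tagged release from GitHub at configure time.
  include(FetchContent)
  FetchContent_Declare(
    cccl
    GIT_REPOSITORY https://github.com/NVIDIA/cccl.git
    GIT_TAG        v${MIN_CCCL_VERSION}
    GIT_SHALLOW    TRUE
  )
  # Assumed to define the CCCL:: targets when CCCL is added as a sub-project.
  FetchContent_MakeAvailable(cccl)
endif()

# CCCL::CCCL brings in CUB, Thrust and libcudacxx together.
target_link_libraries(flamegpu PUBLIC CCCL::CCCL)
```

Linking a single `CCCL::CCCL` target, instead of separate CUB and Thrust targets, is what lets the older CUB/Thrust CMake workarounds be dropped.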
Drops CUDA 11.0 and 11.1 support, and replaces CUB/Thrust with CCCL for future support.
nvidia/CCCL is the new combined home of CUB, Thrust and libcudacxx as of CUDA 12.2.
This switches to CCCL for better support moving forwards and for the CUB/Thrust 2.x API, and gets rid of a number of CUB/Thrust CMake workarounds. It also implicitly adds libcudacxx as a dependency via CUB.
CCCL does not support 11.0, hence the need for removal.
CUDA 11.1 support is being removed to simplify the build matrix for Python, due to 11.2+ ABI stability.
Closes #1021
Todo
- 2.3.2/2.4.0 (was v2.3.0): require 2.3.2/2.4.0, test CUDA 11.0/11.1/11.2 again.
- find_package issue that has also been fixed in branch/2.3.x but not in the 2.3.0/2.3.1 tagged commits.