From e23d0c7eaaf7b9858ed58d25dbcdf6ed41ef0b29 Mon Sep 17 00:00:00 2001
From: Michael Schellenberger Costa
Date: Thu, 30 Jan 2025 09:46:22 +0100
Subject: [PATCH] Backport PRs #3201, #3523, #3547, #3580 to the 2.8.x branch. (#3536)

* [FEA]: Introduce Python module with CCCL headers (#3201)
* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative
* Run `copy_cccl_headers_to_aude_include()` before `setup()`
* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.
* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel
* Bug fix: cuda/_include only exists after shutil.copytree() ran.
* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py
* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)
* Replace := operator (needs Python 3.8+)
* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md
* Restore original README.md: `pip3 install -e` now works on first pass.
* cuda_cccl/README.md: FOR INTERNAL USE ONLY
* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917)

  Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

  Trigger for this change:
  * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178
  * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996

* Install CCCL headers under cuda.cccl.include

  Trigger for this change:
  * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562

  Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely.

* Factor out cuda_cccl/cuda/cccl/include_paths.py
* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative
* Add missing Copyright notice.
* Add missing __init__.py (cuda.cccl)
* Add `"cuda.cccl"` to `autodoc.mock_imports`
* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)
* Add # TODO: move this to a module-level import
* Modernize cuda_cooperative/pyproject.toml, setup.py
* Convert cuda_cooperative to use hatchling as build backend.
* Revert "Convert cuda_cooperative to use hatchling as build backend."

  This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe.

* Move numpy from [build-system] requires -> [project] dependencies
* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH
* Remove copy_license() and use license_files=["../../LICENSE"] instead.
* Further modernize cuda_cccl/setup.py to use pathlib
* Trivial simplifications in cuda_cccl/pyproject.toml
* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code
* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml
* Add taplo-pre-commit to .pre-commit-config.yaml
* taplo-pre-commit auto-fixes
* Use pathlib in cuda_cooperative/setup.py
* CCCL_PYTHON_PATH in cuda_cooperative/setup.py
* Modernize cuda_parallel/pyproject.toml, setup.py
* Use pathlib in cuda_parallel/setup.py
* Add `# TOML lint & format` comment.
* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml
* Use pathlib in cuda/cccl/include_paths.py
* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)
* Fixes after git merge main
* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in:

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`
* Introduce cuda_cooperative/constraints.txt
* Also add cuda_parallel/constraints.txt
* Add `--constraint constraints.txt` in ci/test_python.sh
* Update Copyright dates
* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

  For completeness: The other repo took a long time to install into the pre-commit cache; so long it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).
* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.
* Make cuda_cooperative, cuda_parallel testing completely independent.
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

  This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2.

  Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Restore original ci/matrix.yaml [skip-rapids]
* Use for loop in test_python.sh to avoid code duplication.
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]
* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

  This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d.

* Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460)
* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber

* cuda.parallel: invoke pytest directly rather than via `python -m pytest` (#3523)

  Co-authored-by: Ashwin Srinath

* Copy file from PR #3547 (bugfix/drop_pipe_in_lit by @wmaxey)
* Revert "cuda.parallel: invoke pytest directly rather than via `python -m pytest` (#3523)"

  This reverts commit a2e21cbdd2fa15a35b3a0df8eb7e2fc84adc46bc.

* Replace pipes.quote with shlex.quote in lit config (#3547)
* Replace pipes.quote with shlex.quote
* Drop TBB run on windows to unblock CI
* Update ci/matrix.yaml

Co-authored-by: Michael Schellenberger Costa
Co-authored-by: Bernhard Manfred Gruber

* Remove nvks runners from testing pool. (#3580)

---------

Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Ashwin Srinath <3190405+shwina@users.noreply.github.com>
Co-authored-by: Ashwin Srinath
Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa
Co-authored-by: Allison Piper
---
 ci/matrix.yaml                                | 14 ++++----
 ci/test_python.sh                             | 33 ++++++++++---------
 ci/update_version.sh                          |  1 +
 .../test/utils/libcudacxx/test/config.py      | 28 ++++++++--------
 python/cuda_cooperative/.gitignore            |  1 -
 python/cuda_cooperative/MANIFEST.in           |  1 -
 python/cuda_cooperative/README.md             |  1 +
 python/cuda_cooperative/pyproject.toml        |  4 +--
 python/cuda_cooperative/setup.py              |  2 +-
 python/cuda_parallel/.gitignore               |  1 -
 python/cuda_parallel/MANIFEST.in              |  1 -
 python/cuda_parallel/README.md                |  1 +
 python/cuda_parallel/pyproject.toml           |  4 +--
 python/cuda_parallel/setup.py                 |  9 +++--
 14 files changed, 51 insertions(+), 50 deletions(-)
 delete mode 100644 python/cuda_cooperative/MANIFEST.in
 delete mode 100644 python/cuda_parallel/MANIFEST.in

diff --git a/ci/matrix.yaml b/ci/matrix.yaml
index cfed59751b0..fe1be3f84fd 100644
--- a/ci/matrix.yaml
+++ b/ci/matrix.yaml
@@ -219,13 +219,13 @@ projects:

 # testing -> Runner with GPU is in a nv-gh-runners testing pool
 gpus:
-  v100:     { sm: 70 }                # 32 GB, 40 runners
-  t4:       { sm: 75, testing: true } # 16 GB, 8 runners
-  rtx2080:  { sm: 75, testing: true } #  8 GB, 8 runners
-  rtxa6000: { sm: 86, testing: true } # 48 GB, 12 runners
-  l4:       { sm: 89, testing: true } # 24 GB, 48 runners
-  rtx4090:  { sm: 89, testing: true } # 24 GB, 10 runners
-  h100:     { sm: 90 }                # 80 GB, 16 runners
+  v100:     { sm: 70 } # 32 GB, 40 runners
+  t4:       { sm: 75 } # 16 GB, 10 runners
+  rtx2080:  { sm: 75 } #  8 GB, 12 runners
+  rtxa6000: { sm: 86 } # 48 GB, 12 runners
+  l4:       { sm: 89 } # 24 GB, 48 runners
+  rtx4090:  { sm: 89 } # 24 GB, 10 runners
+  h100:     { sm: 90 } # 80 GB, 16 runners

 # Tags are used to define a `matrix job` in the workflow section.
 #
diff --git a/ci/test_python.sh b/ci/test_python.sh
index bd66cc57716..34900fdb8e0 100755
--- a/ci/test_python.sh
+++ b/ci/test_python.sh
@@ -8,25 +8,28 @@ print_environment_details

 fail_if_no_gpu

-readonly prefix="${BUILD_DIR}/python/"
-export PYTHONPATH="${prefix}:${PYTHONPATH:-}"
+begin_group "⚙️ Existing site-packages"
+pip freeze
+end_group "⚙️ Existing site-packages"

-pushd ../python/cuda_cooperative >/dev/null
+for module in cuda_parallel cuda_cooperative; do

-run_command "⚙️ Pip install cuda_cooperative" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
-run_command "🚀 Pytest cuda_cooperative" python -m pytest -v ./tests
+  pushd "../python/${module}" >/dev/null

-popd >/dev/null
+  TEMP_VENV_DIR="/tmp/${module}_venv"
+  rm -rf "${TEMP_VENV_DIR}"
+  python -m venv "${TEMP_VENV_DIR}"
+  . "${TEMP_VENV_DIR}/bin/activate"
+  echo 'cuda-cccl @ file:///home/coder/cccl/python/cuda_cccl' > /tmp/cuda-cccl_constraints.txt
+  run_command "⚙️ Pip install ${module}" pip install -c /tmp/cuda-cccl_constraints.txt .[test]
+  begin_group "⚙️ ${module} site-packages"
+  pip freeze
+  end_group "⚙️ ${module} site-packages"
+  run_command "🚀 Pytest ${module}" python -m pytest -v ./tests
+  deactivate

-pushd ../python/cuda_parallel >/dev/null
+  popd >/dev/null

-# Temporarily install the package twice to populate include directory as part of the first installation
-# and to let manifest discover these includes during the second installation. Do not forget to remove the
-# second installation after https://github.com/NVIDIA/cccl/issues/2281 is addressed.
-run_command "⚙️ Pip install cuda_parallel once" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
-run_command "⚙️ Pip install cuda_parallel twice" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
-run_command "🚀 Pytest cuda_parallel" python -m pytest -v ./tests
-
-popd >/dev/null
+done

 print_time_summary
diff --git a/ci/update_version.sh b/ci/update_version.sh
index 9184b98e6a9..1f8bc182015 100755
--- a/ci/update_version.sh
+++ b/ci/update_version.sh
@@ -103,6 +103,7 @@ update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MAJOR \([0-9]\+\))" "
 update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MINOR \([0-9]\+\))" "set(cudax_VERSION_MINOR $minor)"
 update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_PATCH \([0-9]\+\))" "set(cudax_VERSION_PATCH $patch)"

+update_file "$CUDA_CCCL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$major.$minor.$patch\""
 update_file "$CUDA_COOPERATIVE_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""
 update_file "$CUDA_PARALLEL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""

diff --git a/libcudacxx/test/utils/libcudacxx/test/config.py b/libcudacxx/test/utils/libcudacxx/test/config.py
index d1aef968a22..dc1069140f0 100644
--- a/libcudacxx/test/utils/libcudacxx/test/config.py
+++ b/libcudacxx/test/utils/libcudacxx/test/config.py
@@ -1385,19 +1385,19 @@ def configure_modules(self):

     def configure_substitutions(self):
         sub = self.config.substitutions
-        cxx_path = pipes.quote(self.cxx.path)
+        cxx_path = shlex.quote(self.cxx.path)
         # Configure compiler substitutions
         sub.append(('%cxx', cxx_path))
         sub.append(('%libcxx_src_root', self.libcudacxx_src_root))
         # Configure flags substitutions
-        flags_str = ' '.join([pipes.quote(f) for f in self.cxx.flags])
-        compile_flags_str = ' '.join([pipes.quote(f) for f in self.cxx.compile_flags])
-        link_flags_str = ' '.join([pipes.quote(f) for f in self.cxx.link_flags])
-        all_flags = '%s %s %s' % (flags_str, compile_flags_str, link_flags_str)
-        sub.append(('%flags', flags_str))
-        sub.append(('%compile_flags', compile_flags_str))
-        sub.append(('%link_flags', link_flags_str))
-        sub.append(('%all_flags', all_flags))
+        flags_str = " ".join([shlex.quote(f) for f in self.cxx.flags])
+        compile_flags_str = " ".join([shlex.quote(f) for f in self.cxx.compile_flags])
+        link_flags_str = " ".join([shlex.quote(f) for f in self.cxx.link_flags])
+        all_flags = "%s %s %s" % (flags_str, compile_flags_str, link_flags_str)
+        sub.append(("%flags", flags_str))
+        sub.append(("%compile_flags", compile_flags_str))
+        sub.append(("%link_flags", link_flags_str))
+        sub.append(("%all_flags", all_flags))
         if self.cxx.isVerifySupported():
             verify_str = ' ' + ' '.join(self.cxx.verify_flags) + ' '
             sub.append(('%verify', verify_str))
@@ -1422,11 +1422,11 @@ def configure_substitutions(self):
         # Configure run env substitution.
         sub.append(('%run', '%t.exe'))
         # Configure not program substitutions
-        not_py = os.path.join(self.libcudacxx_src_root, 'test', 'utils', 'not.py')
-        not_str = '%s %s ' % (pipes.quote(sys.executable), pipes.quote(not_py))
-        sub.append(('not ', not_str))
-        if self.get_lit_conf('libcudacxx_gdb'):
-            sub.append(('%libcxx_gdb', self.get_lit_conf('libcudacxx_gdb')))
+        not_py = os.path.join(self.libcudacxx_src_root, "test", "utils", "not.py")
+        not_str = "%s %s " % (shlex.quote(sys.executable), shlex.quote(not_py))
+        sub.append(("not ", not_str))
+        if self.get_lit_conf("libcudacxx_gdb"):
+            sub.append(("%libcxx_gdb", self.get_lit_conf("libcudacxx_gdb")))

     def can_use_deployment(self):
         # Check if the host is on an Apple platform using clang.
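The `config.py` hunks above swap `pipes.quote` for `shlex.quote`. The `pipes` module was deprecated in Python 3.11 and removed in Python 3.13, while `shlex.quote` has been available since Python 3.3. A minimal sketch of the quoting behaviour the lit substitutions rely on follows; the compiler path and flags are made-up values for illustration and are not taken from the CCCL configuration.

```python
import shlex

# Hypothetical compiler path and flags (illustrative only, not from CCCL).
cxx_path = "/opt/my tools/nvcc"
flags = ["-std=c++17", "-DNAME=hello world", "-O2"]

# shlex.quote leaves shell-safe strings untouched and wraps anything that
# contains spaces or shell metacharacters in single quotes.
quoted_cxx = shlex.quote(cxx_path)
flags_str = " ".join(shlex.quote(f) for f in flags)

print(quoted_cxx)
print(flags_str)

# Splitting the quoted string again recovers the original argument list.
assert shlex.split(flags_str) == flags
```

Because the quoted strings are substituted into shell command lines by lit, the round trip through `shlex.split` is a quick way to check that no argument gets broken apart by embedded whitespace.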
diff --git a/python/cuda_cooperative/.gitignore b/python/cuda_cooperative/.gitignore
index 15c09b246c1..a9904c10554 100644
--- a/python/cuda_cooperative/.gitignore
+++ b/python/cuda_cooperative/.gitignore
@@ -1,3 +1,2 @@
-cuda/_include
 env
 *egg-info
diff --git a/python/cuda_cooperative/MANIFEST.in b/python/cuda_cooperative/MANIFEST.in
deleted file mode 100644
index 848cbfe2e81..00000000000
--- a/python/cuda_cooperative/MANIFEST.in
+++ /dev/null
@@ -1 +0,0 @@
-recursive-include cuda/_include *
diff --git a/python/cuda_cooperative/README.md b/python/cuda_cooperative/README.md
index c202d1d6c17..673e130bbe0 100644
--- a/python/cuda_cooperative/README.md
+++ b/python/cuda_cooperative/README.md
@@ -7,6 +7,7 @@ Please visit the documentation here: https://nvidia.github.io/cccl/python.html.
 ## Local development

 ```bash
+pip3 install -e ../cuda_cccl
 pip3 install -e .[test]
 pytest -v ./tests/
 ```
diff --git a/python/cuda_cooperative/pyproject.toml b/python/cuda_cooperative/pyproject.toml
index 4ab52c80318..a30fc856994 100644
--- a/python/cuda_cooperative/pyproject.toml
+++ b/python/cuda_cooperative/pyproject.toml
@@ -1,7 +1,7 @@
-# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
 #
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 [build-system]
-requires = ["packaging", "setuptools>=61.0.0", "wheel"]
+requires = ["setuptools>=61.0.0"]
 build-backend = "setuptools.build_meta"
diff --git a/python/cuda_cooperative/setup.py b/python/cuda_cooperative/setup.py
index f7eff80bdab..bfb06362c0d 100644
--- a/python/cuda_cooperative/setup.py
+++ b/python/cuda_cooperative/setup.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
 #
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

diff --git a/python/cuda_parallel/.gitignore b/python/cuda_parallel/.gitignore
index 8e0d030ff6a..7fc9da1604e 100644
--- a/python/cuda_parallel/.gitignore
+++ b/python/cuda_parallel/.gitignore
@@ -1,4 +1,3 @@
-cuda/_include
 env
 *egg-info
 *so
diff --git a/python/cuda_parallel/MANIFEST.in b/python/cuda_parallel/MANIFEST.in
deleted file mode 100644
index 848cbfe2e81..00000000000
--- a/python/cuda_parallel/MANIFEST.in
+++ /dev/null
@@ -1 +0,0 @@
-recursive-include cuda/_include *
diff --git a/python/cuda_parallel/README.md b/python/cuda_parallel/README.md
index 98a3a3c92d0..1dad4b0f03e 100644
--- a/python/cuda_parallel/README.md
+++ b/python/cuda_parallel/README.md
@@ -7,6 +7,7 @@ Please visit the documentation here: https://nvidia.github.io/cccl/python.html.
 ## Local development

 ```bash
+pip3 install -e ../cuda_cccl
 pip3 install -e .[test]
 pytest -v ./tests/
 ```
diff --git a/python/cuda_parallel/pyproject.toml b/python/cuda_parallel/pyproject.toml
index 4ab52c80318..a30fc856994 100644
--- a/python/cuda_parallel/pyproject.toml
+++ b/python/cuda_parallel/pyproject.toml
@@ -1,7 +1,7 @@
-# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
 #
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 [build-system]
-requires = ["packaging", "setuptools>=61.0.0", "wheel"]
+requires = ["setuptools>=61.0.0"]
 build-backend = "setuptools.build_meta"
diff --git a/python/cuda_parallel/setup.py b/python/cuda_parallel/setup.py
index c29a5237fc0..41ead058696 100644
--- a/python/cuda_parallel/setup.py
+++ b/python/cuda_parallel/setup.py
@@ -1,10 +1,9 @@
-# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
 #
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-import os
-import shutil
 import subprocess
+from pathlib import Path

 from setuptools import Command, Extension, setup, find_packages, find_namespace_packages
 from setuptools.command.build_py import build_py
@@ -84,8 +83,8 @@ def build_extension(self, ext):
             '-DCMAKE_BUILD_TYPE=Release',
         ]

-        if not os.path.exists(self.build_temp):
-            os.makedirs(self.build_temp)
+        build_temp_path = Path(self.build_temp)
+        build_temp_path.mkdir(parents=True, exist_ok=True)

         subprocess.check_call(['cmake', cccl_path] + cmake_args, cwd=self.build_temp)
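
The last hunk replaces the `os.path.exists` / `os.makedirs` pair with a single `Path.mkdir(parents=True, exist_ok=True)` call. A small sketch of that behaviour follows, using a temporary directory as a stand-in for `self.build_temp`; the directory names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the extension's build directory.
    build_temp_path = Path(tmp) / "build" / "temp"

    # parents=True creates missing intermediate directories;
    # exist_ok=True turns a repeated call into a no-op instead of
    # raising FileExistsError.
    build_temp_path.mkdir(parents=True, exist_ok=True)
    build_temp_path.mkdir(parents=True, exist_ok=True)

    created = build_temp_path.is_dir()
    print(created)
```

Unlike the check-then-create pattern, this form does not fail if another process creates the directory between the existence check and the `os.makedirs` call.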