work around erroneous "undefined in device code" error in basic_any #3614

Merged
merged 1 commit into from
Jan 30, 2025


ericniebler
Collaborator

Description

this PR addresses an issue in `basic_any` that was causing nvcc to erroneously think that certain host-only entities were needed in device code, leading to an error.

the code was using the value of a host-only member function pointer in a host-only constexpr function. something about how this function was getting used in some code i was working on made nvcc think the member function was also needed on device. i have not been able to isolate the source of the problem, so i can't provide a test case for it, unfortunately.

this PR changes the constexpr function to a type computation, with the member function pointer passed as an NTTP (non-type template parameter) instead of as a function argument. that seems to mollify nvcc.
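for readers unfamiliar with the pattern, here is a minimal sketch of the general transformation in plain C++17. it is not the actual `basic_any` code: `widget`, `arity_of`, `arity_impl`, and `arity_t` are hypothetical names invented for illustration, and the sketch does not reproduce the nvcc error (which, as noted above, has not been isolated). it only shows how a constexpr function that takes a member function pointer by value can be restated as a type computation keyed on the pointer as an NTTP.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// hypothetical stand-in for a type with an ordinary (host-side) member function
struct widget {
  int scale(int factor) const { return factor * 2; }
};

// "before" shape: a constexpr function that receives the member function
// pointer as a value argument and computes something from its type
template <class R, class C, class... Args>
constexpr std::size_t arity_of(R (C::*)(Args...) const) noexcept {
  return sizeof...(Args);
}

// "after" shape: the same computation done purely on types; the member
// function pointer appears only as a non-type template parameter
template <class MemFnType>
struct arity_impl;  // primary template intentionally left undefined

template <class R, class C, class... Args>
struct arity_impl<R (C::*)(Args...) const>
    : std::integral_constant<std::size_t, sizeof...(Args)> {};

template <auto MemFn>
using arity_t = arity_impl<decltype(MemFn)>;

// both forms give the same answer at compile time
static_assert(arity_of(&widget::scale) == 1, "function form");
static_assert(arity_t<&widget::scale>::value == 1, "type-computation form");
```

in the second form the pointer is named only as a template argument and inspected through `decltype`; no function is ever called with it as a value.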

@ericniebler ericniebler requested a review from a team as a code owner January 30, 2025 18:00
@ericniebler ericniebler enabled auto-merge (squash) January 30, 2025 18:42
Contributor

🟩 CI finished in 58m 18s: Pass: 100%/20 | Total: 1h 59m | Avg: 5m 59s | Max: 14m 56s | Hits: 297%/522
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 59m | Avg: 5m 59s | Max: 14m 56s | Hits: 297%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 45m | Avg:  6m 34s | Max: 14m 56s | Hits: 297%/522   
      🟩 arm64              Pass: 100%/4   | Total: 14m 34s | Avg:  3m 38s | Max:  4m 04s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 31s | Avg: 11m 31s | Max: 11m 31s | Hits: 297%/261   
      🟩 12.5               Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max:  6m 43s
      🟩 12.6               Pass: 100%/17  | Total:  1h 34m | Avg:  5m 35s | Max: 14m 56s | Hits: 297%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 31s | Avg: 11m 31s | Max: 11m 31s | Hits: 297%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max:  6m 43s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 34m | Avg:  5m 35s | Max: 14m 56s | Hits: 297%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 59m | Avg:  5m 59s | Max: 14m 56s | Hits: 297%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 11s | Avg:  4m 11s | Max:  4m 11s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 Clang18            Pass: 100%/4   | Total: 22m 52s | Avg:  5m 43s | Max: 11m 29s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s
      🟩 GCC11              Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
      🟩 GCC12              Pass: 100%/2   | Total: 19m 09s | Avg:  9m 34s | Max: 14m 56s
      🟩 GCC13              Pass: 100%/4   | Total: 14m 04s | Avg:  3m 31s | Max:  4m 04s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 31s | Avg: 11m 31s | Max: 11m 31s | Hits: 297%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 14m 08s | Avg: 14m 08s | Max: 14m 08s | Hits: 297%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max:  6m 43s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 39m 41s | Avg:  4m 57s | Max: 11m 29s
      🟩 GCC                Pass: 100%/8   | Total: 41m 06s | Avg:  5m 08s | Max: 14m 56s
      🟩 MSVC               Pass: 100%/2   | Total: 25m 39s | Avg: 12m 49s | Max: 14m 08s | Hits: 297%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max:  6m 43s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 35m 03s | Avg:  8m 45s | Max: 14m 56s
      🟩 v100               Pass: 100%/16  | Total:  1h 24m | Avg:  5m 17s | Max: 14m 08s | Hits: 297%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 33m | Avg:  5m 10s | Max: 14m 08s | Hits: 297%/522   
      🟩 Test               Pass: 100%/2   | Total: 26m 25s | Avg: 13m 12s | Max: 14m 56s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s
      🟩 90a                Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 11s | Avg:  4m 17s | Max:  6m 31s
      🟩 20                 Pass: 100%/16  | Total:  1h 42m | Avg:  6m 24s | Max: 14m 56s | Hits: 297%/522   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-rtx2080-latest-1

@ericniebler ericniebler merged commit b6dd111 into NVIDIA:main Jan 30, 2025
33 of 37 checks passed