Add dynamic CUB dispatch for merge_sort #3525

Open
wants to merge 15 commits into
base: main

NaderAlAwar
Contributor

Description

closes #3387

Analogous to #2591 and #3398, this PR extends the CUB dispatch layer for merge_sort to support dynamic kernel launching, which will later be used by c.parallel.
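For readers unfamiliar with the pattern, here is a minimal host-only sketch of what a dispatch layer with a pluggable kernel source might look like. It is not the code in this PR and not CUB's actual API: `kernel_t`, `static_kernel_source`, `dynamic_kernel_source`, and this `dispatch_merge_sort` signature are hypothetical names chosen for illustration, and ordinary C++ callables stand in for CUDA kernels.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-in for a kernel handle. In real CUDA code this would be
// a __global__ function pointer (static path) or a handle obtained at run
// time (dynamic path); a plain callable keeps the sketch runnable on the host.
using kernel_t = std::function<void(std::vector<int>&)>;

// Static path (assumed shape): the kernel is fixed when this file is compiled.
struct static_kernel_source {
  kernel_t merge_sort_kernel() const {
    return [](std::vector<int>& v) { std::stable_sort(v.begin(), v.end()); };
  }
};

// Dynamic path (assumed shape): the kernel is supplied at run time, for
// example by a caller that compiled it just in time.
struct dynamic_kernel_source {
  kernel_t loaded_kernel;
  kernel_t merge_sort_kernel() const { return loaded_kernel; }
};

// The dispatch routine only talks to the kernel source, so the same code
// serves both paths. The default template argument keeps existing callers
// on the static path without any changes.
template <class KernelSource = static_kernel_source>
void dispatch_merge_sort(std::vector<int>& items,
                         KernelSource source = KernelSource{}) {
  kernel_t kernel = source.merge_sort_kernel();
  kernel(items);
}
```

Calling `dispatch_merge_sort(items)` takes the static path, while passing a `dynamic_kernel_source` holding a runtime-provided callable takes the dynamic one. Defaulting the template parameter is one way to keep the existing entry point source-compatible; whether the PR does exactly this is an assumption.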

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

NaderAlAwar requested a review from a team as a code owner on January 24, 2025, 18:47
Contributor

🟨 CI finished in 1h 47m: Pass: 90%/90 | Total: 2d 15h | Avg: 42m 22s | Max: 1h 11m | Hits: 221%/12772
  • 🟨 cub: Pass: 88%/44 | Total: 1d 15h | Avg: 53m 28s | Max: 1h 11m | Hits: 244%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/42  | Total:  1d 13h | Avg: 53m 00s | Max:  1h 11m | Hits: 244%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 47m | Avg: 57m 35s | Max:  1h 02m | Hits: 245%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
      🔍 12.6               Pass:  86%/37  | Total:  1d 08h | Avg: 52m 06s | Max:  1h 11m | Hits: 244%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 47m | Avg: 57m 35s | Max:  1h 02m | Hits: 245%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
      🔍 nvcc12.6           Pass:  85%/35  | Total:  1d 06h | Avg: 51m 32s | Max:  1h 11m | Hits: 244%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m
      🔍 nvcc               Pass:  88%/42  | Total:  1d 13h | Avg: 53m 04s | Max:  1h 11m | Hits: 244%/3552  
    🔍 sm: 90 🔍
      🔍 90                 Pass:  50%/2   | Total: 46m 40s | Avg: 23m 20s | Max: 27m 21s
      🟩 90a                Pass: 100%/1   | Total: 27m 09s | Avg: 27m 09s | Max: 27m 09s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 19h 59m | Avg: 59m 59s | Max:  1h 11m | Hits: 245%/2664  
      🔍 20                 Pass:  79%/24  | Total: 19h 13m | Avg: 48m 02s | Max:  1h 06m | Hits: 242%/888   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 50m | Avg: 57m 40s | Max:  1h 01m
      🟩 Clang15            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 27s | Max:  1h 00m
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 03s | Max:  1h 00m
      🟩 Clang17            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 21s | Max:  1h 00m
      🟨 Clang18            Pass:  85%/7   | Total:  5h 58m | Avg: 51m 15s | Max:  1h 05m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 51m | Avg: 55m 45s | Max: 56m 54s
      🟩 GCC8               Pass: 100%/1   | Total: 59m 07s | Avg: 59m 07s | Max: 59m 07s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 39s | Max: 59m 59s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 02s | Max:  1h 01m
      🟩 GCC11              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 GCC12              Pass:  75%/4   | Total:  2h 42m | Avg: 40m 40s | Max: 59m 49s
      🟨 GCC13              Pass:  62%/8   | Total:  5h 19m | Avg: 39m 55s | Max:  1h 03m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits: 245%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 11m | Hits: 244%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 15h 41m | Avg: 55m 21s | Max:  1h 05m
      🟨 GCC                Pass:  80%/21  | Total: 16h 48m | Avg: 48m 00s | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 26m | Avg:  1h 06m | Max:  1h 11m | Hits: 244%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
    🟨 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 58m 46s | Max:  1h 11m | Hits: 244%/3552  
      🟥 DeviceLaunch       Pass:   0%/1   | Total: 30m 12s | Avg: 30m 12s | Max: 30m 12s
      🟥 GraphCapture       Pass:   0%/1   | Total: 25m 15s | Avg: 25m 15s | Max: 25m 15s
      🟥 HostLaunch         Pass:   0%/3   | Total:  1h 06m | Avg: 22m 09s | Max: 24m 58s
      🟩 TestGPU            Pass: 100%/2   | Total: 56m 36s | Avg: 28m 18s | Max: 29m 05s
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 46m 40s | Avg: 23m 20s | Max: 27m 21s
      🟨 v100               Pass:  90%/42  | Total:  1d 14h | Avg: 54m 54s | Max:  1h 11m | Hits: 244%/3552  
    
  • 🟨 thrust: Pass: 93%/43 | Total: 1d 00h | Avg: 33m 36s | Max: 1h 00m | Hits: 213%/9220

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/41  | Total: 23h 06m | Avg: 33m 48s | Max:  1h 00m | Hits: 213%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 59m 08s | Avg: 29m 34s | Max: 31m 28s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  3h 05m | Avg: 37m 08s | Max: 54m 12s | Hits: 174%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 16s | Max: 59m 08s
      🔍 12.6               Pass:  91%/36  | Total: 19h 05m | Avg: 31m 48s | Max:  1h 00m | Hits: 222%/7376  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 55m 51s | Avg: 27m 55s | Max: 29m 33s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 05m | Avg: 37m 08s | Max: 54m 12s | Hits: 174%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 54m | Avg: 57m 16s | Max: 59m 08s
      🔍 nvcc12.6           Pass:  91%/34  | Total: 18h 09m | Avg: 32m 02s | Max:  1h 00m | Hits: 222%/7376  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 55m 51s | Avg: 27m 55s | Max: 29m 33s
      🔍 nvcc               Pass:  92%/41  | Total: 23h 09m | Avg: 33m 53s | Max:  1h 00m | Hits: 213%/9220  
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/37  | Total: 22h 28m | Avg: 36m 27s | Max:  1h 00m | Hits: 174%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 02s | Avg: 15m 20s | Max: 30m 36s | Hits: 365%/1844  
      🔥 TestGPU            Pass:   0%/3   | Total: 50m 24s | Avg: 16m 48s | Max: 23m 36s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 12h 48m | Avg: 38m 24s | Max: 59m 53s | Hits: 174%/5532  
      🔍 20                 Pass:  90%/21  | Total: 10h 24m | Avg: 29m 45s | Max:  1h 00m | Hits: 270%/3688  
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 09s | Max: 34m 01s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 42s | Max: 34m 39s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 35s | Max: 37m 04s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 34s | Max: 34m 44s
      🟨 Clang18            Pass:  85%/7   | Total:  2h 47m | Avg: 23m 52s | Max: 32m 13s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 11s | Max: 37m 40s
      🟩 GCC8               Pass: 100%/1   | Total: 33m 16s | Avg: 33m 16s | Max: 33m 16s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 50s | Max: 36m 25s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 10s | Max: 32m 43s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 13m | Avg: 36m 39s | Max: 37m 36s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 51s | Max: 32m 01s
      🟨 GCC13              Pass:  75%/8   | Total:  3h 12m | Avg: 24m 03s | Max: 34m 59s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 02s | Max: 59m 53s | Hits: 174%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 26m | Avg: 48m 41s | Max:  1h 00m | Hits: 238%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 16s | Max: 59m 08s
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total:  8h 23m | Avg: 29m 37s | Max: 37m 04s
      🟨 GCC                Pass:  89%/19  | Total:  9h 27m | Avg: 29m 51s | Max: 37m 40s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 20m | Avg: 52m 02s | Max:  1h 00m | Hits: 213%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 16s | Max: 59m 08s
    🟨 cmake_options
      🟨 -DTHRUST_DISPATCH_TYPE=Force32bit Pass:  50%/2   | Total: 52m 25s | Avg: 26m 12s | Max: 28m 49s
    🟨 gpu
      🟨 v100               Pass:  93%/43  | Total:  1d 00h | Avg: 33m 36s | Max:  1h 00m | Hits: 213%/9220  
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 37s | Avg: 17m 37s | Max: 17m 37s
    
  • 🟥 python: Pass: 0%/1 | Total: 5m 20s | Avg: 5m 20s | Max: 5m 20s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 49s | Avg: 5m 24s | Max: 8m 20s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 20s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s
      🟩 Test               Pass: 100%/1   | Total:  8m 20s | Avg:  8m 20s | Max:  8m 20s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

🟨 CI finished in 3h 31m: Pass: 90%/90 | Total: 2d 17h | Avg: 43m 27s | Max: 1h 18m | Hits: 192%/10928
  • 🟨 cub: Pass: 88%/44 | Total: 1d 15h | Avg: 53m 47s | Max: 1h 18m | Hits: 228%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/42  | Total:  1d 13h | Avg: 53m 30s | Max:  1h 18m | Hits: 228%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 44s | Max:  1h 00m
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 56m | Avg: 59m 12s | Max:  1h 02m | Hits: 229%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
      🔍 12.6               Pass:  86%/37  | Total:  1d 08h | Avg: 52m 16s | Max:  1h 18m | Hits: 228%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 56m | Avg: 59m 12s | Max:  1h 02m | Hits: 229%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
      🔍 nvcc12.6           Pass:  85%/35  | Total:  1d 06h | Avg: 51m 46s | Max:  1h 18m | Hits: 228%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🔍 nvcc               Pass:  88%/42  | Total:  1d 13h | Avg: 53m 26s | Max:  1h 18m | Hits: 228%/3552  
    🔍 sm: 90 🔍
      🔍 90                 Pass:  50%/2   | Total: 43m 35s | Avg: 21m 47s | Max: 24m 08s
      🟩 90a                Pass: 100%/1   | Total: 25m 09s | Avg: 25m 09s | Max: 25m 09s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 20h 30m | Avg:  1h 01m | Max:  1h 13m | Hits: 229%/2664  
      🔍 20                 Pass:  79%/24  | Total: 18h 56m | Avg: 47m 21s | Max:  1h 18m | Hits: 226%/888   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 58m | Avg: 59m 30s | Max:  1h 02m
      🟩 Clang15            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 36s | Max:  1h 00m
      🟩 Clang16            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 Clang17            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
      🟨 Clang18            Pass:  85%/7   | Total:  5h 53m | Avg: 50m 32s | Max:  1h 01m
      🟩 GCC7               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m
      🟩 GCC8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
      🟩 GCC9               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 35s | Max: 56m 51s
      🟩 GCC10              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 02m
      🟨 GCC12              Pass:  75%/4   | Total:  2h 37m | Avg: 39m 20s | Max: 57m 41s
      🟨 GCC13              Pass:  62%/8   | Total:  4h 50m | Avg: 36m 20s | Max:  1h 00m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 09m | Hits: 229%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 18m | Hits: 227%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 15h 55m | Avg: 56m 10s | Max:  1h 03m
      🟨 GCC                Pass:  80%/21  | Total: 16h 31m | Avg: 47m 13s | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 43m | Avg:  1h 10m | Max:  1h 18m | Hits: 228%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
    🟨 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 40s | Max:  1h 18m | Hits: 228%/3552  
      🟥 DeviceLaunch       Pass:   0%/1   | Total: 21m 48s | Avg: 21m 48s | Max: 21m 48s
      🟥 GraphCapture       Pass:   0%/1   | Total: 16m 32s | Avg: 16m 32s | Max: 16m 32s
      🟥 HostLaunch         Pass:   0%/3   | Total:  1h 08m | Avg: 22m 51s | Max: 26m 55s
      🟩 TestGPU            Pass: 100%/2   | Total: 51m 45s | Avg: 25m 52s | Max: 28m 28s
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 43m 35s | Avg: 21m 47s | Max: 24m 08s
      🟨 v100               Pass:  90%/42  | Total:  1d 14h | Avg: 55m 19s | Max:  1h 18m | Hits: 228%/3552  
    
  • 🟨 thrust: Pass: 90%/43 | Total: 1d 00h | Avg: 34m 30s | Max: 1h 10m | Hits: 174%/7376

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  90%/41  | Total: 23h 38m | Avg: 34m 36s | Max:  1h 10m | Hits: 174%/7376  
      🟩 arm64              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 21s | Max: 33m 53s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  3h 12m | Avg: 38m 35s | Max: 59m 21s | Hits: 174%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 45s | Max: 59m 08s
      🔍 12.6               Pass:  88%/36  | Total: 19h 33m | Avg: 32m 35s | Max:  1h 10m | Hits: 174%/5532  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 51m 00s | Avg: 25m 30s | Max: 25m 59s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 12m | Avg: 38m 35s | Max: 59m 21s | Hits: 174%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 57m | Avg: 58m 45s | Max: 59m 08s
      🔍 nvcc12.6           Pass:  88%/34  | Total: 18h 42m | Avg: 33m 00s | Max:  1h 10m | Hits: 174%/5532  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 00s | Avg: 25m 30s | Max: 25m 59s
      🔍 nvcc               Pass:  90%/41  | Total: 23h 52m | Avg: 34m 56s | Max:  1h 10m | Hits: 174%/7376  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 12h 44m | Avg: 38m 12s | Max:  1h 03m | Hits: 174%/5532  
      🔍 20                 Pass:  85%/21  | Total: 11h 00m | Avg: 31m 28s | Max:  1h 10m | Hits: 174%/1844  
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 11m | Avg: 32m 52s | Max: 34m 28s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 40s | Max: 33m 24s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 03s | Max: 35m 04s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 44s | Max: 39m 44s
      🟨 Clang18            Pass:  85%/7   | Total:  2h 45m | Avg: 23m 41s | Max: 33m 45s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 45s | Max: 34m 08s
      🟩 GCC8               Pass: 100%/1   | Total: 33m 49s | Avg: 33m 49s | Max: 33m 49s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 10s | Max: 35m 50s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 50s | Max: 34m 46s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 05s | Max: 35m 33s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 26s | Max: 38m 20s
      🟨 GCC13              Pass:  75%/8   | Total:  3h 22m | Avg: 25m 20s | Max: 34m 51s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits: 174%/3688  
      🟨 MSVC14.39          Pass:  66%/3   | Total:  2h 44m | Avg: 54m 46s | Max:  1h 10m | Hits: 174%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 45s | Max: 59m 08s
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total:  8h 22m | Avg: 29m 32s | Max: 39m 44s
      🟨 GCC                Pass:  89%/19  | Total:  9h 37m | Avg: 30m 22s | Max: 38m 20s
      🟨 MSVC               Pass:  80%/5   | Total:  4h 46m | Avg: 57m 20s | Max:  1h 10m | Hits: 174%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 57m | Avg: 58m 45s | Max: 59m 08s
    🟨 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 57m | Avg: 37m 14s | Max:  1h 10m | Hits: 174%/7376  
      🟨 TestCPU            Pass:  66%/3   | Total: 48m 56s | Avg: 16m 18s | Max: 33m 31s
      🟥 TestGPU            Pass:   0%/3   | Total: 56m 48s | Avg: 18m 56s | Max: 31m 18s
    🟨 cmake_options
      🟨 -DTHRUST_DISPATCH_TYPE=Force32bit Pass:  50%/2   | Total: 58m 26s | Avg: 29m 13s | Max: 31m 18s
    🟨 gpu
      🟨 v100               Pass:  90%/43  | Total:  1d 00h | Avg: 34m 30s | Max:  1h 10m | Hits: 174%/7376  
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 19m 55s | Avg: 19m 55s | Max: 19m 55s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 37s | Avg: 4m 48s | Max: 7m 20s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 20s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 17s | Avg:  2m 17s | Max:  2m 17s
      🟩 Test               Pass: 100%/1   | Total:  7m 20s | Avg:  7m 20s | Max:  7m 20s
    
  • 🟩 python: Pass: 100%/1 | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 51m 32s | Avg: 51m 32s | Max: 51m 32s
    

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

NaderAlAwar marked this pull request as draft on January 28, 2025, 20:13

copy-pr-bot bot commented Jan 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@NaderAlAwar
Contributor Author

/ok to test

… to select constexpr `BLOCK_THREADS` if available"

This reverts commit 7cced5e.
@NaderAlAwar
Contributor Author

/ok to test

@NaderAlAwar
Contributor Author

/ok to test

NaderAlAwar marked this pull request as ready for review on January 29, 2025, 01:33
Contributor

🟨 CI finished in 37m 03s: Pass: 97%/89 | Total: 14h 09m | Avg: 9m 32s | Max: 33m 04s | Hits: 422%/10936
  • 🟨 cub: Pass: 95%/44 | Total: 7h 35m | Avg: 10m 21s | Max: 33m 04s | Hits: 539%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/42  | Total:  7h 25m | Avg: 10m 37s | Max: 33m 04s | Hits: 539%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  5m 09s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total: 46m 01s | Avg:  9m 12s | Max: 24m 46s | Hits: 539%/888   
      🟩 12.5               Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 38s
      🔍 12.6               Pass:  94%/37  | Total:  6h 30m | Avg: 10m 33s | Max: 33m 04s | Hits: 539%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  4m 35s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 46m 01s | Avg:  9m 12s | Max: 24m 46s | Hits: 539%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 38s
      🔍 nvcc12.6           Pass:  94%/35  | Total:  6h 21m | Avg: 10m 54s | Max: 33m 04s | Hits: 539%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  4m 35s
      🔍 nvcc               Pass:  95%/42  | Total:  7h 26m | Avg: 10m 38s | Max: 33m 04s | Hits: 539%/3552  
    🔍 gpu: rtxa6000 🔍
      🟩 h100               Pass: 100%/2   | Total: 32m 37s | Avg: 16m 18s | Max: 28m 08s
      🔍 rtxa6000           Pass:  75%/8   | Total:  2h 18m | Avg: 17m 15s | Max: 25m 41s
      🟩 v100               Pass: 100%/34  | Total:  4h 45m | Avg:  8m 22s | Max: 33m 04s | Hits: 539%/3552  
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/37  | Total:  5h 01m | Avg:  8m 09s | Max: 33m 04s | Hits: 539%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 14s | Avg: 20m 14s | Max: 20m 14s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 16s | Avg: 15m 16s | Max: 15m 16s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 18m | Avg: 26m 01s | Max: 28m 08s
      🔥 TestGPU            Pass:   0%/2   | Total: 40m 20s | Avg: 20m 10s | Max: 21m 13s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total:  3h 00m | Avg:  9m 00s | Max: 28m 16s | Hits: 539%/2664  
      🔍 20                 Pass:  91%/24  | Total:  4h 35m | Avg: 11m 28s | Max: 33m 04s | Hits: 539%/888   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 19s | Avg:  5m 19s | Max:  5m 31s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 20s | Avg:  5m 40s | Max:  5m 42s
      🟩 Clang16            Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  5m 33s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  5m 48s
      🟨 Clang18            Pass:  85%/7   | Total:  1h 11m | Avg: 10m 15s | Max: 25m 41s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 47s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 06s | Avg:  5m 33s | Max:  5m 48s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 20s | Avg:  5m 40s | Max:  5m 52s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 11s | Avg:  5m 35s | Max:  5m 42s
      🟩 GCC12              Pass: 100%/4   | Total: 44m 31s | Avg: 11m 07s | Max: 28m 08s
      🟨 GCC13              Pass:  87%/8   | Total:  1h 40m | Avg: 12m 37s | Max: 24m 16s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 28s | Avg: 26m 14s | Max: 27m 42s | Hits: 539%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 40s | Max: 33m 04s | Hits: 539%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 38s
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total:  2h 06m | Avg:  7m 27s | Max: 25m 41s
      🟨 GCC                Pass:  95%/21  | Total:  3h 16m | Avg:  9m 20s | Max: 28m 08s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 27s | Max: 33m 04s | Hits: 539%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 38s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 32m 37s | Avg: 16m 18s | Max: 28m 08s
      🟩 90a                Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 02m | Avg: 8m 37s | Max: 31m 44s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 34s | Avg:  8m 17s | Max: 11m 05s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  5h 52m | Avg:  8m 48s | Max: 31m 44s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  4m 57s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 43m 39s | Avg:  8m 43s | Max: 23m 01s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 29m 02s | Avg: 14m 31s | Max: 15m 18s
      🟩 12.6               Pass: 100%/35  | Total:  4h 49m | Avg:  8m 16s | Max: 31m 44s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 15s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 43m 39s | Avg:  8m 43s | Max: 23m 01s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 02s | Avg: 14m 31s | Max: 15m 18s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  4h 38m | Avg:  8m 27s | Max: 31m 44s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 15s
      🟩 nvcc               Pass: 100%/40  | Total:  5h 51m | Avg:  8m 47s | Max: 31m 44s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 35s | Avg:  5m 08s | Max:  5m 37s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  5m 59s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 34s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 51s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 51s | Avg:  6m 24s | Max: 10m 29s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 27s | Avg:  5m 13s | Max:  5m 14s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 49s | Avg:  5m 54s | Max:  6m 18s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 55s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 53s
      🟩 GCC12              Pass: 100%/2   | Total: 13m 10s | Avg:  6m 35s | Max:  7m 10s
      🟩 GCC13              Pass: 100%/8   | Total: 56m 24s | Avg:  7m 03s | Max: 11m 17s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 12s | Avg: 26m 06s | Max: 29m 11s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 38s | Max: 31m 44s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 02s | Avg: 14m 31s | Max: 15m 18s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 39m | Avg:  5m 50s | Max: 10m 29s
      🟩 GCC                Pass: 100%/19  | Total:  2h 00m | Avg:  6m 19s | Max: 11m 17s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 22s | Max: 31m 44s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 02s | Avg: 14m 31s | Max: 15m 18s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 05m | Avg:  8m 11s | Max: 11m 17s
      🟩 v100               Pass: 100%/34  | Total:  4h 56m | Avg:  8m 43s | Max: 31m 44s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 13m | Avg:  8m 28s | Max: 31m 44s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max:  7m 56s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 51s | Avg: 10m 57s | Max: 11m 17s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 06m | Avg:  9m 19s | Max: 29m 33s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  2h 39m | Avg:  7m 57s | Max: 31m 44s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 49s | Avg: 3m 24s | Max: 4m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  4m 51s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 58s | Avg:  1m 58s | Max:  1m 58s
      🟩 Test               Pass: 100%/1   | Total:  4m 51s | Avg:  4m 51s | Max:  4m 51s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1


🟩 CI finished in 1h 46m: Pass: 100%/89 | Total: 1d 16h | Avg: 27m 23s | Max: 1h 00m | Hits: 398%/10936
  • 🟩 cub: Pass: 100%/44 | Total: 1d 05h | Avg: 39m 46s | Max: 1h 00m | Hits: 512%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 03h | Avg: 39m 30s | Max:  1h 00m | Hits: 512%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 30m | Avg: 45m 10s | Max: 45m 38s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 38m | Avg: 43m 45s | Max: 56m 08s | Hits: 512%/888   
      🟩 12.5               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 02s | Max: 46m 58s
      🟩 12.6               Pass: 100%/37  | Total:  1d 00h | Avg: 38m 56s | Max:  1h 00m | Hits: 512%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 51m | Avg: 55m 57s | Max: 58m 05s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 38m | Avg: 43m 45s | Max: 56m 08s | Hits: 512%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 30m | Avg: 45m 02s | Max: 46m 58s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 22h 09m | Avg: 37m 58s | Max:  1h 00m | Hits: 512%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 57s | Max: 58m 05s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 03h | Avg: 39m 00s | Max:  1h 00m | Hits: 512%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 43m | Avg: 40m 54s | Max: 43m 11s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 25m | Avg: 42m 43s | Max: 43m 33s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 20m | Avg: 40m 02s | Max: 40m 45s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 21m | Avg: 40m 47s | Max: 41m 37s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 49m | Avg: 41m 19s | Max: 58m 05s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 20m | Avg: 40m 00s | Max: 40m 03s
      🟩 GCC8               Pass: 100%/1   | Total: 38m 30s | Avg: 38m 30s | Max: 38m 30s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 11s | Max: 39m 26s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 48s | Max: 39m 53s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 22m | Avg: 41m 04s | Max: 42m 32s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 07m | Avg: 31m 57s | Max: 42m 37s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 55m | Avg: 29m 25s | Max: 45m 38s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 21s | Max:  1h 00m | Hits: 512%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits: 512%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 30m | Avg: 45m 02s | Max: 46m 58s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 11h 40m | Avg: 41m 10s | Max: 58m 05s
      🟩 GCC                Pass: 100%/21  | Total: 12h 01m | Avg: 34m 22s | Max: 45m 38s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 57m | Avg: 59m 29s | Max:  1h 00m | Hits: 512%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 30m | Avg: 45m 02s | Max: 46m 58s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 45m 29s | Avg: 22m 44s | Max: 27m 54s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 37m | Avg: 27m 14s | Max: 42m 42s
      🟩 v100               Pass: 100%/34  | Total:  1d 00h | Avg: 43m 43s | Max:  1h 00m | Hits: 512%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 02h | Avg: 42m 54s | Max:  1h 00m | Hits: 512%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 56s | Avg: 24m 56s | Max: 24m 56s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 18s | Avg: 14m 18s | Max: 14m 18s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 19m | Avg: 26m 33s | Max: 27m 54s
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 22m 31s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 45m 29s | Avg: 22m 44s | Max: 27m 54s
      🟩 90a                Pass: 100%/1   | Total: 16m 55s | Avg: 16m 55s | Max: 16m 55s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 14h 45m | Avg: 44m 16s | Max:  1h 00m | Hits: 512%/2664  
      🟩 20                 Pass: 100%/24  | Total: 14h 24m | Avg: 36m 01s | Max:  1h 00m | Hits: 512%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 10h 50m | Avg: 15m 29s | Max: 37m 29s | Hits: 344%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 23m 15s | Avg: 11m 37s | Max: 12m 12s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 10h 26m | Avg: 15m 39s | Max: 37m 29s | Hits: 344%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 24m 28s | Avg: 12m 14s | Max: 12m 41s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 26m | Avg: 17m 18s | Max: 32m 06s | Hits: 344%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 54m 14s | Avg: 27m 07s | Max: 28m 34s
      🟩 12.6               Pass: 100%/35  | Total:  8h 29m | Avg: 14m 34s | Max: 37m 29s | Hits: 344%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 25m 33s | Avg: 12m 46s | Max: 13m 24s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 26m | Avg: 17m 18s | Max: 32m 06s | Hits: 344%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 54m 14s | Avg: 27m 07s | Max: 28m 34s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  8h 04m | Avg: 14m 40s | Max: 37m 29s | Hits: 344%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 25m 33s | Avg: 12m 46s | Max: 13m 24s
      🟩 nvcc               Pass: 100%/40  | Total: 10h 25m | Avg: 15m 37s | Max: 37m 29s | Hits: 344%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 52m 44s | Avg: 13m 11s | Max: 13m 54s
      🟩 Clang15            Pass: 100%/2   | Total: 28m 27s | Avg: 14m 13s | Max: 14m 17s
      🟩 Clang16            Pass: 100%/2   | Total: 27m 26s | Avg: 13m 43s | Max: 14m 18s
      🟩 Clang17            Pass: 100%/2   | Total: 25m 40s | Avg: 12m 50s | Max: 13m 25s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 20m | Avg: 11m 32s | Max: 13m 24s
      🟩 GCC7               Pass: 100%/2   | Total: 27m 46s | Avg: 13m 53s | Max: 14m 10s
      🟩 GCC8               Pass: 100%/1   | Total: 12m 07s | Avg: 12m 07s | Max: 12m 07s
      🟩 GCC9               Pass: 100%/2   | Total: 27m 44s | Avg: 13m 52s | Max: 14m 15s
      🟩 GCC10              Pass: 100%/2   | Total: 26m 01s | Avg: 13m 00s | Max: 13m 09s
      🟩 GCC11              Pass: 100%/2   | Total: 27m 23s | Avg: 13m 41s | Max: 13m 59s
      🟩 GCC12              Pass: 100%/2   | Total: 27m 20s | Avg: 13m 40s | Max: 13m 45s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 31m | Avg: 11m 26s | Max: 14m 56s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 34s | Max: 35m 02s | Hits: 344%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 14m | Avg: 37m 09s | Max: 37m 29s | Hits: 344%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 54m 14s | Avg: 27m 07s | Max: 28m 34s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 35m | Avg: 12m 39s | Max: 14m 18s
      🟩 GCC                Pass: 100%/19  | Total:  3h 59m | Avg: 12m 37s | Max: 14m 56s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 21m | Avg: 35m 21s | Max: 37m 29s | Hits: 344%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 54m 14s | Avg: 27m 07s | Max: 28m 34s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 28m | Avg: 11m 03s | Max: 14m 56s
      🟩 v100               Pass: 100%/34  | Total:  9h 22m | Avg: 16m 32s | Max: 37m 29s | Hits: 344%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 10h 02m | Avg: 16m 17s | Max: 37m 29s | Hits: 344%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 12s | Avg:  7m 36s | Max:  7m 42s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 54s | Avg: 10m 58s | Max: 11m 25s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  7m 46s | Avg:  7m 46s | Max:  7m 46s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 43m | Avg: 17m 10s | Max: 37m 29s | Hits: 344%/5538  
      🟩 20                 Pass: 100%/20  | Total:  4h 43m | Avg: 14m 11s | Max: 36m 49s | Hits: 344%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 53s | Avg: 3m 26s | Max: 4m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 53s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 00s | Avg:  2m 00s | Max:  2m 00s
      🟩 Test               Pass: 100%/1   | Total:  4m 53s | Avg:  4m 53s | Max:  4m 53s
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Add dynamic CUB dispatch path for merge_sort
3 participants