Move zip_iterator to internally use cuda::std::tuple #3725

Merged: 1 commit, Feb 6, 2025

Collaborator

@miscco miscco commented Feb 6, 2025

We want to get rid of the former, and passing a `cuda::std::tuple` into `thrust::make_zip_iterator` has surprising results.
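For readers unfamiliar with the idea, here is a minimal host-only sketch of a zip iterator that keeps its underlying iterators in a tuple. It is an illustration under stated assumptions, not Thrust's implementation: `toy_zip_iterator`, `make_toy_zip`, and `dot` are hypothetical names, and `std::tuple` stands in for `cuda::std::tuple` so the snippet builds with a plain C++17 compiler.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical toy type, NOT thrust::zip_iterator. It stores the underlying
// iterators in a std::tuple (a stand-in for cuda::std::tuple).
template <class... Its>
class toy_zip_iterator {
public:
  explicit toy_zip_iterator(std::tuple<Its...> its) : its_(its) {}

  // Dereferencing yields a tuple of references, one per underlying iterator.
  auto operator*() const {
    return std::apply([](auto&... it) { return std::tie(*it...); }, its_);
  }

  // Advance every underlying iterator in lockstep.
  toy_zip_iterator& operator++() {
    std::apply([](auto&... it) { (++it, ...); }, its_);
    return *this;
  }

  // Two zip iterators differ when their first components differ.
  bool operator!=(const toy_zip_iterator& other) const {
    return std::get<0>(its_) != std::get<0>(other.its_);
  }

private:
  std::tuple<Its...> its_;
};

// Hypothetical helper: builds the toy iterator from a tuple of iterators.
template <class... Its>
toy_zip_iterator<Its...> make_toy_zip(std::tuple<Its...> its) {
  return toy_zip_iterator<Its...>(its);
}

// Dot product of two equally sized vectors, walking both through one zip range.
int dot(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size());
  auto first = make_toy_zip(std::make_tuple(a.begin(), b.begin()));
  auto last  = make_toy_zip(std::make_tuple(a.end(), b.end()));
  int sum = 0;
  for (auto it = first; it != last; ++it) {
    auto [x, y] = *it;  // x and y refer to the current elements of a and b
    sum += x * y;
  }
  return sum;
}
```

Calling `dot` on `{1, 2, 3}` and `{4, 5, 6}` walks both vectors together through a single iterator pair. Because the helper accepts exactly the tuple type the iterator stores, the caller's tuple is used directly with no conversion between tuple types.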
@miscco miscco requested a review from a team as a code owner February 6, 2025 19:48
@miscco miscco requested a review from gevtushenko February 6, 2025 19:48
Contributor

github-actions bot commented Feb 6, 2025

🟩 CI finished in 1h 53m: Pass: 100%/90 | Total: 2d 19h | Avg: 44m 59s | Max: 1h 23m | Hits: 60%/132225
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 53m 58s | Max: 1h 23m | Hits: 70%/52320

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 53m 37s | Max:  1h 23m | Hits:  70%/49888 
      🟩 arm64              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m | Hits:  69%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 46m | Avg: 57m 17s | Max:  1h 03m | Hits:  59%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  69%/2250  
      🟩 12.8               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 48s | Max:  1h 23m | Hits:  71%/44156 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 58m | Avg: 59m 09s | Max: 59m 32s | Hits:  75%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 46m | Avg: 57m 17s | Max:  1h 03m | Hits:  59%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  69%/2250  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  1d 06h | Avg: 52m 26s | Max:  1h 23m | Hits:  71%/42052 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 09s | Max: 59m 32s | Hits:  75%/2104  
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 53m 43s | Max:  1h 23m | Hits:  70%/50216 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 56m | Avg: 59m 01s | Max:  1h 02m | Hits:  69%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 49m | Avg: 54m 53s | Max: 55m 05s | Hits:  69%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 37s | Max: 57m 44s | Hits:  69%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 00m | Hits:  69%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 38m | Avg: 48m 21s | Max:  1h 01m | Hits:  80%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 52s | Max: 57m 26s | Hits:  69%/2436  
      🟩 GCC8               Pass: 100%/1   | Total: 58m 24s | Avg: 58m 24s | Max: 58m 24s | Hits:  69%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 23s | Max: 55m 38s | Hits:  69%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 15s | Max:  1h 02m | Hits:  69%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 42s | Max:  1h 01m | Hits:  69%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  69%/2432  
      🟩 GCC13              Pass: 100%/10  | Total:  6h 30m | Avg: 39m 01s | Max:  1h 12m | Hits:  84%/12160 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 12m | Hits:  14%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 23m | Hits:  14%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  69%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 18m | Avg: 54m 02s | Max:  1h 02m | Hits:  74%/20352 
      🟩 GCC                Pass: 100%/21  | Total: 17h 11m | Avg: 49m 06s | Max:  1h 12m | Hits:  76%/25550 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 50m | Avg:  1h 12m | Max:  1h 23m | Hits:  14%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  69%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 49m 15s | Avg: 24m 37s | Max: 25m 36s | Hits:  84%/2432  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 01m | Max:  1h 23m | Hits:  64%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 06m | Avg: 30m 48s | Max: 57m 13s | Hits:  92%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 52s | Max:  1h 23m | Hits:  64%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 40s | Avg: 21m 40s | Max: 21m 40s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 17m | Avg: 25m 49s | Max: 27m 35s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/2   | Total: 40m 53s | Avg: 20m 26s | Max: 21m 48s | Hits:  99%/2432  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 49m 15s | Avg: 24m 37s | Max: 25m 36s | Hits:  84%/2432  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 12m | Avg:  1h 12m | Max:  1h 12m | Hits:  69%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 59m | Avg: 59m 59s | Max:  1h 12m | Hits:  62%/23559 
      🟩 20                 Pass: 100%/24  | Total: 19h 35m | Avg: 48m 57s | Max:  1h 23m | Hits:  76%/28761 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 1d 03h | Avg: 37m 59s | Max: 1h 19m | Hits: 54%/79625

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 48s | Avg: 20m 54s | Max: 30m 16s | Hits:  74%/3706  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  1d 02h | Avg: 38m 13s | Max:  1h 19m | Hits:  54%/75920 
      🟩 arm64              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 13s | Max: 35m 02s | Hits:  48%/3705  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 31m | Avg: 42m 12s | Max:  1h 04m | Hits:  48%/9256  
      🟩 12.5               Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 19m | Hits:  30%/3704  
      🟩 12.8               Pass: 100%/36  | Total: 21h 11m | Avg: 35m 19s | Max:  1h 16m | Hits:  56%/66665 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 05m | Avg: 32m 52s | Max: 33m 08s | Hits:  48%/3704  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 31m | Avg: 42m 12s | Max:  1h 04m | Hits:  48%/9256  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 19m | Hits:  30%/3704  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 20h 06m | Avg: 35m 28s | Max:  1h 16m | Hits:  56%/62961 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 05m | Avg: 32m 52s | Max: 33m 08s | Hits:  48%/3704  
      🟩 nvcc               Pass: 100%/41  | Total:  1d 02h | Avg: 38m 14s | Max:  1h 19m | Hits:  54%/75921 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 20m | Avg: 35m 12s | Max: 35m 28s | Hits:  55%/7408  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 37s | Max: 39m 29s | Hits:  48%/3704  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 16m | Avg: 38m 05s | Max: 38m 53s | Hits:  48%/3704  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 52s | Max: 35m 30s | Hits:  48%/3704  
      🟩 Clang18            Pass: 100%/7   | Total:  3h 06m | Avg: 26m 34s | Max: 37m 51s | Hits:  64%/12964 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 36m 37s | Hits:  56%/3706  
      🟩 GCC8               Pass: 100%/1   | Total: 38m 02s | Avg: 38m 02s | Max: 38m 02s | Hits:  48%/1853  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 16s | Max: 39m 19s | Hits:  57%/3706  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 12m | Avg: 36m 14s | Max: 36m 18s | Hits:  48%/3706  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 13s | Max: 39m 24s | Hits:  48%/3706  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 20m | Avg: 40m 02s | Max: 42m 40s | Hits:  48%/3706  
      🟩 GCC13              Pass: 100%/8   | Total:  3h 32m | Avg: 26m 35s | Max: 40m 49s | Hits:  69%/14824 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  33%/3692  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 50m | Avg: 56m 49s | Max:  1h 16m | Hits:  37%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 19m | Hits:  30%/3704  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  9h 10m | Avg: 32m 21s | Max: 39m 29s | Hits:  57%/31484 
      🟩 GCC                Pass: 100%/19  | Total: 10h 33m | Avg: 33m 20s | Max: 42m 40s | Hits:  59%/35207 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 59m | Avg: 59m 54s | Max:  1h 16m | Hits:  35%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 19m | Hits:  30%/3704  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total: 22h 48m | Avg: 41m 27s | Max:  1h 19m | Hits:  48%/61112 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 25m | Avg: 26m 33s | Max:  1h 16m | Hits:  75%/18513 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 01h | Avg: 41m 58s | Max:  1h 19m | Hits:  47%/68516 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 01s | Avg: 15m 40s | Max: 31m 36s | Hits:  89%/5551  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 43s | Avg: 11m 14s | Max: 11m 44s | Hits:  99%/5558  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total: 40m 49s | Avg: 40m 49s | Max: 40m 49s | Hits:  54%/1853  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 14h 17m | Avg: 42m 53s | Max:  1h 19m | Hits:  47%/37031 
      🟩 20                 Pass: 100%/21  | Total: 12h 14m | Avg: 34m 58s | Max:  1h 16m | Hits:  59%/38888 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 42s | Avg: 6m 21s | Max: 10m 19s | Hits: 98%/280

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max: 10m 19s | Hits:  98%/280   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s | Hits:  98%/140   
      🟩 Test               Pass: 100%/1   | Total: 10m 19s | Avg: 10m 19s | Max: 10m 19s | Hits:  98%/140   
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 36s | Avg: 28m 36s | Max: 28m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@bernhardmgruber bernhardmgruber merged commit 743457a into NVIDIA:main Feb 6, 2025
104 of 106 checks passed
@miscco miscco deleted the move_zip_iterator_std branch February 11, 2025 17:00