Optimize and cleanup cuda::std::rotl and cuda::std::rotr #3228

Merged (22 commits) on Mar 4, 2025

@fbusato fbusato commented Jan 1, 2025

Fixes #2239

Description

Use funnel shift intrinsic to optimize cuda::std::rotl and cuda::std::rotr

Requires: #3414
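
As a rough illustration of the idea in the description, a 32-bit rotate can be written as a funnel shift whose two inputs are the same word. The sketch below is a host-side emulation of the documented semantics of the CUDA `__funnelshift_l` and `__funnelshift_r` intrinsics (concatenate `hi:lo`, shift by `shift & 31`, keep the upper or lower 32 bits). The names `funnelshift_l_emulated`, `funnelshift_r_emulated`, `rotl32_sketch`, and `rotr32_sketch` are hypothetical and are not taken from this PR; the actual implementation in libcu++ may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side emulation of __funnelshift_l(lo, hi, shift):
// concatenate hi:lo, shift left by (shift & 31), return the upper 32 bits.
constexpr std::uint32_t funnelshift_l_emulated(std::uint32_t lo, std::uint32_t hi, std::uint32_t shift) {
  return static_cast<std::uint32_t>(
    (((static_cast<std::uint64_t>(hi) << 32) | lo) << (shift & 31u)) >> 32);
}

// Hypothetical host-side emulation of __funnelshift_r(lo, hi, shift):
// concatenate hi:lo, shift right by (shift & 31), return the lower 32 bits.
constexpr std::uint32_t funnelshift_r_emulated(std::uint32_t lo, std::uint32_t hi, std::uint32_t shift) {
  return static_cast<std::uint32_t>(
    ((static_cast<std::uint64_t>(hi) << 32) | lo) >> (shift & 31u));
}

// Rotate left: feeding the same word as both halves turns the funnel shift
// into a rotation. The count is reduced modulo 32 by the "& 31" mask, so a
// negative count behaves like a rotation in the opposite direction.
constexpr std::uint32_t rotl32_sketch(std::uint32_t x, int count) {
  return funnelshift_l_emulated(x, x, static_cast<std::uint32_t>(count));
}

// Rotate right, built the same way on the right funnel shift.
constexpr std::uint32_t rotr32_sketch(std::uint32_t x, int count) {
  return funnelshift_r_emulated(x, x, static_cast<std::uint32_t>(count));
}
```

In device code the two emulation helpers would presumably be replaced by the intrinsics themselves, while a constant-evaluated or host path would keep a portable shift-and-or fallback; that split is an assumption here, not something stated in the description.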

@fbusato fbusato requested review from a team as code owners January 1, 2025 00:24
@fbusato fbusato requested review from miscco and gonidelis January 1, 2025 00:24
@fbusato fbusato changed the title from "Optimize and clean optimize cuda::std::rotl and cuda::std::rotr" to "[DO NOT MERGE] Optimize and clean optimize cuda::std::rotl and cuda::std::rotr" Jan 1, 2025

@miscco miscco left a comment

We cannot break existing users even if we have deprecated C++11, so changing that requirement is unfortunately a no-go.

@fbusato fbusato self-assigned this Jan 2, 2025
@fbusato fbusato added the 3.0 Targeted for 3.0 release label Jan 16, 2025
@fbusato fbusato changed the title from "[DO NOT MERGE] Optimize and clean optimize cuda::std::rotl and cuda::std::rotr" to "Optimize and cleanup cuda::std::rotl and cuda::std::rotr" Jan 24, 2025
@fbusato fbusato requested a review from miscco January 24, 2025 19:06

🟨 CI finished in 1h 47m: Pass: 98%/158 | Total: 3d 05h | Avg: 29m 23s | Max: 1h 16m | Hits: 68%/240318
  • 🟨 libcudacxx: Pass: 93%/43 | Total: 10h 22m | Avg: 14m 28s | Max: 30m 48s | Hits: 69%/94521

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/41  | Total: 10h 05m | Avg: 14m 46s | Max: 30m 48s | Hits:  68%/88880 
      🟩 arm64              Pass: 100%/2   | Total: 16m 44s | Avg:  8m 22s | Max: 10m 05s | Hits:  85%/5641  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 40m 01s | Avg: 20m 00s | Max: 21m 31s | Hits:  26%/5606  
      🔍 nvcc               Pass:  92%/41  | Total:  9h 42m | Avg: 14m 12s | Max: 30m 48s | Hits:  72%/88915 
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/16  | Total:  3h 42m | Avg: 13m 53s | Max: 23m 44s | Hits:  63%/41994 
      🔍 GCC                Pass:  85%/21  | Total:  4h 00m | Avg: 11m 28s | Max: 21m 52s | Hits:  79%/36633 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 17s | Max: 30m 48s | Hits:  61%/10308 
      🟩 NVHPC              Pass: 100%/2   | Total: 45m 58s | Avg: 22m 59s | Max: 30m 32s | Hits:  61%/5586  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 17m 29s | Avg:  8m 44s | Max: 11m 59s | Hits:  92%/2910  
      🔍 rtx2080            Pass:  92%/41  | Total: 10h 04m | Avg: 14m 45s | Max: 30m 48s | Hits:  68%/91611 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  91%/37  | Total:  9h 06m | Avg: 14m 46s | Max: 30m 48s | Hits:  69%/94481 
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 35s | Avg: 15m 47s | Max: 16m 21s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 41m 51s | Avg: 13m 57s | Max: 15m 54s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s
    🔍 std: 17 🔍
      🔍 17                 Pass:  85%/21  | Total:  5h 17m | Avg: 15m 06s | Max: 30m 48s | Hits:  71%/46514 
      🟩 20                 Pass: 100%/21  | Total:  5h 02m | Avg: 14m 24s | Max: 30m 35s | Hits:  67%/48007 
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  1h 04m | Avg: 12m 55s | Max: 24m 36s | Hits:  88%/10879 
      🟩 12.5               Pass: 100%/2   | Total: 45m 58s | Avg: 22m 59s | Max: 30m 32s | Hits:  61%/5586  
      🟨 12.8               Pass:  94%/36  | Total:  8h 31m | Avg: 14m 12s | Max: 30m 48s | Hits:  67%/78056 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 40m 01s | Avg: 20m 00s | Max: 21m 31s | Hits:  26%/5606  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  1h 04m | Avg: 12m 55s | Max: 24m 36s | Hits:  88%/10879 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 45m 58s | Avg: 22m 59s | Max: 30m 32s | Hits:  61%/5586  
      🟨 nvcc12.8           Pass:  94%/34  | Total:  7h 51m | Avg: 13m 52s | Max: 30m 48s | Hits:  70%/72450 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 43m 00s | Avg: 10m 45s | Max: 22m 14s | Hits:  77%/11176 
      🟩 Clang15            Pass: 100%/2   | Total: 45m 00s | Avg: 22m 30s | Max: 22m 31s | Hits:  32%/5598  
      🟩 Clang16            Pass: 100%/2   | Total: 14m 12s | Avg:  7m 06s | Max:  7m 17s | Hits:  92%/5598  
      🟩 Clang17            Pass: 100%/2   | Total: 44m 59s | Avg: 22m 29s | Max: 23m 44s | Hits:  33%/5598  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 15m | Avg: 12m 30s | Max: 21m 31s | Hits:  65%/14024 
      🟥 GCC7               Pass:   0%/2   | Total: 33m 51s | Avg: 16m 55s | Max: 20m 03s
      🟥 GCC8               Pass:   0%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
      🟩 GCC9               Pass: 100%/2   | Total: 14m 32s | Avg:  7m 16s | Max:  7m 41s | Hits:  89%/5548  
      🟩 GCC10              Pass: 100%/2   | Total: 20m 01s | Avg: 10m 00s | Max: 13m 30s | Hits:  83%/5604  
      🟩 GCC11              Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 21m 52s | Hits:  63%/5600  
      🟩 GCC12              Pass: 100%/2   | Total: 21m 35s | Avg: 10m 47s | Max: 10m 54s | Hits:  80%/5600  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 51m | Avg: 11m 07s | Max: 16m 21s | Hits:  80%/14281 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 47s | Avg: 25m 53s | Max: 27m 11s | Hits:  60%/5074  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 41s | Max: 30m 48s | Hits:  61%/5234  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 45m 58s | Avg: 22m 59s | Max: 30m 32s | Hits:  61%/5586  
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 35s | Avg: 15m 47s | Max: 16m 21s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 17m 29s | Avg:  8m 44s | Max: 11m 59s | Hits:  92%/2910  
      🟩 90;90a;100         Pass: 100%/1   | Total: 13m 50s | Avg: 13m 50s | Max: 13m 50s | Hits:  67%/2910  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 18h | Avg: 56m 28s | Max: 1h 16m | Hits: 47%/53761

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 56m 13s | Max:  1h 16m | Hits:  47%/51319 
      🟩 arm64              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits:  38%/2442  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 04m | Hits:  33%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m | Hits:  35%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 10h | Avg: 54m 43s | Max:  1h 16m | Hits:  49%/45562 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  39%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 04m | Hits:  33%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m | Hits:  35%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 08h | Avg: 54m 09s | Max:  1h 16m | Hits:  50%/43448 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  39%/2114  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 56m 03s | Max:  1h 16m | Hits:  47%/51647 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 10m | Avg:  1h 02m | Max:  1h 04m | Hits:  38%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  38%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 01m | Hits:  38%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  38%/2442  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 04m | Avg: 52m 07s | Max:  1h 06m | Hits:  57%/8219  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 29s | Max: 59m 29s | Hits:  38%/2446  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  38%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  38%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 06m | Hits:  38%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 06m | Hits:  38%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 56s | Max:  1h 02m | Hits:  38%/2442  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 08m | Avg: 38m 55s | Max:  1h 14m | Hits:  71%/13431 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m | Hits:  12%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 16m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m | Hits:  35%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 33m | Avg: 58m 27s | Max:  1h 06m | Hits:  45%/20437 
      🟩 GCC                Pass: 100%/22  | Total: 18h 33m | Avg: 50m 36s | Max:  1h 14m | Hits:  54%/26876 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m | Hits:  12%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m | Hits:  35%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 16m | Avg: 25m 35s | Max: 29m 39s | Hits:  79%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 12h | Avg:  1h 04m | Max:  1h 16m | Hits:  35%/40330 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 22m | Avg: 32m 52s | Max:  1h 06m | Hits:  84%/9768  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 15h | Avg:  1h 03m | Max:  1h 16m | Hits:  35%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 11s | Avg: 22m 11s | Max: 22m 11s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 34s | Avg: 16m 34s | Max: 16m 34s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 58s | Max: 25m 22s | Hits:  99%/3663  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 03m | Avg: 21m 19s | Max: 22m 54s | Hits:  99%/3663  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 16m | Avg: 25m 35s | Max: 29m 39s | Hits:  79%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 14m | Avg:  1h 14m | Max:  1h 14m | Hits:  38%/1221  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 19m | Avg:  1h 03m | Max:  1h 16m | Hits:  34%/23659 
      🟩 20                 Pass: 100%/25  | Total: 21h 02m | Avg: 50m 28s | Max:  1h 15m | Hits:  57%/30102 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 48m | Avg: 29m 05s | Max: 56m 50s | Hits: 76%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 36m 11s | Avg: 18m 05s | Max: 25m 08s | Hits:  88%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 20h 55m | Avg: 29m 12s | Max: 56m 50s | Hits:  76%/76917 
      🟩 arm64              Pass: 100%/2   | Total: 53m 00s | Avg: 26m 30s | Max: 27m 36s | Hits:  77%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 49m | Avg: 33m 54s | Max: 51m 37s | Hits:  66%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 52m 31s | Hits:  64%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 17h 19m | Avg: 27m 21s | Max: 56m 50s | Hits:  78%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 46m 49s | Avg: 23m 24s | Max: 23m 50s | Hits:  77%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 49m | Avg: 33m 54s | Max: 51m 37s | Hits:  66%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 52m 31s | Hits:  64%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 32m | Avg: 27m 34s | Max: 56m 50s | Hits:  78%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 46m 49s | Avg: 23m 24s | Max: 23m 50s | Hits:  77%/3578  
      🟩 nvcc               Pass: 100%/43  | Total: 21h 02m | Avg: 29m 20s | Max: 56m 50s | Hits:  76%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 49m | Avg: 27m 23s | Max: 28m 59s | Hits:  77%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 56m 21s | Avg: 28m 10s | Max: 30m 48s | Hits:  77%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 57m 54s | Avg: 28m 57s | Max: 30m 06s | Hits:  77%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 55m 42s | Avg: 27m 51s | Max: 27m 53s | Hits:  77%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 26m | Avg: 20m 52s | Max: 30m 23s | Hits:  83%/12523 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 00m | Avg: 30m 20s | Max: 30m 30s | Hits:  77%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 29m 25s | Avg: 29m 25s | Max: 29m 25s | Hits:  77%/1790  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 43s | Max: 32m 06s | Hits:  62%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 56m 12s | Avg: 28m 06s | Max: 28m 40s | Hits:  77%/3580  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 01s | Max: 31m 41s | Hits:  77%/3580  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 08s | Max: 32m 46s | Hits:  77%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 25m | Avg: 20m 34s | Max: 33m 07s | Hits:  86%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 04s | Max: 51m 37s | Hits:  55%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 21m | Avg: 47m 14s | Max: 56m 50s | Hits:  60%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 52m 31s | Hits:  64%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 05m | Avg: 25m 02s | Max: 30m 48s | Hits:  79%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  9h 01m | Avg: 25m 47s | Max: 33m 07s | Hits:  80%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 01m | Avg: 48m 22s | Max: 56m 50s | Hits:  58%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 39m | Avg: 49m 47s | Max: 52m 31s | Hits:  64%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 03s | Avg: 14m 01s | Max: 16m 38s | Hits:  88%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 17h 34m | Avg: 31m 57s | Max: 52m 31s | Hits:  73%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 46m | Avg: 22m 38s | Max: 56m 50s | Hits:  85%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 20h 16m | Avg: 32m 00s | Max: 56m 50s | Hits:  73%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 40s | Avg: 16m 13s | Max: 33m 14s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 43s | Avg: 10m 55s | Max: 11m 25s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 03s | Avg: 14m 01s | Max: 16m 38s | Hits:  88%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 14s | Avg: 32m 14s | Max: 32m 14s | Hits:  77%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 02m | Avg: 33m 07s | Max: 51m 38s | Hits:  71%/35771 
      🟩 20                 Pass: 100%/23  | Total: 10h 10m | Avg: 26m 31s | Max: 56m 50s | Hits:  80%/41145 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 09m | Avg: 5m 53s | Max: 13m 52s | Hits: 96%/11244

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 55m | Avg:  6m 23s | Max: 13m 52s | Hits:  96%/9020  
      🟩 arm64              Pass: 100%/4   | Total: 14m 41s | Avg:  3m 40s | Max:  3m 54s | Hits:  98%/2224  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 55s | Avg:  9m 55s | Max:  9m 55s | Hits:  60%/262   
      🟩 12.5               Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s | Hits:  95%/708   
      🟩 12.8               Pass: 100%/19  | Total:  1h 47m | Avg:  5m 38s | Max: 13m 52s | Hits:  97%/10274 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 55s | Avg:  9m 55s | Max:  9m 55s | Hits:  60%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s | Hits:  95%/708   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 47m | Avg:  5m 38s | Max: 13m 52s | Hits:  97%/10274 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 09m | Avg:  5m 53s | Max: 13m 52s | Hits:  96%/11244 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  98%/558   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s | Hits:  98%/556   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  98%/556   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 15s | Avg:  4m 15s | Max:  4m 15s | Hits:  98%/556   
      🟩 Clang18            Pass: 100%/4   | Total: 23m 01s | Avg:  5m 45s | Max: 12m 14s | Hits:  98%/2224  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  98%/558   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s | Hits:  98%/556   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 12m 22s | Hits:  98%/1112  
      🟩 GCC13              Pass: 100%/6   | Total: 32m 14s | Avg:  5m 22s | Max: 13m 52s | Hits:  98%/3336  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 55s | Avg:  9m 55s | Max:  9m 55s | Hits:  60%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 51s | Avg:  9m 51s | Max:  9m 51s | Hits:  60%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s | Hits:  95%/708   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 40m 03s | Avg:  5m 00s | Max: 12m 14s | Hits:  98%/4450  
      🟩 GCC                Pass: 100%/10  | Total: 57m 26s | Avg:  5m 44s | Max: 13m 52s | Hits:  98%/5562  
      🟩 MSVC               Pass: 100%/2   | Total: 19m 46s | Avg:  9m 53s | Max:  9m 55s | Hits:  60%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s | Hits:  95%/708   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 21s | Avg:  8m 40s | Max: 13m 52s | Hits:  98%/1112  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 52m | Avg:  5m 37s | Max: 12m 22s | Hits:  96%/10132 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 31m | Avg:  4m 48s | Max:  9m 55s | Hits:  96%/9576  
      🟩 Test               Pass: 100%/3   | Total: 38m 28s | Avg: 12m 49s | Max: 13m 52s | Hits:  99%/1668  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 04s | Avg:  7m 01s | Max: 13m 52s | Hits:  98%/1668  
      🟩 90a                Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s | Hits:  98%/556   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 19s | Avg:  4m 19s | Max:  6m 12s | Hits:  97%/2022  
      🟩 20                 Pass: 100%/18  | Total:  1h 52m | Avg:  6m 14s | Max: 13m 52s | Hits:  96%/9222  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 38s | Avg: 6m 19s | Max: 10m 16s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max: 10m 16s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s | Hits:  97%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 16s | Avg: 10m 16s | Max: 10m 16s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 36s | Avg: 30m 36s | Max: 30m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1


🟩 CI finished in 3h 05m: Pass: 100%/158 | Total: 3d 05h | Avg: 29m 28s | Max: 1h 45m | Hits: 70%/248632
  • 🟩 cub: Pass: 100%/45 | Total: 1d 19h | Avg: 58m 04s | Max: 1h 45m | Hits: 46%/53761

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 17h | Avg: 57m 54s | Max:  1h 45m | Hits:  46%/51319 
      🟩 arm64              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits:  38%/2442  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 03m | Hits:  33%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits:  35%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 12h | Avg: 57m 10s | Max:  1h 45m | Hits:  48%/45562 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  39%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 03m | Hits:  33%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits:  35%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 10h | Avg: 56m 54s | Max:  1h 45m | Hits:  48%/43448 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  39%/2114  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 17h | Avg: 57m 53s | Max:  1h 45m | Hits:  46%/51647 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 07m | Avg:  1h 01m | Max:  1h 04m | Hits:  38%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 04m | Hits:  38%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  38%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits:  38%/2442  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 58m | Avg: 51m 10s | Max:  1h 05m | Hits:  57%/8219  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 19s | Max:  1h 00m | Hits:  38%/2446  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  38%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 04m | Hits:  38%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  38%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m | Hits:  38%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  38%/2442  
      🟩 GCC13              Pass: 100%/11  | Total:  8h 38m | Avg: 47m 06s | Max:  1h 45m | Hits:  66%/13431 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  12%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 21m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits:  35%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 21m | Avg: 57m 45s | Max:  1h 05m | Hits:  45%/20437 
      🟩 GCC                Pass: 100%/22  | Total: 20h 05m | Avg: 54m 46s | Max:  1h 45m | Hits:  52%/26876 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 47m | Avg:  1h 11m | Max:  1h 21m | Hits:  12%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits:  35%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 35m | Avg: 31m 42s | Max: 44m 29s | Hits:  69%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 12h | Avg:  1h 04m | Max:  1h 21m | Hits:  35%/40330 
      🟩 rtxa6000           Pass: 100%/8   | Total:  5h 39m | Avg: 42m 22s | Max:  1h 45m | Hits:  80%/9768  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 14h | Avg:  1h 03m | Max:  1h 21m | Hits:  35%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 38s | Avg: 20m 38s | Max: 20m 38s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 39s | Avg: 16m 39s | Max: 16m 39s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  2h 34m | Avg: 51m 22s | Max:  1h 45m | Hits:  89%/3663  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 26m | Avg: 28m 48s | Max: 44m 29s | Hits:  89%/3663  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 35m | Avg: 31m 42s | Max: 44m 29s | Hits:  69%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  38%/1221  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 08m | Avg:  1h 03m | Max:  1h 11m | Hits:  34%/23659 
      🟩 20                 Pass: 100%/25  | Total: 22h 24m | Avg: 53m 46s | Max:  1h 45m | Hits:  54%/30102 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 25m | Avg: 28m 34s | Max: 59m 22s | Hits: 77%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 33m 07s | Avg: 16m 33s | Max: 22m 01s | Hits:  88%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 20h 32m | Avg: 28m 39s | Max: 59m 22s | Hits:  77%/76917 
      🟩 arm64              Pass: 100%/2   | Total: 53m 14s | Avg: 26m 37s | Max: 29m 09s | Hits:  77%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 40m | Avg: 32m 06s | Max: 48m 11s | Hits:  72%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  1h 38m | Avg: 49m 06s | Max: 50m 52s | Hits:  64%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 17h 06m | Avg: 27m 01s | Max: 59m 22s | Hits:  78%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 46m 40s | Avg: 23m 20s | Max: 24m 04s | Hits:  77%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 40m | Avg: 32m 06s | Max: 48m 11s | Hits:  72%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 38m | Avg: 49m 06s | Max: 50m 52s | Hits:  64%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 20m | Avg: 27m 13s | Max: 59m 22s | Hits:  78%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 46m 40s | Avg: 23m 20s | Max: 24m 04s | Hits:  77%/3578  
      🟩 nvcc               Pass: 100%/43  | Total: 20h 38m | Avg: 28m 48s | Max: 59m 22s | Hits:  77%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 51m | Avg: 27m 59s | Max: 29m 17s | Hits:  77%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 57m 36s | Avg: 28m 48s | Max: 29m 44s | Hits:  77%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 53m 47s | Avg: 26m 53s | Max: 28m 00s | Hits:  77%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 57m 38s | Avg: 28m 49s | Max: 29m 27s | Hits:  77%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 22m | Avg: 20m 18s | Max: 27m 25s | Hits:  83%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 53m 21s | Avg: 26m 40s | Max: 27m 35s | Hits:  77%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 29m 08s | Avg: 29m 08s | Max: 29m 08s | Hits:  77%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 58m 37s | Avg: 29m 18s | Max: 31m 09s | Hits:  77%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 58m 34s | Avg: 29m 17s | Max: 31m 03s | Hits:  77%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 59m 43s | Avg: 29m 51s | Max: 30m 46s | Hits:  77%/3580  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 17s | Max: 31m 00s | Hits:  77%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 18m | Avg: 19m 49s | Max: 31m 04s | Hits:  86%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 38m | Avg: 49m 25s | Max: 50m 40s | Hits:  55%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 27m | Avg: 49m 02s | Max: 59m 22s | Hits:  60%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 38m | Avg: 49m 06s | Max: 50m 52s | Hits:  64%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 03m | Avg: 24m 53s | Max: 29m 44s | Hits:  79%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  8h 38m | Avg: 24m 40s | Max: 31m 09s | Hits:  81%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 05m | Avg: 49m 11s | Max: 59m 22s | Hits:  58%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 38m | Avg: 49m 06s | Max: 50m 52s | Hits:  64%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 45s | Avg: 14m 52s | Max: 18m 36s | Hits:  88%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 17h 19m | Avg: 31m 30s | Max: 56m 55s | Hits:  74%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 35m | Avg: 21m 35s | Max: 59m 22s | Hits:  85%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 19h 55m | Avg: 31m 28s | Max: 59m 22s | Hits:  74%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 55s | Avg: 15m 18s | Max: 30m 50s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 46s | Avg: 10m 56s | Max: 11m 20s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 45s | Avg: 14m 52s | Max: 18m 36s | Hits:  88%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 31m 04s | Avg: 31m 04s | Max: 31m 04s | Hits:  77%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 10h 51m | Avg: 32m 35s | Max: 56m 55s | Hits:  73%/35771 
      🟩 20                 Pass: 100%/23  | Total: 10h 00m | Avg: 26m 07s | Max: 59m 22s | Hits:  80%/41145 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 9h 42m | Avg: 13m 32s | Max: 32m 19s | Hits: 73%/102835

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  9h 15m | Avg: 13m 32s | Max: 32m 19s | Hits:  74%/97194 
      🟩 arm64              Pass: 100%/2   | Total: 27m 27s | Avg: 13m 43s | Max: 21m 29s | Hits:  63%/5641  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 05m | Avg: 13m 01s | Max: 23m 45s | Hits:  75%/13642 
      🟩 12.5               Pass: 100%/2   | Total: 44m 08s | Avg: 22m 04s | Max: 32m 19s | Hits:  58%/5586  
      🟩 12.8               Pass: 100%/36  | Total:  7h 53m | Avg: 13m 08s | Max: 27m 40s | Hits:  74%/83607 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 40m 13s | Avg: 20m 06s | Max: 21m 19s | Hits:  26%/5606  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 05m | Avg: 13m 01s | Max: 23m 45s | Hits:  75%/13642 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 44m 08s | Avg: 22m 04s | Max: 32m 19s | Hits:  58%/5586  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  7h 13m | Avg: 12m 44s | Max: 27m 40s | Hits:  77%/78001 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 40m 13s | Avg: 20m 06s | Max: 21m 19s | Hits:  26%/5606  
      🟩 nvcc               Pass: 100%/41  | Total:  9h 02m | Avg: 13m 13s | Max: 32m 19s | Hits:  76%/97229 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 46m 54s | Avg: 11m 43s | Max: 21m 17s | Hits:  70%/11176 
      🟩 Clang15            Pass: 100%/2   | Total: 37m 31s | Avg: 18m 45s | Max: 22m 47s | Hits:  50%/5598  
      🟩 Clang16            Pass: 100%/2   | Total: 39m 02s | Avg: 19m 31s | Max: 24m 21s | Hits:  51%/5598  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 07s | Hits:  92%/5598  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 10m | Avg: 11m 41s | Max: 21m 19s | Hits:  66%/14024 
      🟩 GCC7               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 58s | Hits:  89%/5536  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 59s | Avg:  5m 59s | Max:  5m 59s | Hits:  92%/2778  
      🟩 GCC9               Pass: 100%/2   | Total: 28m 42s | Avg: 14m 21s | Max: 17m 03s | Hits:  58%/5548  
      🟩 GCC10              Pass: 100%/2   | Total: 35m 03s | Avg: 17m 31s | Max: 24m 02s | Hits:  55%/5604  
      🟩 GCC11              Pass: 100%/2   | Total: 22m 10s | Avg: 11m 05s | Max: 14m 31s | Hits:  78%/5600  
      🟩 GCC12              Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  7m 22s | Hits:  92%/5600  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 49m | Avg: 10m 56s | Max: 21m 29s | Hits:  80%/14281 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 49m 06s | Avg: 24m 33s | Max: 25m 21s | Hits:  92%/5074  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 52m 55s | Avg: 26m 27s | Max: 27m 40s | Hits:  92%/5234  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 44m 08s | Avg: 22m 04s | Max: 32m 19s | Hits:  58%/5586  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  3h 27m | Avg: 12m 59s | Max: 24m 21s | Hits:  66%/41994 
      🟩 GCC                Pass: 100%/21  | Total:  3h 48m | Avg: 10m 53s | Max: 24m 02s | Hits:  77%/44947 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 42m | Avg: 25m 30s | Max: 27m 40s | Hits:  92%/10308 
      🟩 NVHPC              Pass: 100%/2   | Total: 44m 08s | Avg: 22m 04s | Max: 32m 19s | Hits:  58%/5586  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 11m 57s | Hits:  92%/2910  
      🟩 rtx2080            Pass: 100%/41  | Total:  9h 25m | Avg: 13m 46s | Max: 32m 19s | Hits:  73%/99925 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 33m | Avg: 13m 52s | Max: 32m 19s | Hits:  73%/102795
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 18s | Avg: 15m 39s | Max: 16m 08s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 35m 43s | Avg: 11m 54s | Max: 14m 18s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 24s | Avg:  2m 24s | Max:  2m 24s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 18s | Avg: 15m 39s | Max: 16m 08s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 11m 57s | Hits:  92%/2910  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s | Hits:  89%/2910  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 40m | Avg: 13m 22s | Max: 25m 21s | Hits:  77%/54828 
      🟩 20                 Pass: 100%/21  | Total:  4h 59m | Avg: 14m 15s | Max: 32m 19s | Hits:  69%/48007 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 11m | Avg: 5m 57s | Max: 14m 06s | Hits: 96%/11244

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 56m | Avg:  6m 28s | Max: 14m 06s | Hits:  96%/9020  
      🟩 arm64              Pass: 100%/4   | Total: 14m 41s | Avg:  3m 40s | Max:  3m 50s | Hits:  98%/2224  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 24s | Avg: 11m 24s | Max: 11m 24s | Hits:  60%/262   
      🟩 12.5               Pass: 100%/2   | Total: 11m 47s | Avg:  5m 53s | Max:  5m 54s | Hits:  95%/708   
      🟩 12.8               Pass: 100%/19  | Total:  1h 48m | Avg:  5m 41s | Max: 14m 06s | Hits:  97%/10274 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 24s | Avg: 11m 24s | Max: 11m 24s | Hits:  60%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 47s | Avg:  5m 53s | Max:  5m 54s | Hits:  95%/708   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 48m | Avg:  5m 41s | Max: 14m 06s | Hits:  97%/10274 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 11m | Avg:  5m 57s | Max: 14m 06s | Hits:  96%/11244 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  98%/558   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s | Hits:  98%/556   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  98%/556   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s | Hits:  98%/556   
      🟩 Clang18            Pass: 100%/4   | Total: 25m 26s | Avg:  6m 21s | Max: 14m 06s | Hits:  98%/2224  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s | Hits:  98%/558   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 15s | Avg:  4m 15s | Max:  4m 15s | Hits:  98%/556   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 54s | Avg:  8m 27s | Max: 12m 31s | Hits:  98%/1112  
      🟩 GCC13              Pass: 100%/6   | Total: 31m 49s | Avg:  5m 18s | Max: 13m 51s | Hits:  98%/3336  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 24s | Avg: 11m 24s | Max: 11m 24s | Hits:  60%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 21s | Avg:  9m 21s | Max:  9m 21s | Hits:  60%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 47s | Avg:  5m 53s | Max:  5m 54s | Hits:  95%/708   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 41m 48s | Avg:  5m 13s | Max: 14m 06s | Hits:  98%/4450  
      🟩 GCC                Pass: 100%/10  | Total: 56m 51s | Avg:  5m 41s | Max: 13m 51s | Hits:  98%/5562  
      🟩 MSVC               Pass: 100%/2   | Total: 20m 45s | Avg: 10m 22s | Max: 11m 24s | Hits:  60%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 47s | Avg:  5m 53s | Max:  5m 54s | Hits:  95%/708   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 32s | Avg:  8m 46s | Max: 13m 51s | Hits:  98%/1112  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 53m | Avg:  5m 40s | Max: 14m 06s | Hits:  96%/10132 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 30m | Avg:  4m 46s | Max: 11m 24s | Hits:  96%/9576  
      🟩 Test               Pass: 100%/3   | Total: 40m 28s | Avg: 13m 29s | Max: 14m 06s | Hits:  99%/1668  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 56s | Avg:  6m 58s | Max: 13m 51s | Hits:  98%/1668  
      🟩 90a                Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s | Hits:  98%/556   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  5m 54s | Hits:  97%/2022  
      🟩 20                 Pass: 100%/18  | Total:  1h 54m | Avg:  6m 21s | Max: 14m 06s | Hits:  96%/9222  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 18s | Avg: 6m 39s | Max: 10m 47s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max: 10m 47s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 31s | Avg:  2m 31s | Max:  2m 31s | Hits:  97%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 47s | Avg: 10m 47s | Max: 10m 47s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 04s | Avg: 30m 04s | Max: 30m 04s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@fbusato
Contributor Author

fbusato commented Feb 20, 2025

I just realized that our definition has differed from the specification from the beginning: we defined rotl(T x, unsigned s) instead of rotl(T x, int s) 🐛
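For reference, here is a minimal sketch of the semantics the C++20 specification gives to a signed shift count: with `r = s % N` (where `N` is the bit width of `T`), a negative `r` rotates in the opposite direction. This is only an illustration, not the code in this PR; the `sketch` namespace and the portable shift-and-or formulation are assumptions made for the example, and it does not use the funnel shift intrinsic.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch { // hypothetical namespace, not part of libcu++

template <class T>
constexpr T rotr(T x, int s) noexcept;

// Rotate left by s bits; a negative s rotates right instead.
template <class T>
constexpr T rotl(T x, int s) noexcept {
  static_assert(std::is_unsigned<T>::value, "rotl requires an unsigned integer type");
  constexpr int N = std::numeric_limits<T>::digits;
  const int r = s % N; // |r| < N, so negating r below cannot overflow
  if (r == 0) return x;
  if (r < 0) return rotr(x, -r);
  // The cast truncates the result of integer promotion for narrow types.
  return static_cast<T>((x << r) | (x >> (N - r)));
}

// Rotate right by s bits; a negative s rotates left instead.
template <class T>
constexpr T rotr(T x, int s) noexcept {
  static_assert(std::is_unsigned<T>::value, "rotr requires an unsigned integer type");
  constexpr int N = std::numeric_limits<T>::digits;
  const int r = s % N;
  if (r == 0) return x;
  if (r < 0) return rotl(x, -r);
  return static_cast<T>((x >> r) | (x << (N - r)));
}

} // namespace sketch

// A negative count on rotl behaves like rotr with the positive count.
static_assert(sketch::rotl<std::uint8_t>(0x81, -1) == sketch::rotr<std::uint8_t>(0x81, 1),
              "negative rotl count rotates right");
```

With an `int` parameter, the sign of the count is visible to the implementation, so the opposite-direction rule can be expressed directly, as the specification describes it.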

@fbusato fbusato requested a review from miscco February 20, 2025 23:21
Contributor

🟩 CI finished in 1h 32m: Pass: 100%/158 | Total: 3d 01h | Avg: 27m 52s | Max: 1h 20m | Hits: 76%/248632
  • 🟩 cub: Pass: 100%/45 | Total: 1d 17h | Avg: 55m 15s | Max: 1h 20m | Hits: 47%/53761

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 15h | Avg: 54m 54s | Max:  1h 20m | Hits:  47%/51319 
      🟩 arm64              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 05m | Hits:  38%/2442  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 04m | Hits:  33%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  35%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 10h | Avg: 53m 46s | Max:  1h 20m | Hits:  49%/45562 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  39%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 04m | Hits:  33%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  35%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 08h | Avg: 53m 24s | Max:  1h 20m | Hits:  50%/43448 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  39%/2114  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 15h | Avg: 55m 01s | Max:  1h 20m | Hits:  47%/51647 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 58m | Avg: 59m 30s | Max:  1h 01m | Hits:  38%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 55s | Max:  1h 02m | Hits:  38%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 28s | Max:  1h 02m | Hits:  38%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 03m | Hits:  38%/2442  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 48m | Avg: 49m 45s | Max:  1h 03m | Hits:  57%/8219  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  38%/2446  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m | Hits:  38%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 59s | Max:  1h 01m | Hits:  38%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  38%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 39s | Max:  1h 02m | Hits:  38%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 10m | Hits:  38%/2442  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 52m | Avg: 37m 29s | Max:  1h 08m | Hits:  71%/13431 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 16m | Hits:  12%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 20m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  35%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 44m | Avg: 55m 34s | Max:  1h 03m | Hits:  45%/20437 
      🟩 GCC                Pass: 100%/22  | Total: 18h 25m | Avg: 50m 14s | Max:  1h 10m | Hits:  54%/26876 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 55m | Avg:  1h 13m | Max:  1h 20m | Hits:  12%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  35%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 40s | Max: 28m 19s | Hits:  79%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 11h | Avg:  1h 03m | Max:  1h 20m | Hits:  35%/40330 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 15m | Avg: 31m 54s | Max:  1h 03m | Hits:  84%/9768  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 14h | Avg:  1h 02m | Max:  1h 20m | Hits:  35%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 56s | Avg: 22m 56s | Max: 22m 56s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 53s | Max: 24m 30s | Hits:  99%/3663  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 43s | Max: 22m 26s | Hits:  99%/3663  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 40s | Max: 28m 19s | Hits:  79%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 08m | Avg:  1h 08m | Max:  1h 08m | Hits:  38%/1221  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 55m | Avg:  1h 02m | Max:  1h 16m | Hits:  34%/23659 
      🟩 20                 Pass: 100%/25  | Total: 20h 31m | Avg: 49m 15s | Max:  1h 20m | Hits:  57%/30102 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 27m | Avg: 28m 36s | Max: 54m 52s | Hits: 77%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 35m 02s | Avg: 17m 31s | Max: 23m 47s | Hits:  88%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 20h 34m | Avg: 28m 43s | Max: 54m 52s | Hits:  77%/76917 
      🟩 arm64              Pass: 100%/2   | Total: 52m 41s | Avg: 26m 20s | Max: 28m 12s | Hits:  77%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 45m | Avg: 33m 05s | Max: 53m 47s | Hits:  72%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  1h 40m | Avg: 50m 01s | Max: 51m 12s | Hits:  64%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 17h 02m | Avg: 26m 53s | Max: 54m 52s | Hits:  78%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 25m 21s | Hits:  77%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 45m | Avg: 33m 05s | Max: 53m 47s | Hits:  72%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 40m | Avg: 50m 01s | Max: 51m 12s | Hits:  64%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 14m | Avg: 27m 04s | Max: 54m 52s | Hits:  78%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 25m 21s | Hits:  77%/3578  
      🟩 nvcc               Pass: 100%/43  | Total: 20h 39m | Avg: 28m 50s | Max: 54m 52s | Hits:  77%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 16s | Max: 29m 52s | Hits:  77%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 53m 03s | Avg: 26m 31s | Max: 26m 53s | Hits:  77%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 57m 06s | Avg: 28m 33s | Max: 28m 51s | Hits:  77%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 57m 57s | Avg: 28m 58s | Max: 29m 02s | Hits:  77%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 26m | Avg: 20m 54s | Max: 28m 43s | Hits:  83%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 55m 44s | Avg: 27m 52s | Max: 28m 21s | Hits:  77%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 28m 37s | Avg: 28m 37s | Max: 28m 37s | Hits:  77%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 54m 05s | Avg: 27m 02s | Max: 27m 08s | Hits:  77%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 56m 47s | Avg: 28m 23s | Max: 29m 05s | Hits:  77%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 57m 47s | Avg: 28m 53s | Max: 30m 14s | Hits:  77%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 59m 23s | Avg: 29m 41s | Max: 31m 32s | Hits:  77%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 27m | Avg: 20m 43s | Max: 32m 46s | Hits:  86%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 10s | Max: 53m 47s | Hits:  55%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 17m | Avg: 45m 59s | Max: 54m 52s | Hits:  60%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 01s | Max: 51m 12s | Hits:  64%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 07m | Avg: 25m 08s | Max: 29m 52s | Hits:  79%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  8h 39m | Avg: 24m 44s | Max: 32m 46s | Hits:  81%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 00m | Avg: 48m 03s | Max: 54m 52s | Hits:  58%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 40m | Avg: 50m 01s | Max: 51m 12s | Hits:  64%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 44s | Avg: 14m 22s | Max: 18m 01s | Hits:  88%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 17h 21m | Avg: 31m 34s | Max: 53m 47s | Hits:  74%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 36m | Avg: 21m 41s | Max: 54m 52s | Hits:  85%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 19h 58m | Avg: 31m 31s | Max: 54m 52s | Hits:  74%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 00s | Avg: 15m 20s | Max: 29m 54s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 22s | Avg: 10m 50s | Max: 11m 16s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 44s | Avg: 14m 22s | Max: 18m 01s | Hits:  88%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 46s | Avg: 32m 46s | Max: 32m 46s | Hits:  77%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 10h 51m | Avg: 32m 35s | Max: 53m 47s | Hits:  73%/35771 
      🟩 20                 Pass: 100%/23  | Total: 10h 00m | Avg: 26m 07s | Max: 54m 52s | Hits:  80%/41145 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 7h 35m | Avg: 10m 35s | Max: 27m 30s | Hits: 88%/102835

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  7h 23m | Avg: 10m 49s | Max: 27m 30s | Hits:  87%/97194 
      🟩 arm64              Pass: 100%/2   | Total: 11m 55s | Avg:  5m 57s | Max:  6m 06s | Hits:  92%/5641  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 55m 49s | Avg: 11m 09s | Max: 23m 41s | Hits:  86%/13642 
      🟩 12.5               Pass: 100%/2   | Total: 26m 18s | Avg: 13m 09s | Max: 13m 22s | Hits:  90%/5586  
      🟩 12.8               Pass: 100%/36  | Total:  6h 13m | Avg: 10m 22s | Max: 27m 30s | Hits:  88%/83607 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 42m 22s | Avg: 21m 11s | Max: 23m 04s | Hits:  26%/5606  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 55m 49s | Avg: 11m 09s | Max: 23m 41s | Hits:  86%/13642 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 26m 18s | Avg: 13m 09s | Max: 13m 22s | Hits:  90%/5586  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 30m | Avg:  9m 44s | Max: 27m 30s | Hits:  92%/78001 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 22s | Avg: 21m 11s | Max: 23m 04s | Hits:  26%/5606  
      🟩 nvcc               Pass: 100%/41  | Total:  6h 53m | Avg: 10m 04s | Max: 27m 30s | Hits:  91%/97229 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 34m 48s | Avg:  8m 42s | Max: 12m 34s | Hits:  84%/11176 
      🟩 Clang15            Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max:  7m 52s | Hits:  92%/5598  
      🟩 Clang16            Pass: 100%/2   | Total: 15m 21s | Avg:  7m 40s | Max:  7m 45s | Hits:  92%/5598  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 46s | Avg:  7m 23s | Max:  7m 31s | Hits:  92%/5598  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 21m | Avg: 13m 38s | Max: 23m 04s | Hits:  66%/14024 
      🟩 GCC7               Pass: 100%/2   | Total: 11m 45s | Avg:  5m 52s | Max:  6m 15s | Hits:  92%/5536  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 13s | Avg:  6m 13s | Max:  6m 13s | Hits:  92%/2778  
      🟩 GCC9               Pass: 100%/2   | Total: 13m 29s | Avg:  6m 44s | Max:  6m 55s | Hits:  92%/5548  
      🟩 GCC10              Pass: 100%/2   | Total: 13m 41s | Avg:  6m 50s | Max:  6m 57s | Hits:  92%/5604  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 42s | Avg:  6m 51s | Max:  7m 15s | Hits:  92%/5600  
      🟩 GCC12              Pass: 100%/2   | Total: 15m 13s | Avg:  7m 36s | Max:  7m 42s | Hits:  91%/5600  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 32m | Avg:  9m 15s | Max: 16m 15s | Hits:  92%/14281 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 48m 35s | Avg: 24m 17s | Max: 24m 54s | Hits:  92%/5074  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 51m 53s | Avg: 25m 56s | Max: 27m 30s | Hits:  92%/5234  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 26m 18s | Avg: 13m 09s | Max: 13m 22s | Hits:  90%/5586  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  2h 41m | Avg: 10m 07s | Max: 23m 04s | Hits:  81%/41994 
      🟩 GCC                Pass: 100%/21  | Total:  2h 46m | Avg:  7m 56s | Max: 16m 15s | Hits:  92%/44947 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 40m | Avg: 25m 07s | Max: 27m 30s | Hits:  92%/10308 
      🟩 NVHPC              Pass: 100%/2   | Total: 26m 18s | Avg: 13m 09s | Max: 13m 22s | Hits:  90%/5586  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max: 13m 20s | Hits:  92%/2910  
      🟩 rtx2080            Pass: 100%/41  | Total:  7h 16m | Avg: 10m 38s | Max: 27m 30s | Hits:  87%/99925 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  6h 17m | Avg: 10m 12s | Max: 27m 30s | Hits:  88%/102795
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 07s | Avg: 15m 33s | Max: 16m 15s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 44m 21s | Avg: 14m 47s | Max: 17m 48s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 07s | Avg: 15m 33s | Max: 16m 15s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max: 13m 20s | Hits:  92%/2910  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 14s | Avg:  7m 14s | Max:  7m 14s | Hits:  92%/2910  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 49m | Avg: 10m 56s | Max: 24m 54s | Hits:  87%/54828 
      🟩 20                 Pass: 100%/21  | Total:  3h 43m | Avg: 10m 38s | Max: 27m 30s | Hits:  88%/48007 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 10m | Avg: 5m 55s | Max: 14m 11s | Hits: 96%/11244

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 55m | Avg:  6m 25s | Max: 14m 11s | Hits:  96%/9020  
      🟩 arm64              Pass: 100%/4   | Total: 14m 35s | Avg:  3m 38s | Max:  3m 47s | Hits:  98%/2224  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 45s | Avg: 10m 45s | Max: 10m 45s | Hits:  60%/262   
      🟩 12.5               Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  6m 49s | Hits:  95%/708   
      🟩 12.8               Pass: 100%/19  | Total:  1h 46m | Avg:  5m 35s | Max: 14m 11s | Hits:  97%/10274 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 45s | Avg: 10m 45s | Max: 10m 45s | Hits:  60%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  6m 49s | Hits:  95%/708   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 46m | Avg:  5m 35s | Max: 14m 11s | Hits:  97%/10274 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 10m | Avg:  5m 55s | Max: 14m 11s | Hits:  96%/11244 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  98%/558   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s | Hits:  98%/556   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s | Hits:  98%/556   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 58s | Avg:  3m 58s | Max:  3m 58s | Hits:  98%/556   
      🟩 Clang18            Pass: 100%/4   | Total: 23m 03s | Avg:  5m 45s | Max: 11m 53s | Hits:  98%/2224  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 10s | Avg:  4m 10s | Max:  4m 10s | Hits:  98%/558   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 07s | Avg:  4m 07s | Max:  4m 07s | Hits:  98%/556   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 12s | Avg:  8m 06s | Max: 12m 15s | Hits:  98%/1112  
      🟩 GCC13              Pass: 100%/6   | Total: 31m 54s | Avg:  5m 19s | Max: 14m 11s | Hits:  98%/3336  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 45s | Avg: 10m 45s | Max: 10m 45s | Hits:  60%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s | Hits:  60%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  6m 49s | Hits:  95%/708   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 39m 26s | Avg:  4m 55s | Max: 11m 53s | Hits:  98%/4450  
      🟩 GCC                Pass: 100%/10  | Total: 56m 23s | Avg:  5m 38s | Max: 14m 11s | Hits:  98%/5562  
      🟩 MSVC               Pass: 100%/2   | Total: 21m 07s | Avg: 10m 33s | Max: 10m 45s | Hits:  60%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  6m 49s | Hits:  95%/708   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 14m 11s | Hits:  98%/1112  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 52m | Avg:  5m 37s | Max: 12m 15s | Hits:  96%/10132 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 31m | Avg:  4m 50s | Max: 10m 45s | Hits:  96%/9576  
      🟩 Test               Pass: 100%/3   | Total: 38m 19s | Avg: 12m 46s | Max: 14m 11s | Hits:  99%/1668  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 55s | Avg:  6m 58s | Max: 14m 11s | Hits:  98%/1668  
      🟩 90a                Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s | Hits:  98%/556   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 08s | Avg:  4m 17s | Max:  6m 26s | Hits:  97%/2022  
      🟩 20                 Pass: 100%/18  | Total:  1h 53m | Avg:  6m 16s | Max: 14m 11s | Hits:  96%/9222  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 17s | Avg: 6m 38s | Max: 10m 43s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 17s | Avg:  6m 38s | Max: 10m 43s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 34s | Avg:  2m 34s | Max:  2m 34s | Hits:  97%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 43s | Avg: 10m 43s | Max: 10m 43s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 31m 31s | Avg: 31m 31s | Max: 31m 31s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@fbusato fbusato added the blocked This PR cannot be merged due to various reasons label Feb 24, 2025
@fbusato fbusato enabled auto-merge (squash) February 24, 2025 18:47
@fbusato fbusato removed the blocked This PR cannot be merged due to various reasons label Feb 25, 2025
@fbusato fbusato requested a review from miscco March 3, 2025 16:54

github-actions bot commented Mar 3, 2025

🟩 CI finished in 1h 59m: Pass: 100%/158 | Total: 3d 06h | Avg: 29m 39s | Max: 1h 22m | Hits: 70%/249644
  • 🟩 cub: Pass: 100%/45 | Total: 1d 18h | Avg: 56m 24s | Max: 1h 22m | Hits: 46%/53614

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 56m 05s | Max:  1h 22m | Hits:  46%/51178 
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m | Hits:  37%/2436  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 19m | Avg:  1h 03m | Max:  1h 10m | Hits:  32%/5922  
      🟩 12.5               Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 12m | Hits:  34%/2254  
      🟩 12.8               Pass: 100%/38  | Total:  1d 10h | Avg: 54m 38s | Max:  1h 22m | Hits:  48%/45438 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  38%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 19m | Avg:  1h 03m | Max:  1h 10m | Hits:  32%/5922  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 12m | Hits:  34%/2254  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 08h | Avg: 54m 15s | Max:  1h 22m | Hits:  49%/43334 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  38%/2104  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 56m 10s | Max:  1h 22m | Hits:  46%/51510 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 02m | Avg:  1h 00m | Max:  1h 02m | Hits:  37%/4880  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m | Hits:  37%/2436  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m | Hits:  37%/2436  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m | Hits:  37%/2436  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 54m | Avg: 50m 35s | Max:  1h 05m | Hits:  56%/8194  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 06m | Hits:  36%/2440  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  36%/1220  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 04m | Hits:  36%/2440  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  36%/2440  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 06m | Hits:  36%/2436  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  36%/2436  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 15m | Avg: 39m 37s | Max:  1h 22m | Hits:  71%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 22m | Hits:  12%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 16m | Hits:  12%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 12m | Hits:  34%/2254  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 00m | Avg: 56m 28s | Max:  1h 05m | Hits:  44%/20382 
      🟩 GCC                Pass: 100%/22  | Total: 18h 52m | Avg: 51m 28s | Max:  1h 22m | Hits:  54%/26810 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 03m | Avg:  1h 15m | Max:  1h 22m | Hits:  12%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 12m | Hits:  34%/2254  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 11m | Avg: 23m 49s | Max: 26m 22s | Hits:  78%/3654  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 12h | Avg:  1h 04m | Max:  1h 22m | Hits:  34%/40216 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 24m | Avg: 33m 06s | Max:  1h 05m | Hits:  84%/9744  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 15h | Avg:  1h 03m | Max:  1h 22m | Hits:  34%/43870 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 52s | Avg: 24m 52s | Max: 24m 52s | Hits:  99%/1218  
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 50s | Avg: 19m 50s | Max: 19m 50s | Hits:  99%/1218  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 23s | Max: 24m 55s | Hits:  99%/3654  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 51s | Max: 24m 05s | Hits:  99%/3654  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 11m | Avg: 23m 49s | Max: 26m 22s | Hits:  78%/3654  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 22m | Avg:  1h 22m | Max:  1h 22m | Hits:  36%/1218  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 27m | Avg:  1h 04m | Max:  1h 22m | Hits:  33%/23591 
      🟩 20                 Pass: 100%/25  | Total: 20h 50m | Avg: 50m 02s | Max:  1h 22m | Hits:  56%/30023 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 22h 42m | Avg: 30m 16s | Max: 1h 08m | Hits: 72%/79956

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 35m 53s | Avg: 17m 56s | Max: 24m 47s | Hits:  88%/3556  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 21h 48m | Avg: 30m 26s | Max:  1h 08m | Hits:  72%/76401 
      🟩 arm64              Pass: 100%/2   | Total: 53m 24s | Avg: 26m 42s | Max: 28m 31s | Hits:  77%/3555  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 55m | Avg: 35m 10s | Max: 57m 34s | Hits:  66%/8881  
      🟩 12.5               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  24%/3554  
      🟩 12.8               Pass: 100%/38  | Total: 17h 35m | Avg: 27m 46s | Max:  1h 08m | Hits:  76%/67521 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 45m 51s | Avg: 22m 55s | Max: 23m 03s | Hits:  77%/3554  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 55m | Avg: 35m 10s | Max: 57m 34s | Hits:  66%/8881  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  24%/3554  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 49m | Avg: 28m 02s | Max:  1h 08m | Hits:  76%/63967 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 45m 51s | Avg: 22m 55s | Max: 23m 03s | Hits:  77%/3554  
      🟩 nvcc               Pass: 100%/43  | Total: 21h 56m | Avg: 30m 36s | Max:  1h 08m | Hits:  72%/76402 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 35s | Max: 30m 43s | Hits:  77%/7108  
      🟩 Clang15            Pass: 100%/2   | Total: 53m 15s | Avg: 26m 37s | Max: 26m 49s | Hits:  77%/3554  
      🟩 Clang16            Pass: 100%/2   | Total: 57m 34s | Avg: 28m 47s | Max: 30m 20s | Hits:  77%/3554  
      🟩 Clang17            Pass: 100%/2   | Total: 54m 49s | Avg: 27m 24s | Max: 27m 46s | Hits:  77%/3554  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 22m | Avg: 20m 18s | Max: 27m 19s | Hits:  83%/12439 
      🟩 GCC7               Pass: 100%/2   | Total: 57m 20s | Avg: 28m 40s | Max: 29m 07s | Hits:  76%/3556  
      🟩 GCC8               Pass: 100%/1   | Total: 29m 00s | Avg: 29m 00s | Max: 29m 00s | Hits:  76%/1778  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 48s | Max: 30m 57s | Hits:  76%/3556  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 44s | Max: 31m 52s | Hits:  76%/3556  
      🟩 GCC11              Pass: 100%/2   | Total: 56m 36s | Avg: 28m 18s | Max: 28m 20s | Hits:  76%/3556  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 18s | Max: 31m 38s | Hits:  76%/3556  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 24m | Avg: 20m 24s | Max: 30m 58s | Hits:  86%/17780 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 19s | Max: 57m 34s | Hits:  28%/3542  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 43m | Avg: 54m 34s | Max:  1h 08m | Hits:  33%/5313  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  24%/3554  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 02m | Avg: 24m 50s | Max: 30m 43s | Hits:  79%/30209 
      🟩 GCC                Pass: 100%/21  | Total:  8h 50m | Avg: 25m 16s | Max: 31m 52s | Hits:  81%/37338 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 38m | Avg: 55m 40s | Max:  1h 08m | Hits:  31%/8855  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  24%/3554  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 23s | Avg: 14m 11s | Max: 16m 43s | Hits:  88%/3556  
      🟩 rtx2080            Pass: 100%/33  | Total: 18h 27m | Avg: 33m 33s | Max:  1h 08m | Hits:  69%/58637 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 46m | Avg: 22m 37s | Max:  1h 05m | Hits:  81%/17763 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 21h 12m | Avg: 33m 28s | Max:  1h 08m | Hits:  68%/67519 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 45s | Avg: 15m 15s | Max: 30m 36s | Hits:  90%/5326  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 25s | Avg: 11m 06s | Max: 11m 40s | Hits:  99%/7111  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 23s | Avg: 14m 11s | Max: 16m 43s | Hits:  88%/3556  
      🟩 90;90a;100         Pass: 100%/1   | Total: 30m 58s | Avg: 30m 58s | Max: 30m 58s | Hits:  76%/1778  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 52m | Avg: 35m 37s | Max:  1h 08m | Hits:  66%/35531 
      🟩 20                 Pass: 100%/23  | Total: 10h 13m | Avg: 26m 41s | Max:  1h 05m | Hits:  76%/40869 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 9h 25m | Avg: 13m 08s | Max: 35m 17s | Hits: 78%/104044

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  9h 07m | Avg: 13m 20s | Max: 35m 17s | Hits:  77%/98337 
      🟩 arm64              Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 11m 37s | Hits:  85%/5707  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 15m | Avg: 15m 07s | Max: 24m 55s | Hits:  69%/13804 
      🟩 12.5               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 44s | Max: 35m 17s | Hits:  28%/5652  
      🟩 12.8               Pass: 100%/36  | Total:  7h 01m | Avg: 11m 43s | Max: 31m 39s | Hits:  82%/84588 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 43m 10s | Avg: 21m 35s | Max: 21m 47s | Hits:  27%/5668  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 15m | Avg: 15m 07s | Max: 24m 55s | Hits:  69%/13804 
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 07m | Avg: 33m 44s | Max: 35m 17s | Hits:  28%/5652  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  6h 18m | Avg: 11m 08s | Max: 31m 39s | Hits:  86%/78920 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 43m 10s | Avg: 21m 35s | Max: 21m 47s | Hits:  27%/5668  
      🟩 nvcc               Pass: 100%/41  | Total:  8h 41m | Avg: 12m 43s | Max: 35m 17s | Hits:  81%/98376 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 55m 02s | Avg: 13m 45s | Max: 22m 26s | Hits:  63%/11306 
      🟩 Clang15            Pass: 100%/2   | Total: 14m 47s | Avg:  7m 23s | Max:  7m 31s | Hits:  92%/5664  
      🟩 Clang16            Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max:  7m 31s | Hits:  92%/5664  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 56s | Avg:  7m 28s | Max:  7m 47s | Hits:  92%/5664  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 23m | Avg: 13m 59s | Max: 21m 47s | Hits:  63%/14185 
      🟩 GCC7               Pass: 100%/2   | Total: 29m 46s | Avg: 14m 53s | Max: 19m 10s | Hits:  57%/5602  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 11s | Avg:  6m 11s | Max:  6m 11s | Hits:  92%/2811  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 43s | Avg:  6m 21s | Max:  6m 40s | Hits:  92%/5614  
      🟩 GCC10              Pass: 100%/2   | Total: 14m 55s | Avg:  7m 27s | Max:  7m 28s | Hits:  91%/5670  
      🟩 GCC11              Pass: 100%/2   | Total: 14m 35s | Avg:  7m 17s | Max:  7m 36s | Hits:  92%/5666  
      🟩 GCC12              Pass: 100%/2   | Total: 14m 29s | Avg:  7m 14s | Max:  7m 22s | Hits:  92%/5666  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 59m | Avg: 11m 56s | Max: 31m 39s | Hits:  79%/14446 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 50m 36s | Avg: 25m 18s | Max: 25m 41s | Hits:  91%/5136  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 51m 17s | Avg: 25m 38s | Max: 26m 27s | Hits:  90%/5298  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 44s | Max: 35m 17s | Hits:  28%/5652  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  3h 03m | Avg: 11m 28s | Max: 22m 26s | Hits:  75%/42483 
      🟩 GCC                Pass: 100%/21  | Total:  3h 32m | Avg: 10m 05s | Max: 31m 39s | Hits:  84%/45475 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 41m | Avg: 25m 28s | Max: 26m 27s | Hits:  90%/10434 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 44s | Max: 35m 17s | Hits:  28%/5652  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 56s | Avg:  9m 28s | Max: 13m 23s | Hits:  93%/2943  
      🟩 rtx2080            Pass: 100%/41  | Total:  9h 06m | Avg: 13m 19s | Max: 35m 17s | Hits:  77%/101101
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 10m | Avg: 13m 14s | Max: 35m 17s | Hits:  78%/104004
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 16s | Avg: 15m 38s | Max: 16m 26s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 41m 25s | Avg: 13m 48s | Max: 14m 07s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 16s | Avg: 15m 38s | Max: 16m 26s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 18m 56s | Avg:  9m 28s | Max: 13m 23s | Hits:  93%/2943  
      🟩 90;90a;100         Pass: 100%/1   | Total: 31m 39s | Avg: 31m 39s | Max: 31m 39s | Hits:  33%/2943  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 39m | Avg: 13m 18s | Max: 32m 12s | Hits:  79%/55480 
      🟩 20                 Pass: 100%/21  | Total:  4h 43m | Avg: 13m 29s | Max: 35m 17s | Hits:  76%/48564 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 25m | Avg: 6m 35s | Max: 20m 32s | Hits: 94%/11722

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  2h 09m | Avg:  7m 11s | Max: 20m 32s | Hits:  94%/9406  
      🟩 arm64              Pass: 100%/4   | Total: 15m 40s | Avg:  3m 55s | Max:  4m 06s | Hits:  96%/2316  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  57%/277   
      🟩 12.5               Pass: 100%/2   | Total: 13m 53s | Avg:  6m 56s | Max:  7m 10s | Hits:  92%/742   
      🟩 12.8               Pass: 100%/19  | Total:  2h 00m | Avg:  6m 19s | Max: 20m 32s | Hits:  95%/10703 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  57%/277   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 53s | Avg:  6m 56s | Max:  7m 10s | Hits:  92%/742   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  2h 00m | Avg:  6m 19s | Max: 20m 32s | Hits:  95%/10703 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 25m | Avg:  6m 35s | Max: 20m 32s | Hits:  94%/11722 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s | Hits:  96%/581   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 36s | Avg:  4m 36s | Max:  4m 36s | Hits:  96%/579   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s | Hits:  96%/579   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 24s | Avg:  4m 24s | Max:  4m 24s | Hits:  96%/579   
      🟩 Clang18            Pass: 100%/4   | Total: 32m 43s | Avg:  8m 10s | Max: 20m 32s | Hits:  97%/2316  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s | Hits:  96%/581   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s | Hits:  96%/579   
      🟩 GCC12              Pass: 100%/2   | Total: 17m 18s | Avg:  8m 39s | Max: 12m 33s | Hits:  97%/1158  
      🟩 GCC13              Pass: 100%/6   | Total: 33m 11s | Avg:  5m 31s | Max: 14m 00s | Hits:  96%/3474  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  57%/277   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 35s | Avg: 10m 35s | Max: 10m 35s | Hits:  57%/277   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 53s | Avg:  6m 56s | Max:  7m 10s | Hits:  92%/742   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 50m 21s | Avg:  6m 17s | Max: 20m 32s | Hits:  96%/4634  
      🟩 GCC                Pass: 100%/10  | Total: 59m 06s | Avg:  5m 54s | Max: 14m 00s | Hits:  96%/5792  
      🟩 MSVC               Pass: 100%/2   | Total: 21m 48s | Avg: 10m 54s | Max: 11m 13s | Hits:  57%/554   
      🟩 NVHPC              Pass: 100%/2   | Total: 13m 53s | Avg:  6m 56s | Max:  7m 10s | Hits:  92%/742   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 47s | Avg:  8m 53s | Max: 14m 00s | Hits:  97%/1158  
      🟩 rtx2080            Pass: 100%/20  | Total:  2h 07m | Avg:  6m 22s | Max: 20m 32s | Hits:  94%/10564 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 38m | Avg:  5m 09s | Max: 11m 13s | Hits:  93%/9985  
      🟩 Test               Pass: 100%/3   | Total: 47m 05s | Avg: 15m 41s | Max: 20m 32s | Hits:  99%/1737  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 18s | Avg:  7m 06s | Max: 14m 00s | Hits:  97%/1737  
      🟩 90a                Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s | Hits:  96%/579   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 18m 21s | Avg:  4m 35s | Max:  7m 10s | Hits:  95%/2108  
      🟩 20                 Pass: 100%/18  | Total:  2h 06m | Avg:  7m 02s | Max: 20m 32s | Hits:  94%/9614  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 28s | Avg: 7m 44s | Max: 12m 51s | Hits: 97%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max: 12m 51s | Hits:  97%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s | Hits:  96%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 51s | Avg: 12m 51s | Max: 12m 51s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@fbusato fbusato merged commit ea28386 into NVIDIA:main Mar 4, 2025
169 of 172 checks passed
@fbusato fbusato deleted the optimize-rot-left-right branch March 20, 2025 22:07
Labels
3.0 Targeted for 3.0 release
Development

Successfully merging this pull request may close these issues.

[FEA]: Provide optimized <bit> functions for device