PTX: Add cp.async.mbarrier.arrive{.noinc} #3602

Merged
bernhardmgruber merged 2 commits into NVIDIA:main from ptx_cp_async_mbrarrier on Jan 30, 2025

Conversation

bernhardmgruber
Contributor

No description provided.

@bernhardmgruber bernhardmgruber requested review from a team as code owners January 30, 2025 09:57
@bernhardmgruber bernhardmgruber changed the title ptx: Add cp.async.mbarrier.arrive{.noinc} PTX: Add cp.async.mbarrier.arrive{.noinc} Jan 30, 2025
@bernhardmgruber bernhardmgruber enabled auto-merge (squash) January 30, 2025 11:07

🟩 CI finished in 2h 11m: Pass: 100%/152 | Total: 2d 06h | Avg: 21m 35s | Max: 1h 06m | Hits: 480%/21587
  • 🟩 cub: Pass: 100%/44 | Total: 1d 06h | Avg: 41m 50s | Max: 1h 06m | Hits: 392%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 05h | Avg: 41m 42s | Max:  1h 06m | Hits: 392%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 28m | Avg: 44m 28s | Max: 44m 30s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 40m | Avg: 44m 10s | Max: 54m 36s | Hits: 407%/888   
      🟩 12.5               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max:  1h 00m
      🟩 12.6               Pass: 100%/37  | Total:  1d 01h | Avg: 40m 43s | Max:  1h 06m | Hits: 387%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 43m | Avg: 51m 49s | Max: 52m 08s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 40m | Avg: 44m 10s | Max: 54m 36s | Hits: 407%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max:  1h 00m
      🟩 nvcc12.6           Pass: 100%/35  | Total: 23h 22m | Avg: 40m 04s | Max:  1h 06m | Hits: 387%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 49s | Max: 52m 08s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 04h | Avg: 41m 21s | Max:  1h 06m | Hits: 392%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 49m | Avg: 42m 15s | Max: 43m 35s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 28m | Avg: 44m 11s | Max: 45m 27s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 22m | Avg: 41m 16s | Max: 41m 29s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 28m | Avg: 44m 00s | Max: 45m 04s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 40m | Avg: 40m 04s | Max: 52m 08s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 13s | Max: 44m 29s
      🟩 GCC8               Pass: 100%/1   | Total: 39m 51s | Avg: 39m 51s | Max: 39m 51s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 22m | Avg: 41m 15s | Max: 42m 25s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 28m | Avg: 44m 17s | Max: 44m 47s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 26m | Avg: 43m 20s | Max: 44m 26s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 27m | Avg: 36m 48s | Max: 58m 45s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 57m | Avg: 29m 41s | Max: 50m 05s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 25s | Max:  1h 02m | Hits: 398%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 06m | Hits: 386%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max:  1h 00m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 11h 48m | Avg: 41m 40s | Max: 52m 08s
      🟩 GCC                Pass: 100%/21  | Total: 12h 48m | Avg: 36m 36s | Max: 58m 45s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 10m | Avg:  1h 02m | Max:  1h 06m | Hits: 392%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max:  1h 00m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 43m 31s | Avg: 21m 45s | Max: 29m 07s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 43m | Avg: 27m 58s | Max: 50m 05s
      🟩 v100               Pass: 100%/34  | Total:  1d 02h | Avg: 46m 16s | Max:  1h 06m | Hits: 392%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 03h | Avg: 45m 24s | Max:  1h 06m | Hits: 392%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 39s | Avg: 19m 39s | Max: 19m 39s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 23s | Avg: 15m 23s | Max: 15m 23s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 22m | Avg: 27m 38s | Max: 29m 07s
      🟩 TestGPU            Pass: 100%/2   | Total: 42m 52s | Avg: 21m 26s | Max: 21m 51s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 43m 31s | Avg: 21m 45s | Max: 29m 07s
      🟩 90a                Pass: 100%/1   | Total: 14m 37s | Avg: 14m 37s | Max: 14m 37s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 15h 52m | Avg: 47m 36s | Max:  1h 06m | Hits: 394%/2664  
      🟩 20                 Pass: 100%/24  | Total: 14h 48m | Avg: 37m 01s | Max:  1h 06m | Hits: 384%/888   
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 8h 27m | Avg: 11m 47s | Max: 38m 26s | Hits: 662%/10129

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  8h 08m | Avg: 11m 55s | Max: 38m 26s | Hits: 662%/10129 
      🟩 arm64              Pass: 100%/2   | Total: 18m 40s | Avg:  9m 20s | Max: 15m 22s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 39m 00s | Avg:  7m 48s | Max: 20m 48s | Hits: 688%/2487  
      🟩 12.5               Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 33m 09s
      🟩 12.6               Pass: 100%/36  | Total:  6h 58m | Avg: 11m 37s | Max: 38m 26s | Hits: 653%/7642  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 54s | Max: 22m 40s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 39m 00s | Avg:  7m 48s | Max: 20m 48s | Hits: 688%/2487  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 33m 09s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  5h 51m | Avg: 10m 58s | Max: 38m 26s | Hits: 653%/7642  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 54s | Max: 22m 40s
      🟩 nvcc               Pass: 100%/39  | Total:  7h 19m | Avg: 11m 16s | Max: 38m 26s | Hits: 662%/10129 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 18m 10s | Avg:  4m 32s | Max:  5m 31s
      🟩 Clang15            Pass: 100%/2   | Total: 14m 23s | Avg:  7m 11s | Max:  9m 41s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 10s | Avg:  4m 35s | Max:  4m 38s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  4m 31s
      🟩 Clang18            Pass: 100%/8   | Total:  2h 27m | Avg: 18m 28s | Max: 38m 26s
      🟩 GCC7               Pass: 100%/2   | Total: 24m 02s | Avg: 12m 01s | Max: 20m 36s
      🟩 GCC8               Pass: 100%/1   | Total: 15m 44s | Avg: 15m 44s | Max: 15m 44s
      🟩 GCC9               Pass: 100%/2   | Total: 26m 48s | Avg: 13m 24s | Max: 21m 36s
      🟩 GCC10              Pass: 100%/2   | Total:  7m 44s | Avg:  3m 52s | Max:  4m 03s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 26s | Avg:  6m 13s | Max:  8m 26s
      🟩 GCC12              Pass: 100%/2   | Total:  7m 55s | Avg:  3m 57s | Max:  4m 00s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 38s | Max: 16m 36s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 49s | Avg: 23m 24s | Max: 26m 01s | Hits: 675%/4984  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 35s | Avg: 28m 17s | Max: 28m 32s | Hits: 649%/5145  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 33m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  3h 18m | Avg: 11m 01s | Max: 38m 26s
      🟩 GCC                Pass: 100%/19  | Total:  2h 35m | Avg:  8m 11s | Max: 21m 36s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 43m | Avg: 25m 51s | Max: 28m 32s | Hits: 662%/10129 
      🟩 NVHPC              Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 33m 09s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/6   | Total:  1h 33m | Avg: 15m 35s | Max: 38m 26s
      🟩 v100               Pass: 100%/37  | Total:  6h 53m | Avg: 11m 11s | Max: 33m 09s | Hits: 662%/10129 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  7h 02m | Avg: 11m 07s | Max: 33m 09s | Hits: 662%/10129 
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 58s | Avg: 15m 59s | Max: 16m 36s
      🟩 Test               Pass: 100%/2   | Total: 50m 36s | Avg: 25m 18s | Max: 38m 26s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 00s | Avg:  2m 00s | Max:  2m 00s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 58s | Avg: 15m 59s | Max: 16m 36s
      🟩 90                 Pass: 100%/1   | Total: 12m 35s | Avg: 12m 35s | Max: 12m 35s
      🟩 90a                Pass: 100%/2   | Total: 16m 34s | Avg:  8m 17s | Max: 13m 01s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 11m | Avg: 11m 59s | Max: 28m 32s | Hits: 657%/7481  
      🟩 20                 Pass: 100%/21  | Total:  4h 13m | Avg: 12m 04s | Max: 38m 26s | Hits: 676%/2648  
    
  • 🟩 thrust: Pass: 100%/42 | Total: 13h 19m | Avg: 19m 02s | Max: 58m 50s | Hits: 279%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 21m 16s | Avg: 10m 38s | Max: 10m 40s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 12h 54m | Avg: 19m 21s | Max: 58m 50s | Hits: 279%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 25m 47s | Avg: 12m 53s | Max: 14m 00s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 51m | Avg: 22m 21s | Max: 51m 10s | Hits: 302%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 25m | Avg: 42m 41s | Max: 42m 50s
      🟩 12.6               Pass: 100%/35  | Total: 10h 02m | Avg: 17m 13s | Max: 58m 50s | Hits: 271%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 21m 53s | Avg: 10m 56s | Max: 10m 57s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 51m | Avg: 22m 21s | Max: 51m 10s | Hits: 302%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 25m | Avg: 42m 41s | Max: 42m 50s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  9h 40m | Avg: 17m 36s | Max: 58m 50s | Hits: 271%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 21m 53s | Avg: 10m 56s | Max: 10m 57s
      🟩 nvcc               Pass: 100%/40  | Total: 12h 58m | Avg: 19m 27s | Max: 58m 50s | Hits: 279%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 57m 13s | Avg: 14m 18s | Max: 15m 51s
      🟩 Clang15            Pass: 100%/2   | Total: 28m 53s | Avg: 14m 26s | Max: 16m 18s
      🟩 Clang16            Pass: 100%/2   | Total: 29m 05s | Avg: 14m 32s | Max: 14m 43s
      🟩 Clang17            Pass: 100%/2   | Total: 28m 41s | Avg: 14m 20s | Max: 16m 25s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 17m | Avg: 11m 06s | Max: 13m 12s
      🟩 GCC7               Pass: 100%/2   | Total: 31m 40s | Avg: 15m 50s | Max: 17m 03s
      🟩 GCC8               Pass: 100%/1   | Total: 13m 15s | Avg: 13m 15s | Max: 13m 15s
      🟩 GCC9               Pass: 100%/2   | Total: 35m 36s | Avg: 17m 48s | Max: 19m 04s
      🟩 GCC10              Pass: 100%/2   | Total: 27m 27s | Avg: 13m 43s | Max: 14m 03s
      🟩 GCC11              Pass: 100%/2   | Total: 30m 51s | Avg: 15m 25s | Max: 17m 27s
      🟩 GCC12              Pass: 100%/2   | Total: 36m 48s | Avg: 18m 24s | Max: 19m 20s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 35m | Avg: 11m 58s | Max: 19m 22s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 54s | Max: 54m 38s | Hits: 287%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 52s | Max: 58m 50s | Hits: 270%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 25m | Avg: 42m 41s | Max: 42m 50s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 41m | Avg: 13m 02s | Max: 16m 25s
      🟩 GCC                Pass: 100%/19  | Total:  4h 31m | Avg: 14m 17s | Max: 19m 22s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 41m | Avg: 55m 23s | Max: 58m 50s | Hits: 279%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 25m | Avg: 42m 41s | Max: 42m 50s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 29m | Avg: 11m 07s | Max: 17m 34s
      🟩 v100               Pass: 100%/34  | Total: 11h 50m | Avg: 20m 54s | Max: 58m 50s | Hits: 279%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 12h 32m | Avg: 20m 20s | Max: 58m 50s | Hits: 279%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 32s | Avg:  7m 46s | Max:  7m 56s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 06s | Avg: 10m 42s | Max: 11m 10s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 30s | Avg:  4m 30s | Max:  4m 30s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  7h 24m | Avg: 22m 13s | Max: 58m 50s | Hits: 285%/5538  
      🟩 20                 Pass: 100%/20  | Total:  5h 34m | Avg: 16m 43s | Max: 56m 55s | Hits: 261%/1846  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 40m | Avg: 5m 00s | Max: 12m 56s | Hits: 388%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 28m | Avg:  5m 31s | Max: 12m 56s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 11m 39s | Avg:  2m 54s | Max:  3m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  8m 47s | Avg:  8m 47s | Max:  8m 47s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 32s
      🟩 12.6               Pass: 100%/17  | Total:  1h 20m | Avg:  4m 44s | Max: 12m 56s | Hits: 388%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  8m 47s | Avg:  8m 47s | Max:  8m 47s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 32s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 20m | Avg:  4m 44s | Max: 12m 56s | Hits: 388%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 40m | Avg:  5m 00s | Max: 12m 56s | Hits: 388%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 02s | Avg:  3m 02s | Max:  3m 02s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 01s | Avg:  3m 01s | Max:  3m 01s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 33s | Avg:  3m 33s | Max:  3m 33s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 26s | Avg:  3m 26s | Max:  3m 26s
      🟩 Clang18            Pass: 100%/4   | Total: 21m 12s | Avg:  5m 18s | Max: 12m 28s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
      🟩 GCC12              Pass: 100%/2   | Total: 16m 11s | Avg:  8m 05s | Max: 12m 56s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 51s | Avg:  2m 57s | Max:  3m 45s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 47s | Avg:  8m 47s | Max:  8m 47s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 32s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 34m 14s | Avg:  4m 16s | Max: 12m 28s
      🟩 GCC                Pass: 100%/8   | Total: 34m 20s | Avg:  4m 17s | Max: 12m 56s
      🟩 MSVC               Pass: 100%/2   | Total: 20m 45s | Avg: 10m 22s | Max: 11m 58s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 32s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 32m 04s | Avg:  8m 01s | Max: 12m 56s
      🟩 v100               Pass: 100%/16  | Total:  1h 08m | Avg:  4m 15s | Max: 11m 58s | Hits: 388%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 14m | Avg:  4m 08s | Max: 11m 58s | Hits: 388%/522   
      🟩 Test               Pass: 100%/2   | Total: 25m 24s | Avg: 12m 42s | Max: 12m 56s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 90a                Pass: 100%/1   | Total:  2m 45s | Avg:  2m 45s | Max:  2m 45s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 23s | Avg:  3m 35s | Max:  5m 14s
      🟩 20                 Pass: 100%/16  | Total:  1h 25m | Avg:  5m 21s | Max: 12m 56s | Hits: 388%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 53s | Avg: 3m 26s | Max: 4m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 53s | Avg:  3m 26s | Max:  4m 55s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 58s | Avg:  1m 58s | Max:  1m 58s
      🟩 Test               Pass: 100%/1   | Total:  4m 55s | Avg:  4m 55s | Max:  4m 55s
    
  • 🟩 python: Pass: 100%/1 | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 27m 23s | Avg: 27m 23s | Max: 27m 23s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 152)

#    Runner
110  linux-amd64-cpu16
14   windows-amd64-cpu16
10   linux-arm64-cpu16
8    linux-amd64-gpu-rtx2080-latest-1
6    linux-amd64-gpu-rtxa6000-latest-1
3    linux-amd64-gpu-rtx4090-latest-1
1    linux-amd64-gpu-h100-latest-1

@bernhardmgruber bernhardmgruber merged commit 0f52dd5 into NVIDIA:main Jan 30, 2025
164 of 168 checks passed

Git push to origin failed for branch/2.8.x with exitcode 128

@bernhardmgruber bernhardmgruber deleted the ptx_cp_async_mbrarrier branch January 30, 2025 12:43
bernhardmgruber added a commit that referenced this pull request Jan 31, 2025
miscco pushed a commit that referenced this pull request Jan 31, 2025
* Sync ptx_dot_variants.h with libcuda-ptx (#3564)

* Update ptx_isa.h to include 8.6 and 8.7 (#3563)

* PTX: Update generated files with Blackwell instructions (#3568)

* ptx: Update existing instructions
* ptx: Add new instructions
* Fix returning error out values
See:
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/74
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/73
* ptx: Fix out var declaration
See https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/75
* mbarrier.{test,try}_wait: Fix test. Wrong files were included.
* docs: Fix special registers include
* Allow non-included documentation pages
* Workaround NVRTC

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Remove internal instructions (#3583)

* barrier.cluster.aligned: Remove
This is not supposed to be exposed in CCCL.

* elect.sync: Remove
Not ready for inclusion yet. This needs to handle the optional extra
output mask as well.

* mapa: Remove
This has compiler bugs. We should use intrinsics instead.

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Update existing instructions (#3584)

* mbarrier.expect_tx: Add missing source and test
It was already documented(!)

* cp.async.bulk.tensor: Add .{gather,scatter}4
* fence: Add .sync_restrict, .proxy.async.sync_restrict

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add clusterlaunchcontrol (#3589)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add cp.async.mbarrier.arrive{.noinc} (#3602)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add multimem instructions (#3603)

* Add multimem.ld_reduce
* Add multimem.red
* Add multimem.st

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add st.bulk (#3604)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add tcgen05 instructions (#3607)

* ptx: Add tcgen05.alloc

* ptx: Add tcgen05.commit

* ptx: Add tcgen05.cp

* ptx: Add tcgen05.fence

* ptx: Add tcgen05.ld

* ptx: Add tcgen05.mma

* ptx: Add tcgen05.mma.ws

* ptx: Add tcgen05.shift

* ptx: Add tcgen05.st

* ptx: Add tcgen05.wait

* fix docs

---------

Co-authored-by: Allard Hendriksen <[email protected]>

---------

Co-authored-by: Allard Hendriksen <[email protected]>
Projects
Status: Done
3 participants