PTX: Add st.bulk #3604

Merged
2 commits merged on Jan 30, 2025


bernhardmgruber
Contributor

No description provided.

@bernhardmgruber bernhardmgruber enabled auto-merge (squash) January 30, 2025 11:07

🟩 CI finished in 1h 49m: Pass: 100%/152 | Total: 3d 02h | Avg: 29m 15s | Max: 1h 15m | Hits: 430%/21587
  • 🟩 cub: Pass: 100%/44 | Total: 1d 14h | Avg: 52m 06s | Max: 1h 15m | Hits: 294%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 12h | Avg: 52m 09s | Max:  1h 15m | Hits: 294%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 41m | Avg: 50m 54s | Max: 52m 06s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 47m | Avg: 57m 25s | Max:  1h 01m | Hits: 376%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m
      🟩 12.6               Pass: 100%/37  | Total:  1d 07h | Avg: 50m 29s | Max:  1h 15m | Hits: 267%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 55m | Avg: 57m 36s | Max: 59m 18s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 47m | Avg: 57m 25s | Max:  1h 01m | Hits: 376%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 05h | Avg: 50m 05s | Max:  1h 15m | Hits: 267%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 36s | Max: 59m 18s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 12h | Avg: 51m 50s | Max:  1h 15m | Hits: 294%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 48m | Avg: 57m 14s | Max:  1h 01m
      🟩 Clang15            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 50s | Max: 57m 22s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 58s | Max:  1h 00m
      🟩 Clang17            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 54s | Max:  1h 00m
      🟩 Clang18            Pass: 100%/7   | Total:  5h 31m | Avg: 47m 22s | Max:  1h 00m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 37s | Max: 58m 08s
      🟩 GCC8               Pass: 100%/1   | Total: 55m 02s | Avg: 55m 02s | Max: 55m 02s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 51s | Max:  1h 02m
      🟩 GCC10              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 44s | Max: 55m 54s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 27s | Max: 57m 14s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 57m | Avg: 44m 26s | Max:  1h 05m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 39m | Avg: 34m 57s | Max: 58m 03s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 04m | Hits: 344%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 15m | Hits: 245%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 10m | Avg: 53m 31s | Max:  1h 01m
      🟩 GCC                Pass: 100%/21  | Total: 16h 09m | Avg: 46m 10s | Max:  1h 05m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 35m | Avg:  1h 08m | Max:  1h 15m | Hits: 294%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 29m 28s
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 07m | Avg: 30m 59s | Max: 59m 11s
      🟩 v100               Pass: 100%/34  | Total:  1d 09h | Avg: 58m 27s | Max:  1h 15m | Hits: 294%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 11h | Avg: 57m 37s | Max:  1h 15m | Hits: 294%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 25s | Avg: 20m 25s | Max: 20m 25s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 26s | Avg: 16m 26s | Max: 16m 26s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 22m | Avg: 27m 39s | Max: 29m 28s
      🟩 TestGPU            Pass: 100%/2   | Total: 40m 49s | Avg: 20m 24s | Max: 20m 26s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 29m 28s
      🟩 90a                Pass: 100%/1   | Total: 27m 40s | Avg: 27m 40s | Max: 27m 40s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 54m | Avg: 59m 42s | Max:  1h 14m | Hits: 311%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 18m | Avg: 45m 45s | Max:  1h 15m | Hits: 244%/888   
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 10h 59m | Avg: 15m 20s | Max: 34m 14s | Hits: 649%/10129

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 10h 35m | Avg: 15m 29s | Max: 34m 14s | Hits: 649%/10129 
      🟩 arm64              Pass: 100%/2   | Total: 24m 47s | Avg: 12m 23s | Max: 21m 31s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 13m | Avg: 14m 44s | Max: 21m 37s | Hits: 620%/2487  
      🟩 12.5               Pass: 100%/2   | Total: 41m 03s | Avg: 20m 31s | Max: 28m 10s
      🟩 12.6               Pass: 100%/36  | Total:  9h 05m | Avg: 15m 08s | Max: 34m 14s | Hits: 659%/7642  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 09m | Avg: 17m 25s | Max: 21m 27s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 13m | Avg: 14m 44s | Max: 21m 37s | Hits: 620%/2487  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 41m 03s | Avg: 20m 31s | Max: 28m 10s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  7h 55m | Avg: 14m 51s | Max: 34m 14s | Hits: 659%/7642  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 09m | Avg: 17m 25s | Max: 21m 27s
      🟩 nvcc               Pass: 100%/39  | Total:  9h 50m | Avg: 15m 07s | Max: 34m 14s | Hits: 649%/10129 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 58m 13s | Avg: 14m 33s | Max: 21m 19s
      🟩 Clang15            Pass: 100%/2   | Total: 38m 11s | Avg: 19m 05s | Max: 24m 27s
      🟩 Clang16            Pass: 100%/2   | Total: 38m 15s | Avg: 19m 07s | Max: 23m 54s
      🟩 Clang17            Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max:  6m 47s
      🟩 Clang18            Pass: 100%/8   | Total:  2h 16m | Avg: 17m 06s | Max: 21m 47s
      🟩 GCC7               Pass: 100%/2   | Total: 24m 16s | Avg: 12m 08s | Max: 20m 15s
      🟩 GCC8               Pass: 100%/1   | Total: 16m 25s | Avg: 16m 25s | Max: 16m 25s
      🟩 GCC9               Pass: 100%/2   | Total: 20m 10s | Avg: 10m 05s | Max: 16m 28s
      🟩 GCC10              Pass: 100%/2   | Total: 38m 15s | Avg: 19m 07s | Max: 24m 48s
      🟩 GCC11              Pass: 100%/2   | Total: 23m 28s | Avg: 11m 44s | Max: 19m 38s
      🟩 GCC12              Pass: 100%/2   | Total: 36m 45s | Avg: 18m 22s | Max: 21m 56s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 08m | Avg:  8m 34s | Max: 19m 43s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 37s | Avg: 23m 48s | Max: 26m 00s | Hits: 653%/4984  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 58m 47s | Avg: 29m 23s | Max: 34m 14s | Hits: 646%/5145  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 41m 03s | Avg: 20m 31s | Max: 28m 10s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  4h 44m | Avg: 15m 48s | Max: 24m 27s
      🟩 GCC                Pass: 100%/19  | Total:  3h 47m | Avg: 11m 59s | Max: 24m 48s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 46m | Avg: 26m 36s | Max: 34m 14s | Hits: 649%/10129 
      🟩 NVHPC              Pass: 100%/2   | Total: 41m 03s | Avg: 20m 31s | Max: 28m 10s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/6   | Total:  1h 22m | Avg: 13m 45s | Max: 21m 47s
      🟩 v100               Pass: 100%/37  | Total:  9h 37m | Avg: 15m 36s | Max: 34m 14s | Hits: 649%/10129 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 10h 01m | Avg: 15m 49s | Max: 34m 14s | Hits: 649%/10129 
      🟩 NVRTC              Pass: 100%/2   | Total: 38m 58s | Avg: 19m 29s | Max: 19m 43s
      🟩 Test               Pass: 100%/2   | Total: 17m 43s | Avg:  8m 51s | Max:  8m 54s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 52s | Avg:  1m 52s | Max:  1m 52s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 38m 58s | Avg: 19m 29s | Max: 19m 43s
      🟩 90                 Pass: 100%/1   | Total: 13m 04s | Avg: 13m 04s | Max: 13m 04s
      🟩 90a                Pass: 100%/2   | Total: 21m 35s | Avg: 10m 47s | Max: 14m 00s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  5h 31m | Avg: 15m 46s | Max: 28m 10s | Hits: 661%/7481  
      🟩 20                 Pass: 100%/21  | Total:  5h 26m | Avg: 15m 33s | Max: 34m 14s | Hits: 615%/2648  
    
  • 🟩 thrust: Pass: 100%/42 | Total: 22h 29m | Avg: 32m 07s | Max: 1h 02m | Hits: 196%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 36m 55s | Avg: 18m 27s | Max: 25m 41s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 21h 48m | Avg: 32m 43s | Max:  1h 02m | Hits: 196%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 40m 28s | Avg: 20m 14s | Max: 21m 46s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 00m | Avg: 36m 01s | Max: 53m 27s | Hits: 223%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 49m | Avg: 54m 40s | Max: 55m 48s
      🟩 12.6               Pass: 100%/35  | Total: 17h 39m | Avg: 30m 16s | Max:  1h 02m | Hits: 187%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 28m 52s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 00m | Avg: 36m 01s | Max: 53m 27s | Hits: 223%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 49m | Avg: 54m 40s | Max: 55m 48s
      🟩 nvcc12.6           Pass: 100%/33  | Total: 16h 42m | Avg: 30m 23s | Max:  1h 02m | Hits: 187%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 28m 52s
      🟩 nvcc               Pass: 100%/40  | Total: 21h 32m | Avg: 32m 18s | Max:  1h 02m | Hits: 196%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 47s | Max: 32m 38s
      🟩 Clang15            Pass: 100%/2   | Total: 58m 49s | Avg: 29m 24s | Max: 29m 38s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 11s | Max: 31m 33s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 16s | Max: 35m 18s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 42m | Avg: 23m 13s | Max: 37m 57s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 54s | Max: 34m 13s
      🟩 GCC8               Pass: 100%/1   | Total: 34m 16s | Avg: 34m 16s | Max: 34m 16s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 50s | Max: 35m 38s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 41s | Max: 33m 18s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 06s | Max: 31m 42s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 15m | Avg: 37m 41s | Max: 39m 35s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 44m | Avg: 20m 32s | Max: 33m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 47m | Avg: 53m 42s | Max: 53m 57s | Hits: 215%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits: 177%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 49m | Avg: 54m 40s | Max: 55m 48s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 57m | Avg: 28m 05s | Max: 37m 57s
      🟩 GCC                Pass: 100%/19  | Total:  8h 53m | Avg: 28m 03s | Max: 39m 35s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 49m | Avg: 57m 22s | Max:  1h 02m | Hits: 196%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 49m | Avg: 54m 40s | Max: 55m 48s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  2h 18m | Avg: 17m 17s | Max: 33m 34s
      🟩 v100               Pass: 100%/34  | Total: 20h 11m | Avg: 35m 37s | Max:  1h 02m | Hits: 196%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 21h 40m | Avg: 35m 09s | Max:  1h 02m | Hits: 196%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 16m 03s | Avg:  8m 01s | Max:  8m 26s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 25s | Avg: 10m 48s | Max: 11m 22s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 19m 38s | Avg: 19m 38s | Max: 19m 38s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 19m | Avg: 36m 57s | Max:  1h 00m | Hits: 202%/5538  
      🟩 20                 Pass: 100%/20  | Total:  9h 33m | Avg: 28m 39s | Max:  1h 02m | Hits: 177%/1846  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 50m | Avg: 5m 31s | Max: 16m 24s | Hits: 388%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 39m | Avg:  6m 14s | Max: 16m 24s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 39s | Avg:  2m 39s | Max:  2m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 23s | Avg:  9m 23s | Max:  9m 23s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 51s
      🟩 12.6               Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 16m 24s | Hits: 388%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 23s | Avg:  9m 23s | Max:  9m 23s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 51s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 16m 24s | Hits: 388%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 50m | Avg:  5m 31s | Max: 16m 24s | Hits: 388%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 46s | Avg:  3m 46s | Max:  3m 46s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
      🟩 Clang18            Pass: 100%/4   | Total: 21m 43s | Avg:  5m 25s | Max: 12m 22s
      🟩 GCC10              Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s
      🟩 GCC12              Pass: 100%/2   | Total: 20m 29s | Avg: 10m 14s | Max: 16m 24s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 40s | Avg:  2m 55s | Max:  3m 16s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 23s | Avg:  9m 23s | Max:  9m 23s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 57s | Avg: 11m 57s | Max: 11m 57s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 51s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 37m 45s | Avg:  4m 43s | Max: 12m 22s
      🟩 GCC                Pass: 100%/8   | Total: 39m 58s | Avg:  4m 59s | Max: 16m 24s
      🟩 MSVC               Pass: 100%/2   | Total: 21m 20s | Avg: 10m 40s | Max: 11m 57s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 51s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 36m 48s | Avg:  9m 12s | Max: 16m 24s
      🟩 v100               Pass: 100%/16  | Total:  1h 13m | Avg:  4m 36s | Max: 11m 57s | Hits: 388%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 21m | Avg:  4m 32s | Max: 11m 57s | Hits: 388%/522   
      🟩 Test               Pass: 100%/2   | Total: 28m 46s | Avg: 14m 23s | Max: 16m 24s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 09s | Avg:  3m 09s | Max:  3m 09s
      🟩 90a                Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 05s | Avg:  3m 31s | Max:  5m 44s
      🟩 20                 Pass: 100%/16  | Total:  1h 36m | Avg:  6m 02s | Max: 16m 24s | Hits: 388%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 04s | Avg: 3m 32s | Max: 4m 48s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  4m 48s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟩 Test               Pass: 100%/1   | Total:  4m 48s | Avg:  4m 48s | Max:  4m 48s
    
  • 🟩 python: Pass: 100%/1 | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 26m 43s | Avg: 26m 43s | Max: 26m 43s
    

👃 Inspect Changes

Modifications in project?

      Project
      CCCL Infrastructure
+/-   libcu++
      CUB
      Thrust
      CUDA Experimental
      python
      CCCL C Parallel Library
      Catch2Helper

Modifications in project or dependencies?

      Project
      CCCL Infrastructure
+/-   libcu++
+/-   CUB
+/-   Thrust
+/-   CUDA Experimental
+/-   python
+/-   CCCL C Parallel Library
+/-   Catch2Helper

🏃‍ Runner counts (total jobs: 152)

# Runner
110 linux-amd64-cpu16
14 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

@bernhardmgruber bernhardmgruber merged commit b1f2e63 into NVIDIA:main Jan 30, 2025
164 of 168 checks passed
@bernhardmgruber bernhardmgruber deleted the ptx_st_bulk branch January 30, 2025 12:10

Git push to origin failed for branch/2.8.x with exitcode 128

bernhardmgruber added a commit that referenced this pull request Jan 31, 2025
Co-authored-by: Allard Hendriksen
miscco pushed a commit that referenced this pull request Jan 31, 2025
* Sync ptx_dot_variants.h with libcuda-ptx (#3564)

* Update ptx_isa.h to include 8.6 and 8.7 (#3563)

* PTX: Update generated files with Blackwell instructions (#3568)

* ptx: Update existing instructions
* ptx: Add new instructions
* Fix returning error out values
See:
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/74
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/73
* ptx: Fix out var declaration
See https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/75
* mbarrier.{test,try}_wait: Fix test. Wrong files were included.
* docs: Fix special registers include
* Allow non-included documentation pages
* Workaround NVRTC

Co-authored-by: Allard Hendriksen

* PTX: Remove internal instructions (#3583)

* barrier.cluster.aligned: Remove
This is not supposed to be exposed in CCCL.

* elect.sync: Remove
Not ready for inclusion yet. This needs to handle the optional extra
output mask as well.

* mapa: Remove
This has compiler bugs. We should use intrinsics instead.

Co-authored-by: Allard Hendriksen

* PTX: Update existing instructions (#3584)

* mbarrier.expect_tx: Add missing source and test
It was already documented(!)

* cp.async.bulk.tensor: Add .{gather,scatter}4
* fence: Add .sync_restrict, .proxy.async.sync_restrict

Co-authored-by: Allard Hendriksen

* PTX: Add clusterlaunchcontrol (#3589)

Co-authored-by: Allard Hendriksen

* PTX: Add cp.async.mbarrier.arrive{.noinc} (#3602)

Co-authored-by: Allard Hendriksen

* PTX: Add multimem instructions (#3603)

* Add multimem.ld_reduce
* Add multimem.red
* Add multimem.st

Co-authored-by: Allard Hendriksen

* PTX: Add st.bulk (#3604)

Co-authored-by: Allard Hendriksen

* PTX: Add tcgen05 instructions (#3607)

* ptx: Add tcgen05.alloc

* ptx: Add tcgen05.commit

* ptx: Add tcgen05.cp

* ptx: Add tcgen05.fence

* ptx: Add tcgen05.ld

* ptx: Add tcgen05.mma

* ptx: Add tcgen05.mma.ws

* ptx: Add tcgen05.shift

* ptx: Add tcgen05.st

* ptx: Add tcgen05.wait

* fix docs

---------

Co-authored-by: Allard Hendriksen

---------

Co-authored-by: Allard Hendriksen