[CUDAX] Add host launch API allowing stream ordered host execution #3555

Open
wants to merge 4 commits into
base: main
from

Conversation

pciolkosz
Contributor

host_launch executes the supplied callable on the host, in stream order, on the supplied stream.

It takes the callable and all arguments by copy, then moves them into an internal dynamic allocation, where they are stored until the callable is invoked asynchronously.

There is also a special overload that takes the callable wrapped in a reference_wrapper, with no arguments, and skips the internal allocation.
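
The description does not show the implementation, so here is a minimal sketch of how the two paths could work. It assumes a trampoline over a C-style callback interface (a function pointer plus a void* user-data pointer), which is the shape the CUDA runtime's cudaLaunchHostFunc accepts. The names toy_stream, payload, host_launch_sketch, and demo are hypothetical, and the toy FIFO stands in for a real stream so the sketch runs without a GPU; it is not the cudax code.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a CUDA stream: a FIFO of C-style callbacks
// (function pointer + void* user data), run in order by sync().
struct toy_stream {
  using host_fn = void (*)(void*);
  std::queue<std::pair<host_fn, void*>> work;

  void enqueue(host_fn fn, void* user_data) { work.emplace(fn, user_data); }

  // Runs everything enqueued so far, in submission order.
  void sync() {
    while (!work.empty()) {
      auto [fn, data] = work.front();
      work.pop();
      fn(data);
    }
  }
};

// Heap-allocated copy of the callable and its arguments.
template <class Fn, class... Args>
struct payload {
  Fn fn;
  std::tuple<Args...> args;

  // Trampoline: reclaims ownership, invokes, then frees the allocation.
  static void run(void* p) {
    std::unique_ptr<payload> self(static_cast<payload*>(p));
    std::apply(std::move(self->fn), std::move(self->args));
  }
};

// General path: everything is taken by copy and moved into one allocation
// that lives until the callback has run.
template <class Fn, class... Args>
void host_launch_sketch(toy_stream& stream, Fn fn, Args... args) {
  using P = payload<Fn, Args...>;
  std::unique_ptr<P> owned(
      new P{std::move(fn), std::tuple<Args...>(std::move(args)...)});
  stream.enqueue(&P::run, owned.get());
  owned.release();  // ownership now belongs to the enqueued callback
}

// reference_wrapper path: no allocation, only the callable's address is
// passed. In this sketch the caller must keep the callable alive until
// the callback has run.
template <class Fn>
void host_launch_sketch(toy_stream& stream, std::reference_wrapper<Fn> fn) {
  stream.enqueue([](void* p) { (*static_cast<Fn*>(p))(); }, &fn.get());
}

// Usage: one launch through each path, executed in submission order.
int demo() {
  toy_stream stream;
  int total = 0;

  host_launch_sketch(
      stream, [](int* out, int a, int b) { *out += a + b; }, &total, 2, 3);

  auto bump = [&total] { total += 10; };
  host_launch_sketch(stream, std::ref(bump));

  stream.sync();  // nothing runs before this point
  return total;
}
```

Passing the callable through std::ref selects the second overload because it is the more specialized template, which mirrors the trade-off in the description: the allocation is skipped, and in exchange the lifetime of the callable becomes the caller's responsibility.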


copy-pr-bot bot commented Jan 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@pciolkosz
Contributor Author

/ok to test

Contributor

🟩 CI finished in 3h 08m: Pass: 100%/20 | Total: 2h 13m | Avg: 6m 41s | Max: 22m 06s | Hits: 308%/524
  • 🟩 cudax: Pass: 100%/20 | Total: 2h 13m | Avg: 6m 41s | Max: 22m 06s | Hits: 308%/524

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 58m | Avg:  7m 26s | Max: 22m 06s | Hits: 308%/524   
      🟩 arm64              Pass: 100%/4   | Total: 14m 58s | Avg:  3m 44s | Max:  4m 28s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
      🟩 12.5               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
      🟩 12.6               Pass: 100%/17  | Total:  1h 51m | Avg:  6m 32s | Max: 22m 06s | Hits: 308%/262   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 51m | Avg:  6m 32s | Max: 22m 06s | Hits: 308%/262   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  2h 13m | Avg:  6m 41s | Max: 22m 06s | Hits: 308%/524   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
      🟩 Clang18            Pass: 100%/4   | Total: 33m 12s | Avg:  8m 18s | Max: 22m 06s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 GCC12              Pass: 100%/2   | Total: 25m 54s | Avg: 12m 57s | Max: 21m 42s
      🟩 GCC13              Pass: 100%/4   | Total: 14m 31s | Avg:  3m 37s | Max:  4m 28s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 55s | Avg: 12m 55s | Max: 12m 55s | Hits: 308%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 50m 02s | Avg:  6m 15s | Max: 22m 06s
      🟩 GCC                Pass: 100%/8   | Total: 48m 16s | Avg:  6m 02s | Max: 21m 42s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 44s | Avg: 11m 22s | Max: 12m 55s | Hits: 308%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  2h 13m | Avg:  6m 41s | Max: 22m 06s | Hits: 308%/524   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 30m | Avg:  5m 00s | Max: 12m 55s | Hits: 308%/524   
      🟩 Test               Pass: 100%/2   | Total: 43m 48s | Avg: 21m 54s | Max: 22m 06s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 90a                Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 30s | Avg:  4m 07s | Max:  6m 25s
      🟩 20                 Pass: 100%/16  | Total:  1h 57m | Avg:  7m 20s | Max: 22m 06s | Hits: 308%/524   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

@pciolkosz pciolkosz marked this pull request as ready for review January 29, 2025 00:49
@pciolkosz pciolkosz requested a review from a team as a code owner January 29, 2025 00:49
Contributor

🟩 CI finished in 4h 36m: Pass: 100%/20 | Total: 1h 54m | Avg: 5m 44s | Max: 17m 53s | Hits: 311%/524
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 54m | Avg: 5m 44s | Max: 17m 53s | Hits: 311%/524

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 43m | Avg:  6m 29s | Max: 17m 53s | Hits: 311%/524   
      🟩 arm64              Pass: 100%/4   | Total: 10m 54s | Avg:  2m 43s | Max:  2m 47s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
      🟩 12.5               Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
      🟩 12.6               Pass: 100%/17  | Total:  1h 32m | Avg:  5m 26s | Max: 17m 53s | Hits: 311%/262   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 32m | Avg:  5m 26s | Max: 17m 53s | Hits: 311%/262   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 54m | Avg:  5m 44s | Max: 17m 53s | Hits: 311%/524   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s
      🟩 Clang18            Pass: 100%/4   | Total: 26m 41s | Avg:  6m 40s | Max: 17m 47s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 GCC12              Pass: 100%/2   | Total: 21m 27s | Avg: 10m 43s | Max: 17m 53s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 22s | Avg:  2m 50s | Max:  2m 56s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 38s | Avg: 12m 38s | Max: 12m 38s | Hits: 311%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 40m 34s | Avg:  5m 04s | Max: 17m 47s
      🟩 GCC                Pass: 100%/8   | Total: 39m 14s | Avg:  4m 54s | Max: 17m 53s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 23s | Avg: 11m 11s | Max: 12m 38s | Hits: 311%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 54m | Avg:  5m 44s | Max: 17m 53s | Hits: 311%/524   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 19m | Avg:  4m 23s | Max: 12m 38s | Hits: 311%/524   
      🟩 Test               Pass: 100%/2   | Total: 35m 40s | Avg: 17m 50s | Max: 17m 53s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
      🟩 90a                Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 35s | Avg:  3m 38s | Max:  6m 16s
      🟩 20                 Pass: 100%/16  | Total:  1h 40m | Avg:  6m 15s | Max: 17m 53s | Hits: 311%/524   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

Labels
None yet
Projects
Status: In Review
1 participant