[CUDAX] Add host launch API allowing stream ordered host execution #3555

pciolkosz · 2025-01-28T02:54:52Z

`host_launch` executes the supplied callable on the host, in stream order with respect to the supplied stream.
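The following CPU-only C++ sketch makes "stream order" concrete by modeling a stream as a FIFO of C-style host callbacks. It assumes the callback shape `void (*)(void*)` plus a `void*` user-data pointer, which matches what the CUDA runtime's `cudaLaunchHostFunc` accepts. `mock_stream` and its methods are hypothetical stand-ins, not cudax or CUDA APIs. Callbacks run only when the stream is drained, and always in submission order.

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: a FIFO of C-style host callbacks.
// A real stream would run them asynchronously, after all preceding work.
struct mock_stream {
  using host_fn = void (*)(void*);
  std::deque<std::pair<host_fn, void*>> pending;

  // Same shape as cudaLaunchHostFunc(stream, fn, userData).
  void launch_host_func(host_fn fn, void* user_data) { pending.emplace_back(fn, user_data); }

  // Runs every pending callback in submission order (like cudaStreamSynchronize).
  void synchronize() {
    while (!pending.empty()) {
      auto [fn, data] = pending.front();
      pending.pop_front();
      fn(data);
    }
  }
};

// User data for the callback: where to record, and what to record.
struct record_args {
  std::vector<int>* out;
  int id;
};

// C-style callback: appends its id to the output vector.
void record(void* p) {
  auto* a = static_cast<record_args*>(p);
  a->out->push_back(a->id);
}

inline std::vector<int> stream_order_demo() {
  mock_stream s;
  std::vector<int> order;
  record_args a0{&order, 0}, a1{&order, 1}, a2{&order, 2};
  s.launch_host_func(record, &a0);
  s.launch_host_func(record, &a1);
  s.launch_host_func(record, &a2);
  assert(order.empty());  // enqueueing does not run anything yet
  s.synchronize();
  return order;  // callbacks ran in the order they were enqueued
}
```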

It takes all arguments by copy and internally moves them into a dynamic allocation, where they are kept until the callable is invoked asynchronously.
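The copy-then-allocate behavior can be sketched as a type-erasure trampoline. This is a minimal illustration under stated assumptions, not the cudax implementation: `host_launch_sketch` and `mock_stream` are hypothetical names, and the stream is simulated on the CPU so the example runs without a GPU. The callable and its arguments are copied into a single heap-allocated `std::tuple`; a captureless lambda, which converts to a plain function pointer, unpacks it with `std::apply` and frees it after the call.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a CUDA stream: a FIFO of C-style host callbacks.
struct mock_stream {
  using host_fn = void (*)(void*);
  std::deque<std::pair<host_fn, void*>> pending;
  void launch_host_func(host_fn fn, void* data) { pending.emplace_back(fn, data); }
  void synchronize() {
    while (!pending.empty()) {
      auto [fn, data] = pending.front();
      pending.pop_front();
      fn(data);
    }
  }
};

// Hypothetical by-copy launch: the callable and arguments are taken by value
// and moved into a heap allocation that lives until the callback runs.
template <class Fn, class... Args>
void host_launch_sketch(mock_stream& s, Fn fn, Args... args) {
  using payload_t = std::tuple<Fn, Args...>;
  auto payload = std::make_unique<payload_t>(std::move(fn), std::move(args)...);
  auto trampoline = [](void* p) {
    // Reclaim ownership so the allocation is released after the call.
    std::unique_ptr<payload_t> owned(static_cast<payload_t*>(p));
    std::apply([](auto& f, auto&... a) { std::invoke(f, a...); }, *owned);
  };
  s.launch_host_func(trampoline, payload.get());
  payload.release();  // the pending callback now owns the allocation
}

inline int by_copy_demo() {
  mock_stream s;
  int seen = 0;
  int value = 1;
  host_launch_sketch(s, [&seen](int v) { seen = v; }, value);
  value = 42;       // changing the original after the launch has no effect,
  s.synchronize();  // because the callback received its own copy
  return seen;
}
```

Because the payload owns copies, the caller may modify or destroy the original arguments as soon as the launch call returns.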

There is also a special overload that takes the callable wrapped in a `reference_wrapper` and no arguments; it skips the internal allocation.
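The allocation-free `reference_wrapper` overload can be sketched the same way, again with hypothetical `host_launch_sketch` and `mock_stream` names and a CPU-simulated stream rather than the actual cudax code. Only the address of the caller's callable is passed as user data, so nothing is copied or allocated, but the caller must keep the referenced object alive until the callback has run (for example, until the stream is synchronized).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a CUDA stream: a FIFO of C-style host callbacks.
struct mock_stream {
  using host_fn = void (*)(void*);
  std::deque<std::pair<host_fn, void*>> pending;
  void launch_host_func(host_fn fn, void* data) { pending.emplace_back(fn, data); }
  void synchronize() {
    while (!pending.empty()) {
      auto [fn, data] = pending.front();
      pending.pop_front();
      fn(data);
    }
  }
};

// Hypothetical allocation-free launch: only the callable's address is enqueued.
template <class Fn>
void host_launch_sketch(mock_stream& s, std::reference_wrapper<Fn> fn_ref) {
  // Fn may be const-qualified (std::cref); cast back to that exact type.
  auto trampoline = [](void* p) { std::invoke(*static_cast<Fn*>(p)); };
  void* user_data = const_cast<void*>(static_cast<const void*>(std::addressof(fn_ref.get())));
  s.launch_host_func(trampoline, user_data);
}

struct counter {
  int calls = 0;
  void operator()() { ++calls; }
};

inline int by_reference_demo() {
  mock_stream s;
  counter c;  // must outlive the pending callbacks
  host_launch_sketch(s, std::ref(c));
  host_launch_sketch(s, std::ref(c));
  s.synchronize();
  return c.calls;  // both launches invoked the same object, no copies were made
}
```

The `const_cast` only exists because the user-data parameter is a non-const `void*`; the trampoline restores the original, possibly const, type before invoking.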

copy-pr-bot · 2025-01-28T02:54:56Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

pciolkosz · 2025-01-28T02:55:18Z

/ok to test

github-actions · 2025-01-28T06:04:34Z

🟩 CI finished in 3h 08m: Pass: 100%/20 | Total: 2h 13m | Avg: 6m 41s | Max: 22m 06s | Hits: 308%/524

🟩 cudax: Pass: 100%/20 | Total: 2h 13m | Avg: 6m 41s | Max: 22m 06s | Hits: 308%/524

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  1h 58m | Avg:  7m 26s | Max: 22m 06s | Hits: 308%/524   
  🟩 arm64              Pass: 100%/4   | Total: 14m 58s | Avg:  3m 44s | Max:  4m 28s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
  🟩 12.5               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
  🟩 12.6               Pass: 100%/17  | Total:  1h 51m | Avg:  6m 32s | Max: 22m 06s | Hits: 308%/262   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 51m | Avg:  6m 32s | Max: 22m 06s | Hits: 308%/262   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  2h 13m | Avg:  6m 41s | Max: 22m 06s | Hits: 308%/524   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
  🟩 Clang15            Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
  🟩 Clang18            Pass: 100%/4   | Total: 33m 12s | Avg:  8m 18s | Max: 22m 06s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
  🟩 GCC12              Pass: 100%/2   | Total: 25m 54s | Avg: 12m 57s | Max: 21m 42s
  🟩 GCC13              Pass: 100%/4   | Total: 14m 31s | Avg:  3m 37s | Max:  4m 28s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 49s | Avg:  9m 49s | Max:  9m 49s | Hits: 308%/262   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 55s | Avg: 12m 55s | Max: 12m 55s | Hits: 308%/262   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total: 50m 02s | Avg:  6m 15s | Max: 22m 06s
  🟩 GCC                Pass: 100%/8   | Total: 48m 16s | Avg:  6m 02s | Max: 21m 42s
  🟩 MSVC               Pass: 100%/2   | Total: 22m 44s | Avg: 11m 22s | Max: 12m 55s | Hits: 308%/524   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  2h 13m | Avg:  6m 41s | Max: 22m 06s | Hits: 308%/524   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  1h 30m | Avg:  5m 00s | Max: 12m 55s | Hits: 308%/524   
  🟩 Test               Pass: 100%/2   | Total: 43m 48s | Avg: 21m 54s | Max: 22m 06s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
  🟩 90a                Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 16m 30s | Avg:  4m 07s | Max:  6m 25s
  🟩 20                 Pass: 100%/16  | Total:  1h 57m | Avg:  7m 20s | Max: 22m 06s | Hits: 308%/524

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 20)

#	Runner
12	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-v100-latest-1`

github-actions · 2025-01-29T05:27:17Z

🟩 CI finished in 4h 36m: Pass: 100%/20 | Total: 1h 54m | Avg: 5m 44s | Max: 17m 53s | Hits: 311%/524

🟩 cudax: Pass: 100%/20 | Total: 1h 54m | Avg: 5m 44s | Max: 17m 53s | Hits: 311%/524

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  1h 43m | Avg:  6m 29s | Max: 17m 53s | Hits: 311%/524   
  🟩 arm64              Pass: 100%/4   | Total: 10m 54s | Avg:  2m 43s | Max:  2m 47s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
  🟩 12.5               Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
  🟩 12.6               Pass: 100%/17  | Total:  1h 32m | Avg:  5m 26s | Max: 17m 53s | Hits: 311%/262   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 32m | Avg:  5m 26s | Max: 17m 53s | Hits: 311%/262   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  1h 54m | Avg:  5m 44s | Max: 17m 53s | Hits: 311%/524   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s
  🟩 Clang18            Pass: 100%/4   | Total: 26m 41s | Avg:  6m 40s | Max: 17m 47s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
  🟩 GCC12              Pass: 100%/2   | Total: 21m 27s | Avg: 10m 43s | Max: 17m 53s
  🟩 GCC13              Pass: 100%/4   | Total: 11m 22s | Avg:  2m 50s | Max:  2m 56s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s | Hits: 311%/262   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 38s | Avg: 12m 38s | Max: 12m 38s | Hits: 311%/262   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total: 40m 34s | Avg:  5m 04s | Max: 17m 47s
  🟩 GCC                Pass: 100%/8   | Total: 39m 14s | Avg:  4m 54s | Max: 17m 53s
  🟩 MSVC               Pass: 100%/2   | Total: 22m 23s | Avg: 11m 11s | Max: 12m 38s | Hits: 311%/524   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 20s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  1h 54m | Avg:  5m 44s | Max: 17m 53s | Hits: 311%/524   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  1h 19m | Avg:  4m 23s | Max: 12m 38s | Hits: 311%/524   
  🟩 Test               Pass: 100%/2   | Total: 35m 40s | Avg: 17m 50s | Max: 17m 53s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
  🟩 90a                Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 14m 35s | Avg:  3m 38s | Max:  6m 16s
  🟩 20                 Pass: 100%/16  | Total:  1h 40m | Avg:  6m 15s | Max: 17m 53s | Hits: 311%/524

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 20)

#	Runner
12	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-v100-latest-1`

pciolkosz added 2 commits January 27, 2025 18:51

First take on the implementation

1b59f05

Add more tests and fix issue with const

83a8fdd

pciolkosz added 2 commits January 28, 2025 16:43

Some extra comments

5fc9ab8

Explain why cudaStreamAddCallback is used

3c7e395

pciolkosz marked this pull request as ready for review January 29, 2025 00:49

pciolkosz requested a review from a team as a code owner January 29, 2025 00:49

pciolkosz requested review from ericniebler and miscco January 29, 2025 20:09
