Cuda stream #238

svdree-shell · 2024-07-16T15:27:27Z

This PR provides support for running CUDA compression / decompression on a CUDA stream. This allows for potential overlap between memory transfers and kernel execution, or memory transfers in different directions on devices that support it. Overlapping transfers and execution enables large performance gains in certain scenarios.

To that end I've introduced a zfp_exec_params_cuda struct (similar to zfp_exec_params_omp) that contains a cudaStream_t member. Setting and querying the stream can be done using the new zfp_stream_set_cuda_stream and zfp_stream_cuda_stream functions.

When no stream is specified, the default 0-stream is used. Since several CUDA calls in the code have been changed to their *Async equivalent, a cudaStreamSynchronize(0) is performed when stream 0 is used, to retain blocking behavior in that case.

lindstro · 2024-07-16T20:56:46Z

@svdree-shell Thank you so much for this contribution--we've been wanting to support this capability for some time. Unfortunately, as also discussed on PR #232, this is not a great time to merge CUDA/HIP PRs due to extensive changes already being made to the GPU implementations on the staging branch. Let me revisit this PR once our work on staging has been merged into develop.

On a technical note, would it be possible to keep zfp.h insulated from CUDA headers? Keep in mind that zfp.h is C code, and even if ZFP_WITH_CUDA is defined, I'm wary of including CUDA headers into this C header. As best I can tell, cudaStream_t is a pointer, so maybe this could be addressed using a void* in zfp.h.

siebrenreker · 2024-07-18T09:42:33Z

Hi @lindstro, would you then also prefer this PR to be targeted to the staging branch?

lindstro · 2024-07-18T15:14:14Z

Hi @lindstro, would you then also prefer this PR to be targeted to the staging branch?

In general, we favor the GitFlow model of developing new features on separate feature branches so that they can be well tested before they are merged into develop. The staging branch was created to consolidate CUDA and HIP work done on several different branches that we knew would take some time to massage before this work was ready for develop.

What we have done in the past when a new feature has been developed and submitted as a PR is create a target feature branch. I think that would be preferable to targeting staging, which I'd like to leave open as a pre-develop sandbox for the zfp team. But we would need you to pick up what we ultimately merge from staging into develop. Let me get back to you once that's done, and I'll then set up a feature branch for you. Thanks for your patience.

svdree-shell added 3 commits July 11, 2024 13:51

Add ability to execute CUDA operations in a stream

d8f6d93

Re-introduce blocking behavior when no stream was specified

97d9889

Add documentation

ff296a7

svdree-shell closed this Sep 4, 2024

svdree-shell deleted the cuda-stream branch September 4, 2024 11:59

svdree-shell restored the cuda-stream branch September 4, 2024 11:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cuda stream #238

Cuda stream #238

svdree-shell commented Jul 16, 2024

lindstro commented Jul 16, 2024

siebrenreker commented Jul 18, 2024

lindstro commented Jul 18, 2024

Cuda stream #238

Cuda stream #238

Conversation

svdree-shell commented Jul 16, 2024

lindstro commented Jul 16, 2024

siebrenreker commented Jul 18, 2024

lindstro commented Jul 18, 2024