Make a few memory management objects public + Miscellaneous doc updates #693


Merged: 11 commits, Jun 11, 2025

Conversation

@leofang leofang commented Jun 7, 2025

Description

closes #596.

  • Buffer and MemoryResource are already public (e.g., returned by Device.memory_resource.allocate()). We now also expose them under the cuda.core namespace for documentation purposes (Doc: Document cuda.core public (but non-entry-point) objects #601)
    • I noticed that Buffer's destructor would fall back to the default stream when no stream was passed explicitly. This isn't right because it takes control away from memory resource authors. We now always let the memory resource make the call.
    • I noticed Buffer has an __init__(). This isn't right if we want to make the class public, since buffers should be returned by MR.allocate(). I moved this support to from_handle().
  • The new exposed objects DeviceMemoryResource and LegacyPinnedMemoryResource are named after their respective cccl-rt/cudax counterparts.
  • The __cuda_stream__ protocol has a protocol type now (IsStreamT)
  • Internal refactoring to consolidate support for objects that have __cuda_stream__
  • Documentation is clarified and expanded to reflect the new public exposure of these classes; in particular, many type hints are updated or fixed
  • Update the docs to mention all other merged PRs targeting this release (v0.3.0)
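To illustrate the IsStreamT idea above, here is a minimal, hypothetical sketch of a `__cuda_stream__`-style protocol. The tuple shape (version, stream-handle address) follows the convention in the cuda.core docs; the exact shipped IsStreamT type may differ, and `StreamLike`/`MyFrameworkStream` are illustrative names, not cuda.core API:

```python
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class StreamLike(Protocol):
    """Sketch of an IsStreamT-style protocol (hypothetical)."""

    def __cuda_stream__(self) -> Tuple[int, int]:
        """Return (protocol version, CUstream address as a Python int)."""
        ...


class MyFrameworkStream:
    """A third-party stream wrapper (illustrative only)."""

    def __init__(self, handle: int):
        self._handle = handle

    def __cuda_stream__(self) -> Tuple[int, int]:
        # Version 0 of the protocol, plus the raw stream address.
        return (0, self._handle)


s = MyFrameworkStream(0xDEAD)
assert isinstance(s, StreamLike)          # structural check via the protocol
assert s.__cuda_stream__() == (0, 0xDEAD)
```

Any object shaped like this can then be consumed by APIs that accept the protocol.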

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Jun 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@leofang leofang added this to the cuda.core beta 4 milestone Jun 7, 2025
@leofang

leofang commented Jun 7, 2025

/ok to test a6387a0

@leofang leofang self-assigned this Jun 7, 2025
@leofang leofang added P1 Medium priority - Should do feature New feature or request cuda.core Everything related to the cuda.core module labels Jun 7, 2025


@leofang leofang marked this pull request as draft June 7, 2025 04:22
@leofang

leofang commented Jun 9, 2025

/ok to test cc0f6ce

@leofang leofang added P0 High priority - Must do! and removed P1 Medium priority - Should do labels Jun 9, 2025
@leofang leofang changed the title Make a few memory management objects public Make a few memory management objects public + Miscellaneous doc updates Jun 9, 2025
@leofang leofang marked this pull request as ready for review June 9, 2025 01:48
@leofang leofang requested a review from rwgk June 9, 2025 01:49
@leofang leofang marked this pull request as draft June 9, 2025 02:22
@leofang leofang added the breaking Breaking changes are introduced label Jun 9, 2025
@leofang

leofang commented Jun 9, 2025

/ok to test 58323ac

@leofang leofang marked this pull request as ready for review June 9, 2025 03:25
@@ -34,7 +35,7 @@ def _lazy_init():
     _inited = True


-def launch(stream, config, kernel, *kernel_args):
+def launch(stream: Union[Stream, IsStreamT], config: LaunchConfig, kernel: Kernel, *kernel_args):
Collaborator commented:

It's a bit inconsistent that only this API supports the __cuda_stream__ protocol, while many other APIs only work with explicit Stream type objects. We should probably be consistent.

Maybe push this to a follow-up PR?

leofang (Member Author) replied:

We already have public examples for PyTorch and CuPy showcasing the use of this protocol. With this PR we made it slightly faster in favor of our own Stream (through try-except). Perhaps we should instead encourage users to use our native objects, and state that using the protocol could add a slight overhead?

@@ -52,7 +52,8 @@ Then such objects can be understood by ``cuda.core`` anywhere a stream-like obje
is needed.
Collaborator commented:

I think we should discuss this. This will introduce non-negligible overhead, similar to DLPack, where we might be better off having a user explicitly wrap their stream objects in a cuda.core.Stream and reuse that Stream object as needed.

leofang (Member Author) commented Jun 10, 2025:

Yes, right now only launch() and Device.create_stream() support the protocol, due to exactly this concern about overhead. Overhead aside, I think we are essentially deciding which interoperability story we want to tell:

  • We give people a convenient way for them to convert their types to ours, and only accept our types in our APIs
  • We accept everyone's types in our APIs

Both stories need the protocol, so it's not a waste; they just use it differently.

Collaborator replied:

Yes, we need the protocol regardless. My point was that we document here that an object supporting the __cuda_stream__ protocol can be used anywhere cuda.core expects a stream-like object, which isn't currently true. We should do one of the following things here:

  1. Update the docs to make it clear that a user is required to wrap their __cuda_stream__ protocol supporting object in a cuda.core.Stream before passing it to other cuda.core APIs.
  2. Update the cuda.core implementation to support IsStreamT objects everywhere Streams are an API argument.
  3. Update the docs to make it clear which APIs explicitly allow IsStreamT objects vs. which require explicit Stream objects.
    • I'm -1 on this because it is confusing and non-intuitive to users

Based on the current state of the implementation, I think we're closest to option 1 (minus the launch() API), so we should probably make the docs reflect that for the time being?
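Option 1 (wrap once, then reuse) might look like this from the user's side. `WrappedStream` is a hypothetical stand-in for cuda.core.Stream, which in the real API would be obtained via Device.create_stream; `TorchLikeStream` stands in for any third-party object that implements __cuda_stream__:

```python
class WrappedStream:
    """Hypothetical stand-in for cuda.core.Stream wrapping a foreign stream."""

    def __init__(self, obj):
        version, handle = obj.__cuda_stream__()  # convert once, up front
        if version != 0:
            raise ValueError("unsupported __cuda_stream__ version")
        self.handle = handle


class TorchLikeStream:
    """Stand-in for a framework stream exposing the protocol."""

    def __cuda_stream__(self):
        return (0, 42)


s = WrappedStream(TorchLikeStream())  # wrap once...
assert s.handle == 42                 # ...then reuse s in every later API call
```

The conversion cost is paid once at wrap time instead of on every API call.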

leofang (Member Author) replied:

I'll work on 1 now.

leofang (Member Author) commented Jun 10, 2025:

@kkraus14 could you check out commit bbc1c65? I tried to rephrase to hint that we want people to wrap their streams, instead of us accepting them as-is. We probably have to defer the change to launch() until later: Vlad's graph support (nicely) took advantage of launch() supporting __cuda_stream__. We can look into it during graph phase 2.

kkraus14
kkraus14 previously approved these changes Jun 10, 2025
@github-project-automation github-project-automation bot moved this from Todo to In Review in CCCL Jun 10, 2025
@leofang
Copy link
Member Author

leofang commented Jun 10, 2025

/ok to test bbc1c65

@leofang
Copy link
Member Author

leofang commented Jun 10, 2025

CI is green now.

@leofang leofang merged commit bf590e4 into NVIDIA:main Jun 11, 2025
101 of 103 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jun 11, 2025
@leofang leofang deleted the default_mr branch June 11, 2025 01:54

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels
breaking Breaking changes are introduced cuda.core Everything related to the cuda.core module feature New feature or request P0 High priority - Must do!
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Expose pinned memory resource as public API
2 participants