Make a few memory management objects public + Miscellaneous doc updates #693
Conversation
…Resource public APIs
/ok to test a6387a0
/ok to test cc0f6ce
…g of stream=None; more docs
/ok to test 58323ac
```diff
@@ -34,7 +35,7 @@ def _lazy_init():
     _inited = True


-def launch(stream, config, kernel, *kernel_args):
+def launch(stream: Union[Stream, IsStreamT], config: LaunchConfig, kernel: Kernel, *kernel_args):
```
It's a bit inconsistent that only this API supports the `__cuda_stream__` protocol, while many other APIs only work with explicit `Stream` objects. We should probably be consistent. Maybe push this to a follow-up PR?
We already have public examples for PyTorch and CuPy showcasing the use of this protocol. With this PR we made it slightly faster in favor of our own `Stream` (through try-except). Perhaps we should instead encourage users to use our native objects, and state that using the protocol could add a slight overhead?
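To make the trade-off concrete, here is a minimal sketch of the dispatch pattern being discussed: favor the native stream type on the fast path, and fall back to the `__cuda_stream__` protocol otherwise. `Stream`, `ForeignStream`, and `as_stream` are illustrative stand-ins, not the real `cuda.core` implementation.

```python
class Stream:
    """Stand-in for a native stream type wrapping a driver handle."""

    def __init__(self, handle: int):
        self.handle = handle


class ForeignStream:
    """A third-party stream exposing the __cuda_stream__ protocol."""

    def __init__(self, handle: int):
        self._handle = handle

    def __cuda_stream__(self):
        # The protocol returns a (version, handle) pair.
        return (0, self._handle)


def as_stream(obj) -> Stream:
    """Return a native Stream, favoring native objects on the fast path."""
    if isinstance(obj, Stream):
        # Fast path: native streams pass through with no protocol call.
        return obj
    try:
        # Slow path: fall back to the protocol; this is the slight
        # overhead mentioned above.
        _version, handle = obj.__cuda_stream__()
    except AttributeError:
        raise TypeError(f"{obj!r} is not stream-like") from None
    return Stream(handle)
```

Under this pattern, a user who converts a foreign stream once and reuses the returned object pays the protocol cost only once, while calls with native objects stay on the cheap path.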
```diff
@@ -52,7 +52,8 @@ Then such objects can be understood by ``cuda.core`` anywhere a stream-like object
 is needed.
```
I think we should discuss this. This will introduce non-negligible overhead, similar to DLPack, where we might be better off having a user explicitly wrap their stream objects in a `cuda.core.Stream` and reuse that `Stream` object as needed.
Yes, right now only `launch()` and `Device.create_stream()` support the protocol, due to the exact concern of adding overheads. Overhead aside, I think essentially we are assessing which interoperability story we want to tell:

- We give people a convenient way to convert their types to ours, and only accept our types in our APIs
- We accept everyone's types in our APIs

Both stories need the protocol, so it's not a waste; they just use the protocol differently.
Yes, we need the protocol regardless. My point was that we document here that an object supporting the `__cuda_stream__` protocol can be used anywhere `cuda.core` expects a stream-like object, which isn't currently true. We should do one of the following things here:

1. Update the docs to make it clear that a user is required to wrap their `__cuda_stream__`-supporting object in a `cuda.core.Stream` before passing it to other `cuda.core` APIs.
2. Update the `cuda.core` implementation to support `IsStreamT` objects everywhere streams are an API argument.
3. Update the docs to make it clear which APIs explicitly allow `IsStreamT` objects vs. require explicit `Stream` objects.
   - I'm -1 on this because it is confusing and non-intuitive to users.

Based on the current state of the implementation, I think we're closest to 1 (minus the `launch()` API), so we should probably make the docs reflect that for the time being?
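For reference, options 2 and 3 hinge on what counts as an `IsStreamT` object. A minimal sketch of such a protocol type, written here with `typing.Protocol` as an assumption (the real `cuda.core` definition may differ), could look like:

```python
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class IsStreamT(Protocol):
    """Any object exposing the __cuda_stream__ protocol (structural typing)."""

    def __cuda_stream__(self) -> Tuple[int, int]:
        """Return a (version, stream handle) pair."""
        ...


class MyStream:
    """Example third-party type that satisfies IsStreamT structurally,
    without inheriting from it."""

    def __cuda_stream__(self):
        return (0, 0)
```

With `@runtime_checkable`, `isinstance(obj, IsStreamT)` only verifies that the method exists, not its return type, so APIs accepting `IsStreamT` would still need to validate the `(version, handle)` pair at the call site.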
I'll work on 1 now.
@kkraus14 could you check out commit bbc1c65? I tried to rephrase to hint that we want people to wrap their streams, instead of accepting them as-is. We probably have to defer the change of `launch()` to a later time. Vlad's graph support (nicely) took advantage of `launch()` supporting `__cuda_stream__`. We can look into it during graph phase 2.
Co-authored-by: Keith Kraus <[email protected]>
/ok to test bbc1c65
CI is green now.
Description
Closes #596.

- `Buffer` and `MemoryResource` are already public (e.g., returned by `Device.memory_resource.allocate()`). We just expose them under the `cuda.core` namespace for documentation purposes (Doc: Document `cuda.core` public (but non-entry-point) objects #601).
- `Buffer`'s destructor would use the default stream if a stream is not passed explicitly. This isn't right because it takes away control from the memory resource authors. We now always let the memory resource make the call.
- `Buffer` has an `__init__()`. This isn't right (since we want buffers to be returned by `MR.allocate()`) if we want to make it public. I moved this support to `from_handle()`.
- `DeviceMemoryResource` and `LegacyPinnedMemoryResource` are named after their respective cccl-rt/cudax counterparts.
- The `__cuda_stream__` protocol has a protocol type now (`IsStreamT`).
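The two `Buffer` changes above (creation via `from_handle()` rather than `__init__()`, and deallocation always delegated to the memory resource instead of defaulting a stream) can be illustrated with a toy sketch. `ToyBuffer` and `ToyMemoryResource` are illustrative stand-ins, not the real `cuda.core` classes:

```python
class ToyBuffer:
    """Toy buffer: constructed only via its memory resource or from_handle()."""

    def __init__(self, *args, **kwargs):
        raise RuntimeError(
            "use ToyMemoryResource.allocate() or ToyBuffer.from_handle()"
        )

    @classmethod
    def from_handle(cls, ptr, size, mr):
        # Bypass __init__ so direct construction stays disallowed.
        self = object.__new__(cls)
        self._ptr, self._size, self._mr = ptr, size, mr
        return self

    def close(self, stream=None):
        # Delegate: the memory resource decides how (and on which stream,
        # if any) to deallocate; the buffer never picks a default stream.
        if self._mr is not None:
            self._mr.deallocate(self._ptr, self._size, stream)
            self._mr = None


class ToyMemoryResource:
    """Toy memory resource that records deallocations for inspection."""

    def __init__(self):
        self.freed = []

    def allocate(self, size):
        ptr = id(object())  # fake "device pointer" for illustration
        return ToyBuffer.from_handle(ptr, size, self)

    def deallocate(self, ptr, size, stream):
        self.freed.append((ptr, size, stream))
```

In this sketch, `buf.close()` with no stream simply forwards `stream=None` to the resource, which is then free to choose its own deallocation policy.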
Checklist