-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC]: C API #134
[RFC]: C API #134
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
rfcs/20240806-c-api/README.md
Outdated
|
||
| NCCL | oneCCL (proposed C) | oneCCL (current, C++) | | ||
|-------------------|------------------------------|-------------------------| | ||
|`cudaError_t` |`onecclResult_t cudaSetDevice(device)(1)`| N/A | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the device? We should be much more specific here. I think we don't have to cover this API. My proposal is to handle all device handles in ...Config APIs, so for example. For example passing users device to onecclCommInit would look like:
onecclCommConfig_t config = ONECCL_COMM_CONFIG_INITIALIZER;
config.sycl.queue = &queue;
onecclCommInitConfig(rank, size, ..., &config);
Similar pattern could be applied to onecclStreamCreate
, so we could get rid of onecclCreateSyclStream
which looks like a workaround rather than an elegant solution:
onecclStream_t stream;
onecclStreamConfig_t stream_config = ONECCL_STREAM_SYCL_CONFIG_INITIALIZER;
// or onecclStreamConfig_t config = ONECCL_STREAM_CONFIG_INITIALIZER;
config.sycl.queue = &queue;
onecclStreamCreateConfig(&stream, &config);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to use integer index for the devices temporarily, before SYCL releases API for device selection similar to cudaSetDevice
.
For the streams, I think that onecclCreateStreamXpu(&onecclStream_t stream_ptr, void* args)
should be enough for all communications backed we would like to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Have you considered this issue intel/llvm#15251 Are there not similar implementation issues in the level_zero backend due to the implicit device setting nature of SYCL? I do not see how implementing a wrapper function to sycl that matches the api of I think that the only current solution is to rely on the assumption that no more than one gpu device will be used per MPI rank and that environment variables such as CUDA_VISIBLE_DEVICES/ intel vendor equivalent, or ONEAPI_DEVICE_SECTOR are used: as detailed in intel/llvm#15251 |
We do need to support the case where one MPI rank will open more than one GPU device. It is hard to understand all the details in the ticket you refer, but it appears more like a memory leak, which I assume can/has to be fixed. In any case, in general a MPI rank can open all the GPU devices. |
rfcs/20240806-c-api/README.md
Outdated
- `onecclResult_t onecclReleaseStream(oneccl_stream)` | ||
|
||
`onecclResult_t onecclStreamDestroy(onecclStream_t oneccl_stream)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So onecclStreamDestroy
is the final version right?
rfcs/20240806-c-api/README.md
Outdated
|
||
`onecclResult_t onecclStreamDestroy(onecclStream_t oneccl_stream)` | ||
|
||
Once the sycl::queue is registered, it is hidden behind the `cclStream_t` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cclStream_t
-> onecclStream_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a typo. @zhenggb72, can you update this and we can approve and merge?
Thanks @garzaran that link was useful. I've opened a OPENMPI issue with a cuda reproducer that represents what happens in DPC++ here: open-mpi/ompi#12848 |
c3a07a8
to
928e842
Compare
Looks good |
I formatted Gengbin's C APi proposal as an RFC.