-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
btl/smcuda: add delayed stream initialization #12354
btl/smcuda: add delayed stream initialization #12354
Conversation
I was wondering whether somebody would have time to review this pr, I would like to have it merged before the weekend if possible. |
bot:retest |
Running AWS CI. We test on both x86 and arm. |
@wenduwan ok, thank you! |
FYI our CI passed. |
bot:retest |
ccba7e8
to
71b404b
Compare
Hm, not sure whether the mpi4py failure is truly because of something in this pr
|
|
||
/* Create the events since they can be reused. */ | ||
for (i = 0; i < accelerator_event_max; i++) { | ||
rc = opal_accelerator.create_event(MCA_ACCELERATOR_NO_DEVICE_ID, &accelerator_event_ipc_array[i], opal_accelerator_use_sync_memops ? false : true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused here about what this PR really does. You mentioned it improves the handling of the process GPU/stream, but I see here that the events are created without a device association (MCA_ACCELERATOR_NO_DEVICE_ID) as if they were expected to be generic (which is not the case, at least not with CUDA)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, the CUDA accelerator simply ignores the device id, which either suggest that we don't need it or that the CUDA accelerator framework is careless on the handling of the devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So its a tricky question: neither hipEventCreateWithFlags()
nor cuEventCreate()
take a device_id as an argument, the device_id argument to the API was added for ze devices. However, it seems like the device_id does play internally a role. Specifically, in some ROCm releases, if the code did something like:
hipSetDevice(device_id)
MPI_Init()
everything worked, since the events were created using the device that was used subsequently. If the sequence was however the other way around (i.e. the code set the device using hipSetDevice()
after MPI_Init()
) we had the events (and streams) being created with the default device (0), but afterwards we tried to use it for operations on a different device, and that caused an error. The delayed init is fixing this.
I agree however that I can make this a bit cleaner, by first retrieving the current device id , and using that id as an argument in the event_create and stream_create functions. Would that be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be awesome, and will make the code much more readable.
71b404b
to
4f83e45
Compare
@bosilca thank you for your review, I tried address your comments and updated the code accordingly, please let me know whether this looks ok now. Thanks again! |
introduce two new mca parameterse to the smcuda component: - allow for delayed initialization of the internal ipc stream and the array of events. This allows to handle situations where the user code did not set the device before MPI_Init AND the internal stream and/or event structures have some dependence on the device id used during creation. - add a parameter to control how many events are created during initialization. Signed-off-by: Edgar Gabriel <[email protected]>
4f83e45
to
835eef5
Compare
introduce two new mca parameterse to the smcuda component: