mtl/ofi: avoid accessing request object after completion callback(restart ci) #12175
Testing in AWS internal CI.
ompi/mca/mtl/ofi/mtl_ofi.h
@@ -154,9 +154,9 @@ ompi_mtl_ofi_context_progress(int ctxt_id)
 ret = ofi_req->event_callback(&ompi_mtl_ofi_wc[i], ofi_req);
 if (OMPI_SUCCESS != ret) {
 opal_output(0,
-"%s:%d: Error returned by request (type: %d) event callback: %zd.\n"
+"%s:%d: Error returned by request event callback: %zd.\n"
Could you cache the request type before invoking the event_callback and use it in the error message? It might help a little with debugging.
Ah that's smart. Let me do that...
Updated. Running our CI again...
NVIDIA CI vomited...
Rerunning it doesn't seem to help.
It appears that the backend server is not available. I tried to restart the CI for #12182.
Request completion callback function can potentially invalidate the request object. We should avoid accessing the object afterwards. Signed-off-by: Wenduo Wang <[email protected]>
Finally worked... Our internal CI also passed. Merging...
The completion callback can potentially invalidate the request object, so it is not safe to access the object afterwards.
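
The pattern discussed in the review (cache the request type before the callback runs, then use only the cached copy in the error message) can be sketched as follows. This is a minimal, standalone illustration, not the actual Open MPI code: `demo_request_t`, `demo_req_type_t`, `demo_failing_callback`, and `demo_progress_one` are hypothetical names, and the callback is assumed to free the request to mimic a completion that invalidates it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the MTL request types; not the Open MPI definitions. */
typedef enum { DEMO_REQ_SEND = 0, DEMO_REQ_RECV = 1 } demo_req_type_t;

typedef struct demo_request {
    demo_req_type_t type;
    int (*event_callback)(struct demo_request *req);
} demo_request_t;

/* Assumed behavior: the callback completes the request, releases it,
 * and reports an error. After it returns, req must not be dereferenced. */
static int demo_failing_callback(demo_request_t *req)
{
    free(req);
    return -1;
}

/* Runs the callback for one request.
 * Returns -1 on success, or the cached request type if the callback failed. */
static int demo_progress_one(demo_request_t *req)
{
    assert(req != NULL);

    /* Copy everything needed for diagnostics BEFORE invoking the callback. */
    const demo_req_type_t cached_type = req->type;

    const int ret = req->event_callback(req);
    if (0 != ret) {
        /* Only the cached copy is used here; req may already be freed. */
        fprintf(stderr,
                "%s:%d: Error returned by request (type: %d) event callback: %d.\n",
                __FILE__, __LINE__, (int) cached_type, ret);
        return (int) cached_type;
    }
    return -1;
}
```

Reading `req->type` inside the error branch instead of `cached_type` would be a use-after-free in this sketch, which is the class of bug the PR avoids.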