DAOS-17109 cart: clients URI lookup of rank0 can lead to crash #15891
base: master
- The CaRT group was incorrectly leaving the gp_psr_rank field set to 0, causing issues when rank 0 was being looked up.
Signed-off-by: Alexander A Oganezov <[email protected]>
Ticket title is 'self_test issues with tcp provider'
Tested and validated on wolf cluster.
@@ -1090,6 +1091,8 @@ crt_grp_priv_create(struct crt_grp_priv **grp_priv_created,

    grp_priv->gp_size = 0;
    grp_priv->gp_refcount = 1;
    grp_priv->gp_psr_rank = CRT_NO_RANK;

How does setting it to no rank fix the issue? The title says lookup of rank 0; is rank 0 set to CRT_NO_RANK?
During URI_LOOKUP handling there is the following logic block:
    if (base_addr == NULL && !crt_is_service()) {
        D_RWLOCK_RDLOCK(&grp_priv->gp_rwlock);
        if (tgt_ep->ep_rank == grp_priv->gp_psr_rank && dst_tag == 0) {
            D_STRNDUP(uri, grp_priv->gp_psr_uri, CRT_ADDR_STR_MAX_LEN);
            D_RWLOCK_UNLOCK(&grp_priv->gp_rwlock);
If the rank being looked up (tgt_ep->ep_rank) matches the PSR rank, then we can use the PSR's URI.
The problem is that gp_psr_rank was previously initialized to 0, causing this block to assume that the PSR rank had been set (to 0) and therefore that gp_psr_uri was valid.
Setting gp_psr_rank to CRT_NO_RANK at init ensures gp_psr_rank is treated properly in checks like the one above.
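To make the failure mode concrete, here is a minimal standalone sketch of the same pattern. It is not the CaRT code: struct demo_grp, DEMO_NO_RANK, and the demo_* helpers are hypothetical stand-ins, and the sentinel value of all-ones is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel meaning "no rank assigned"; assumed value. */
#define DEMO_NO_RANK ((uint32_t)-1)

/* Hypothetical stand-in for the group's PSR fields. */
struct demo_grp {
	uint32_t    psr_rank; /* rank whose URI is cached, or DEMO_NO_RANK */
	const char *psr_uri;  /* meaningful only when psr_rank is a real rank */
};

/* Old behaviour: zeroing leaves psr_rank == 0, which is also a legal rank. */
static void demo_init_zeroed(struct demo_grp *grp)
{
	memset(grp, 0, sizeof(*grp));
}

/* Fixed behaviour: mark the PSR rank as unset explicitly. */
static void demo_init_sentinel(struct demo_grp *grp)
{
	memset(grp, 0, sizeof(*grp));
	grp->psr_rank = DEMO_NO_RANK;
}

/* Returns 1 when a lookup would take the "reuse the cached PSR URI" path. */
static int demo_uses_psr_uri(const struct demo_grp *grp, uint32_t rank,
			     uint32_t tag)
{
	assert(grp != NULL);
	return rank == grp->psr_rank && tag == 0;
}
```

With the zeroed init, a lookup of rank 0 with tag 0 takes the shortcut even though psr_uri is still NULL; with the sentinel init, the shortcut is skipped until a PSR rank is actually recorded.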
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: