-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/efa: Use READ NACK protocol if P2P is not available for RDMA read #10363
Conversation
Can you also update https://github.com/ofiwg/libfabric/blob/main/prov/efa/docs/efa_rdm_protocol_v4.md to explain this feature as part of READ NACK protocol? |
The emulated long-read write subprotocol uses RDMA read to emulate a write operation. When p2p is not available, sender should fall back to emulated long CTS write protocol. Also change the log level to info. Signed-off-by: Jessie Yang <[email protected]>
6cf77aa
to
e1034d4
Compare
I prefer to change it to |
if (err) { | ||
if (err == -FI_ENOMR) { | ||
if (efa_rdm_peer_support_read_nack(peer)) | ||
/* Only set the flag here. The NACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did runtread post read nack pkt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 919
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we are only posting read nack pkt for runt read when we get all pkts that were runt. So we cannot post read nack here. This does make the refactor a bit less useful...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emulated long-read write did not handle MR exhaustion. The refactor will check for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we don't need efa_rdm_rxe_map_insert for runt read either.
We don't need efa_rdm_rxe_map_insert
for runt read because it is a multi-req protocol and we already inserted it earlier.
Emulated long-read write did not handle MR exhaustion. The refactor will check for that.
Yeah, I agree. I was just thinking how to make efa_rdm_pke_post_remote_read_or_nack
cover the nack pkt post for all pkt types. It seems we cannot do that for runting read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with documenting in efa_rdm_pke_post_remote_read_or_nack
that why we don't need map insert and nack post for runting read.
73066f7
to
dbad1ac
Compare
if (OFI_UNLIKELY(err)) { | ||
EFA_WARN(FI_LOG_CQ, | ||
"RDMA post read or queue failed.\n"); | ||
efa_base_ep_write_eq_error(&ep->base_ep, err, FI_EFA_ERR_RDMA_READ_POST); | ||
efa_rdm_rxe_release(rxe); | ||
efa_rdm_pke_release_rx(pkt_entry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know why we were releasing pkt entry here again since we already released in L562
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a double free to me so I remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, seems a bug
prov/efa/src/rdm/efa_rdm_pke_utils.h
Outdated
int p2p_avail; | ||
|
||
pkt_type = efa_rdm_pke_get_base_hdr(pkt_entry)->type; | ||
p2p_avail = efa_rdm_ep_use_p2p(ep, rxe->desc[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe? efa_rdm_ep_use_p2p can return -FI_ENOSYS right?
prov/efa/src/rdm/efa_rdm_pke_utils.h
Outdated
/* Only set the flag for runting read. The NACK | ||
* packet is sent after all runting read | ||
* RTM packets have been received */ | ||
if (!efa_rdm_pkt_type_is_runtread(pkt_type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
efa_rdm_rxe_map_insert
is a protocol implementation detail, and only works for pkt type that has msg_id. So we shouldn't check the op code here, but check the pkt type. Since msg_id is only present in rtm pkt type, it should be something like efa_rdm_pkt_type_is_rtm
For cleaner logic, can we also refactor the code block to
if (efa_rdm_pkt_type_is_runtread(pkt_type)) /* do nothing for runting read */
return 0;
if (efa_rdm_pkt_type_is_rtm(pkt_type))
efa_rdm_rxe_map_insert(&ep->rxe_map, pkt_entry, rxe);
err = efa_rdm_ope_post_send_or_queue(rxe, EFA_RDM_READ_NACK_PKT);
cf6533c
to
a66773e
Compare
prov/efa/src/rdm/efa_rdm_pke_utils.h
Outdated
efa_rdm_pke_post_remote_read_or_nack(struct efa_rdm_ep *ep, | ||
struct efa_rdm_pke *pkt_entry, | ||
struct efa_rdm_ope *rxe) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Aligning with spaces (don't care personally). A few other places as well.
prov/efa/docs/efa_rdm_protocol_v4.md
Outdated
when the receiver is unable to register a memory region for the RDMA read operation. | ||
Failure to register the memory region is typically because of a hardware limitation. | ||
when the receiver is unable to register a memory region for the RDMA read operation, | ||
typically because of a hardware limitation, or when P2P support is unavailable for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"p2p support is unavailable" is also a hardware restriction
RDMA read requires p2p. Use READ NACK protocol when p2p is not available. Also change the log level to info. Signed-off-by: Jessie Yang <[email protected]>
Signed-off-by: Jessie Yang <[email protected]>
bot:aws:retest |
p2p is required for RDMA read because the user buffer has to be registered to efa device.
When p2p is not available, receiver should send a NACK packet to make sender fall back to long CTS protocol to avoid RDMA read.