-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/IB/MLX5/RC: perf tuning - increase dc latency estimation when AR is enabled #10415
base: master
Are you sure you want to change the base?
Conversation
src/uct/ib/mlx5/rc/rc_mlx5_iface.c
Outdated
@@ -188,7 +188,7 @@ static ucs_status_t uct_rc_mlx5_iface_query(uct_iface_h tl_iface, uct_iface_attr | |||
uct_rc_mlx5_iface_common_query(&rc_iface->super, iface_attr, max_am_inline, | |||
UCT_RC_MLX5_TM_EAGER_ZCOPY_MAX_IOV(0)); | |||
iface_attr->cap.flags |= UCT_IFACE_FLAG_EP_CHECK; | |||
iface_attr->latency.m += 1e-9; /* 1 ns per each extra QP */ | |||
iface_attr->latency.m += 32e-11; /* 0.32 ns per each extra QP */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make it configurable?
…sing RC, added env for it
@@ -140,6 +140,12 @@ ucs_config_field_t uct_dc_mlx5_iface_config_sub_table[] = { | |||
ucs_offsetof(uct_dc_mlx5_iface_config_t, dcis_initial_capacity), | |||
UCS_CONFIG_TYPE_UINT}, | |||
|
|||
{"FULL_HANDSHAKE_ADDED_LATENCY", "110ns", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we double (or even 3x) the latency since there is another round trip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it adds 2.7us (which is ~x30) but I tried to set a number that makes UCX select DC only when it improves latency and bandwidth compared to RC (BTW setting bigger values eventually makes UCX select UD over DC unless we increase estimated UD latency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean double the current latency estimation. Since FHS adds a round trip for connection establishment.
Does UD really have a better performance than DC FHS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, UD Latency is much better than DC FHS, 1.62us vs 4.27us in small sizes
@@ -242,6 +248,11 @@ static ucs_status_t uct_dc_mlx5_iface_query(uct_iface_h tl_iface, uct_iface_attr | |||
sizeof(uct_dc_mlx5_iface_addr_t); | |||
iface_attr->latency.c += 60e-9; /* connect packet + cqe */ | |||
|
|||
/* Full handshake is used when AR is enabled */ | |||
if (iface->super.config.dp_ordering == UCT_IB_MLX5_DP_ORDERING_OOO_RW) { | |||
iface_attr->latency.c += ucs_time_to_sec(iface->tx.fhs_added_latency); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should do it also when user forces full handshake or using SL with adaptive routing.
Maybe we could also query if FHS is enabled in HW on a given DCI, from DevX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that mean creating a dummy dci?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm right, so better avoid it
looks like the content of this PR does not correpond to the title. Can you pls update it? |
What?
Decrease latency estimation for RC
Why?
Currently, DC is selected too early, for a relatively small number of eps (~60), after some research it seems DC should be selected only when reaching 180 ~ 190 eps.