Problem: The shell builds its list of ranks in the order they appear
in the R_lite array of Rv1, but when ranks appear out of order with
respect to broker ranks, this breaks assumptions elsewhere (e.g. in
the taskmap code) that shell ranks are a direct index into the sorted
broker ranks idset and associated hostlist.
Since the common case will be a sorted R_lite array, detect if the
ranks are not sorted and, if so, sort the rcalc rank array by broker
rank and reassign shell ranks.
Fixes flux-framework#6582
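For illustration only, the detect-and-sort step described above could be sketched as follows. This is a simplified Python model, not the actual rcalc C code: the `RankInfo` class, the `load_ranks` and `normalize` helpers, and the `(broker_rank, ncores)` tuples standing in for R_lite entries are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RankInfo:
    broker_rank: int  # rank of the broker hosting this shell
    shell_rank: int   # index of this shell within the job
    ncores: int       # cores assigned on this broker rank


def load_ranks(r_lite):
    """Build rank entries in the order they appear in R_lite.

    r_lite is assumed to be a list of (broker_rank, ncores) tuples,
    a simplification of the real Rv1 R_lite entries.
    """
    return [RankInfo(b, i, n) for i, (b, n) in enumerate(r_lite)]


def is_sorted(ranks):
    """Return True if entries are already ordered by broker rank."""
    return all(a.broker_rank <= b.broker_rank
               for a, b in zip(ranks, ranks[1:]))


def normalize(ranks):
    """Sort by broker rank and reassign shell ranks, only if needed."""
    if is_sorted(ranks):
        return ranks  # common case: nothing to do
    ordered = sorted(ranks, key=lambda r: r.broker_rank)
    for i, r in enumerate(ordered):
        r.shell_rank = i  # shell rank is again an index into sorted broker ranks
    return ordered
```

With an out-of-order input such as `[(1, 96), (0, 1)]`, `normalize(load_ranks(...))` returns the entry for broker rank 0 as shell rank 0 and the entry for broker rank 1 as shell rank 1, which is the invariant that code indexing into the sorted broker ranks idset relies on.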
Problem: on MI300 nodes with 96 cores, running a job that requires 97 through 191 cores without specifying a node count seems to cause the shell to assign ranks to nodes in a way that disagrees with the taskmap.
Example:
Adding a `userrc` that prints the shell hostname confirms this:

This does not reproduce if `-N2` is added.

First noted in flux-framework/flux-coral2#222