shell: assignment of tasks to shells contradicts taskmap on partially allocated elcap nodes #6582

Closed
garlick opened this issue Jan 27, 2025 · 0 comments · Fixed by #6584

garlick commented Jan 27, 2025

Problem: on MI300 nodes with 96 cores, running a job that requires 97 through 191 cores without specifying a node count appears to make the shell place task ranks on different nodes than the taskmap reports.

Example (task 190 reports running on tuolumne1005, but the taskmap places it on tuolumne1006):

[garlick@tuolumne1005:~]$ flux run --label-io -n 191 hostname | grep 190:
190: tuolumne1005
[garlick@tuolumne1005:~]$ flux job taskmap --to=hosts $(flux job last)
tuolumne1005: 0-95
tuolumne1006: 96-190

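Adding a userrc that prints the shell hostname confirms this. With -n192 the shell ranks follow host order, but with -n191 shell rank 0 lands on elcap1064 while the taskmap still lists elcap1063 first: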

[garlick@elcap1063:~]$ flux run -n192 -o mpibind=off -o spindle.level=off -o verbose -o userrc=rc.lua true 2>&1 |grep elcap
0.058s: flux-shell[0]: elcap1063
0.066s: flux-shell[1]: elcap1064
[garlick@elcap1063:~]$ flux run -n191 -o mpibind=off -o spindle.level=off -o verbose -o userrc=rc.lua true 2>&1 |grep elcap
0.066s: flux-shell[0]: elcap1064
0.058s: flux-shell[1]: elcap1063
[garlick@elcap1063:~]$ flux job taskmap --to=hosts $(flux job last)
elcap1063: 0-95
elcap1064: 96-190
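
To make the contradiction concrete, here is a minimal Python sketch of one reading of the output above. It assumes a block task layout (task counts per shell rank taken from the taskmap output, 96 and 95) and that the taskmap code treats the shell rank as an index into the sorted host list, as the commit message later in this thread describes. The helper names `block_ranges` and `host_of_task` are hypothetical and are not part of flux-core.

```python
def block_ranges(task_counts):
    """Return (first, last) task id for each shell rank, in shell-rank order."""
    ranges, start = [], 0
    for count in task_counts:
        ranges.append((start, start + count - 1))
        start += count
    return ranges


def host_of_task(task_id, task_counts, hosts):
    """Return the host (hosts is indexed by shell rank) that owns task_id."""
    for shell_rank, (first, last) in enumerate(block_ranges(task_counts)):
        if first <= task_id <= last:
            return hosts[shell_rank]
    raise ValueError(f"task {task_id} is not in the job")


# Task counts per shell rank, read off the taskmap output (0-95 and 96-190).
task_counts = [96, 95]

# Assumption: the taskmap view indexes the sorted host list by shell rank.
assumed_hosts = ["elcap1063", "elcap1064"]

# Shell rank to host order reported by the verbose log for the -n191 run.
observed_hosts = ["elcap1064", "elcap1063"]

assumed = host_of_task(190, task_counts, assumed_hosts)
observed = host_of_task(190, task_counts, observed_hosts)
print(f"task 190: taskmap view {assumed}, shell view {observed}")
```

Under these assumptions the two views disagree for every task, which matches the tuolumne example where task 190 ran on the lower-numbered host while the taskmap listed it on the higher-numbered one.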

The problem does not reproduce if -N2 is added:

[garlick@elcap1063:~]$ flux run -n97 -N2 -o mpibind=off -o spindle.level=off -o verbose -o userrc=rc.lua true 2>&1 |grep elcap
0.059s: flux-shell[0]: elcap1063
0.092s: flux-shell[1]: elcap1064
[garlick@elcap1063:~]$ flux job taskmap --to=hosts $(flux job last)
elcap1063: 0-48
elcap1064: 49-96

First noted in flux-framework/flux-coral2#222

grondo added a commit to grondo/flux-core that referenced this issue Jan 27, 2025
Problem: The shell builds its list of ranks in the order they appear
in the R_lite array of Rv1, but when ranks appear out of order with
respect to broker ranks, this breaks assumptions elsewhere (e.g. in
the taskmap code) that shell ranks are a direct index into the sorted
broker ranks idset and associated hostlist.

Since the common case will be a sorted R_lite array, detect if the
ranks are not sorted and, if so, sort the rcalc rank array by broker
rank and reassign shell ranks.

Fixes flux-framework#6582
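
The approach in the commit message can be illustrated with a short Python sketch. This is not the flux-core change itself (which operates on the rcalc rank array in C). The `RankInfo` type, the `assign_shell_ranks` function, and the broker rank numbers are hypothetical, chosen only to show the "detect unsorted, sort by broker rank, reassign shell ranks" idea.

```python
from dataclasses import dataclass


@dataclass
class RankInfo:
    broker_rank: int      # broker rank named by an R_lite entry
    ncores: int           # cores allocated on that broker rank
    shell_rank: int = -1  # assigned below


def assign_shell_ranks(r_lite_entries):
    """Build rank info from (broker_rank, ncores) pairs in R_lite order.

    If the entries are not already ordered by broker rank, sort them so
    that the shell rank is a direct index into the sorted broker ranks.
    """
    ranks = [RankInfo(broker_rank=b, ncores=n) for b, n in r_lite_entries]
    already_sorted = all(
        a.broker_rank <= b.broker_rank for a, b in zip(ranks, ranks[1:])
    )
    if not already_sorted:  # common case is sorted, so this is usually skipped
        ranks.sort(key=lambda r: r.broker_rank)
    for index, info in enumerate(ranks):
        info.shell_rank = index
    return ranks


# Hypothetical R_lite order: the fully allocated node is listed first.
for info in assign_shell_ranks([(1064, 96), (1063, 95)]):
    print(info.shell_rank, info.broker_rank, info.ncores)
```

With the hypothetical input above, shell rank 0 ends up on the lower broker rank regardless of the order in which the entries were listed, which is the invariant the taskmap code is described as relying on.
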
mergify bot closed this as completed in #6584 Jan 28, 2025
mergify bot closed this as completed in f7eeac5 Jan 28, 2025