I have a stress test where many pyxis containers are run in sequence on a node with the same container filesystem (`--container-name`). Pyxis calls `enroot list -f` to check whether a container exists and whether it is already running. However, `enroot list -f` can fail with return code `1` if a running container exits during the invocation of `enroot list -f`.

It seems to be caused by the `ps -p` command (`enroot/src/runtime.sh`, line 535 in 2f0c49e): the container PID exits between the call to `lsns` and the call to `ps -p`, so `ps -p` no longer finds that process.

On rare occasions, I also saw a return code of `2`, which is likely caused by a similar race between `lsns` and this call (`enroot/src/runtime.sh`, line 518 in 2f0c49e).
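The failure is a check-then-act (TOCTOU) race: PIDs are collected in one step (`lsns`) and inspected in a later step (`ps -p`), and a process can exit in between. The Python sketch below is a toy model, not enroot's code: `ProcessTable`, `snapshot_pids`, `lookup`, `list_strict` and `list_tolerant` are hypothetical names standing in for the `lsns` and `ps -p` steps. It shows how a strict listing fails when a container exits inside that window, and one possible mitigation (treating a vanished PID as "no longer running" instead of as an error). Whether enroot should handle it this way is an assumption.

```python
from typing import Callable, Dict, List, Optional


class ProcessTable:
    """Toy model of the process table: PID -> container name (hypothetical)."""

    def __init__(self, procs: Dict[int, str]) -> None:
        self.procs = dict(procs)

    def snapshot_pids(self) -> List[int]:
        # Plays the role of the `lsns` step: the PIDs seen at one instant.
        return sorted(self.procs)

    def lookup(self, pid: int) -> Optional[str]:
        # Plays the role of `ps -p PID`: None once the process has exited.
        return self.procs.get(pid)

    def exit(self, pid: int) -> None:
        # A container process terminating.
        self.procs.pop(pid, None)


Hook = Callable[[ProcessTable], None]


def list_strict(table: ProcessTable, in_window: Hook) -> List[str]:
    """Fails if any snapshotted PID is gone by the time it is looked up."""
    pids = table.snapshot_pids()
    in_window(table)  # anything can happen between the two steps
    names = []
    for pid in pids:
        name = table.lookup(pid)
        if name is None:
            # Analogous to the whole listing failing with a non-zero code.
            raise RuntimeError(f"PID {pid} vanished during listing")
        names.append(name)
    return names


def list_tolerant(table: ProcessTable, in_window: Hook) -> List[str]:
    """Skips PIDs that exited mid-listing instead of failing."""
    pids = table.snapshot_pids()
    in_window(table)
    names = []
    for pid in pids:
        name = table.lookup(pid)
        if name is not None:  # a vanished PID means the container stopped
            names.append(name)
    return names


if __name__ == "__main__":
    def container_b_exits(t: ProcessTable) -> None:
        t.exit(200)

    try:
        list_strict(ProcessTable({100: "a", 200: "b"}), container_b_exits)
    except RuntimeError as err:
        print("strict:", err)
    print("tolerant:", list_tolerant(ProcessTable({100: "a", 200: "b"}), container_b_exits))
```

Running it prints `strict: PID 200 vanished during listing` followed by `tolerant: ['a']`. The hook argument makes the race deterministic, which a real stress test cannot do; that is why reproducing it on a node may need several attempts.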
Example where it happens (might need a few attempts and/or tweaking the sleep duration):