Issues for using MPI in GUI interactive mode #182

rcoreilly · 2023-03-26T07:39:01Z

MPI depends entirely on each proc executing the same sequence of collective calls (AllGather etc.) at the same times. If any node doesn't, everything just waits and then probably times out with an error. When running a fixed -nogui run, there is no problem here.

But when running interactively, each node needs to get the user's commands (start, stop, step, Init, etc.) so they can all stay in sync. Thus, we need an additional outer loop of communication where the proc > 0 nodes wait for commands and then run them, all the while checking to see if a stop command has come in.
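A minimal sketch of what that outer loop could look like on a proc > 0 node, using only the Python standard library. A `queue.Queue` stands in for whatever transport actually delivers the commands, and the names `worker_loop`, `run_step`, `init`, and the command strings are hypothetical, not part of any existing API.

```python
import queue


def worker_loop(commands, run_step, init):
    """Outer loop for a proc > 0: wait for a command, run it, poll for stop."""
    while True:
        cmd = commands.get()  # idle: block until a command arrives
        if cmd == "quit":
            return
        if cmd == "init":
            init()
        elif cmd == "step":
            run_step()
        elif cmd == "start":
            # Free-running mode: keep stepping, checking for stop after each step.
            while True:
                run_step()
                try:
                    pending = commands.get_nowait()  # non-blocking check
                except queue.Empty:
                    continue
                if pending == "stop":
                    break
                # Any other command received while running is ignored in this sketch.


def demo():
    """Feed a fixed command sequence to one worker and record what it did."""
    inbox = queue.Queue()
    for c in ["init", "step", "start", "stop", "quit"]:
        inbox.put(c)
    trace = []
    worker_loop(
        inbox,
        run_step=lambda: trace.append("step"),
        init=lambda: trace.append("init"),
    )
    return trace
```

Because each node polls on its own, two nodes can see the stop command after a different number of steps, so a real version would still need some way to make all procs agree on the step at which they stop.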

Probably this should be done using something other than MPI, because it needs to be non-blocking and more dynamic. Someone with appropriate network communication knowledge should probably take this on.

siboehm · 2023-03-27T12:39:00Z

I wouldn't use a different protocol, mostly because if we just use MPI for everything we only have to do the MPI_World setup once. With a different protocol it'll get complicated once we have cross-machine MPI with ssh setups etc.

Can't we just put an MPI.BCast from the root node (where the GUI runs) to all other procs into the GUI loop, one that tells the other procs about the current user input (start, stop, etc.)? It should really be blocking, or else you'll run into the same issues with timed-out AllReduces. A blocking call will add ~10μs of latency, which is fast enough not to be noticeable.
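To make the idea concrete, here is a rough simulation of a blocking broadcast inside the GUI loop. It uses threads as stand-ins for procs so it runs without an MPI installation; `SimComm`, `run_rank`, `simulate`, and the command strings (including `"noop"` for "no user input this tick") are all made up for illustration, and real code would call the project's actual MPI broadcast instead.

```python
import threading


class SimComm:
    """Thread-based stand-in for an MPI communicator (hypothetical, not a real MPI API)."""

    def __init__(self, size):
        self.size = size
        self._slot = None
        self._barrier = threading.Barrier(size)

    def bcast(self, value, rank, root=0):
        """Blocking broadcast: every rank returns the root's value."""
        if rank == root:
            self._slot = value
        self._barrier.wait()  # wait until the root has published its value
        value = self._slot
        self._barrier.wait()  # wait until every rank has read it
        return value


def run_rank(comm, rank, gui_events, log):
    """Same loop on every rank; only the root (rank 0) reads user input."""
    running = False
    tick = 0
    while True:
        # Root picks up the current command; falls back to "quit" when input ends.
        cmd = next(gui_events, "quit") if rank == 0 else None
        cmd = comm.bcast(cmd, rank)  # same collective, same order, on all ranks
        if cmd == "quit":
            return
        if cmd == "start":
            running = True
        elif cmd == "stop":
            running = False
        if running or cmd == "step":
            log[rank].append(tick)  # stand-in for one step with its AllReduce calls
        tick += 1


def simulate(events, size=3):
    """Run `size` simulated procs over the given GUI events; return per-rank step logs."""
    comm = SimComm(size)
    log = [[] for _ in range(size)]
    threads = [
        threading.Thread(target=run_rank, args=(comm, r, iter(events), log))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `simulate(["noop", "start", "noop", "stop", "step", "noop", "quit"])` gives every rank the same list of executed ticks, since each rank takes its decision from the same broadcast value before doing any work for that tick.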
