
Allow empty partitions in parallel runs #162

Open · wants to merge 9 commits into main
Conversation

@jrper (Contributor) commented Aug 7, 2017

I'm raising this pull request slightly early in the hope of building a consensus on the preferred behaviour and testing method for continuing with empty partitions after Zoltan-driven adaptivity in a Fluidity run.

This changeset allows the code to continue with an empty partition, with the previous exit messages becoming warnings visible in the log file. If a later adapt reactivates the empty partition, this should happen transparently.

@tmbgreaves Could I get a buildbot branch for this, since it's liable to interact with several other features.

@tmbgreaves (Contributor)

@jrper (Contributor, Author) commented Aug 22, 2017

I'm finally confident that this will go green on buildbot, while actually testing the new functionality, so I'm tagging @stephankramer for a formal review.

@stephankramer (Contributor)

Coming back to this, as this is indeed a very useful addition:

I think we may have discussed this previously - the code all looks very good, but I would like some clarification as to what the approach is here, as I can't easily work it out from the changes. It seems to introduce a new communicator, MPI_COMM_NONEMPTY, but it is unclear to me when this should be used, since MPI_COMM_FEMTOOLS is also still around in many places. So from a developer's perspective it's unclear which one to use.

Taking a step back to the original problem: we go into Zoltan and it comes back with a partition in which some processes receive 0 nodes/elements. Naively there should be nothing wrong with that: the empty process just goes through the normal code path with all its loops reduced to zero iterations, allocating lots of zero-length arrays, and in MPI communication just contributing a zero-length array, or providing 0. when we're summing up. I'm guessing, but please correct me if I'm wrong, that the main issue is where the code tries to short-cut subroutines that should be collective?
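(For illustration only, a minimal sketch in C/MPI of the "naive" path described above, not the actual Fluidity/femtools Fortran; the names gather_all_ids, local_ids and local_n are invented for this example.)

```c
/* Sketch: a rank that owns zero entries still enters the collective, it just
 * contributes a zero-length array. The danger is an early return upstream
 * that means the empty rank never reaches the call. */
#include <mpi.h>
#include <stdlib.h>

void gather_all_ids(const int *local_ids, int local_n, MPI_Comm comm,
                    int **all_ids_out, int *total_n_out)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    /* Every rank reports its count; an empty rank simply reports 0. */
    int *counts = malloc(nprocs * sizeof(int));
    MPI_Allgather(&local_n, 1, MPI_INT, counts, 1, MPI_INT, comm);

    int *displs = malloc(nprocs * sizeof(int));
    int total = 0;
    for (int p = 0; p < nprocs; ++p) { displs[p] = total; total += counts[p]; }

    int *all_ids = malloc((total > 0 ? total : 1) * sizeof(int));
    /* A zero-length contribution is legal MPI; skipping this collective on an
     * empty rank is what deadlocks everyone else. */
    MPI_Allgatherv(local_ids, local_n, MPI_INT,
                   all_ids, counts, displs, MPI_INT, comm);

    free(counts);
    free(displs);
    *all_ids_out = all_ids;
    *total_n_out = total;
}
```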

Now in your approach, if I understand correctly, you create an MPI_COMM_NONEMPTY in which presumably only the non-empty processes take part. You then have to be careful that this gives a different process numbering, which I have indeed seen being addressed in the code. My question, though, is: what do the empty process(es) do at this point? Presumably they have to sit back and do nothing, as they can't take part in MPI_COMM_NONEMPTY communication, but how does that gel with the many places where MPI_COMM_FEMTOOLS is still being used, which presumably they do need to be around for?

As you'll notice there are quite a few "presumably"s in my comment - and I quite likely have completely the wrong end of the stick - so it would be great if you could clarify.

@jrper (Contributor, Author) commented Feb 13, 2020

I'm addressing @stephankramer's comments here, so we can sort out what comments need to be added to the code:

This was done a long time ago now, but in general all your suppositions are correct, or correctish.

If one simply turns the "you have empty partitions" error into a warning, the normal symptom is a block at the first collective communication that needs to happen after an adapt. At first I just tried patching the scalar calls to do sensible things (e.g. have allmax return a large-magnitude negative number on empty partitions), but I eventually realised that 1) many of the difficulties were in generating the logic for whether a collective call should happen at all, 2) in several places the actual FE code implicitly assumes that the node count isn't zero, leading to division-by-zero errors and bad allocations, deallocations and initialisations, and 3) this was going to involve a trawl through the entire code base to check that the first two problems were fixed everywhere.
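(As an illustration of the "patch the scalar calls" idea in point 1, a hedged sketch in C/MPI rather than the real femtools Fortran interface; allmax_with_empty is an invented name.)

```c
/* Sketch: an allmax-style reduction where an empty partition contributes a
 * large-magnitude negative value, so it can never win the MPI_MAX but still
 * takes part in the collective. */
#include <mpi.h>
#include <float.h>

double allmax_with_empty(const double *vals, int n, MPI_Comm comm)
{
    double local = -DBL_MAX;            /* harmless contribution from an empty rank */
    for (int i = 0; i < n; ++i)
        if (vals[i] > local) local = vals[i];

    double result;
    MPI_Allreduce(&local, &result, 1, MPI_DOUBLE, MPI_MAX, comm);
    return result;
}
```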

As such, in the interests of expediency I took the mathematician's way out and reduced things to a problem we'd already solved, patching the Zoltan routines to create a new MPI communicator over the non-empty processes. This is MPI_COMM_NONEMPTY, which as far as I remember should be used for all the finite element/static-mesh calls.
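(A minimal sketch of the communicator-splitting idea, again in C/MPI for illustration and not the actual Zoltan-integration code; make_nonempty_comm and owned_nodes are invented names.)

```c
/* Sketch: ranks with zero owned nodes pass MPI_UNDEFINED and get back
 * MPI_COMM_NULL, so only non-empty ranks belong to the new communicator
 * (called MPI_COMM_NONEMPTY in this changeset). */
#include <mpi.h>

MPI_Comm make_nonempty_comm(MPI_Comm femtools_comm, int owned_nodes)
{
    int world_rank;
    MPI_Comm_rank(femtools_comm, &world_rank);

    int colour = (owned_nodes > 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm nonempty;
    /* Using world_rank as the key preserves the relative ordering of the
     * surviving ranks, but their rank numbers still change (rank 3 of
     * femtools_comm may become rank 2 of nonempty), which is the renumbering
     * the changeset has to account for. */
    MPI_Comm_split(femtools_comm, colour, world_rank, &nonempty);

    return nonempty;   /* MPI_COMM_NULL on empty ranks */
}
```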

The pre-existing MPI_COMM_FEMTOOLS (which in practice is just an alias for MPI_COMM_WORLD, although I acknowledge there are reasonable reasons not to bake in that assumption) remains for the next Zoltan balancing cycle, in the hope that as the mesh adapts Zoltan will start returning a decomposition that uses the full computational power available to it.

I freely admit that after 18 months it's probable that I'm forgetting subtleties in the implementation, and it's possible that there are hidden bugs in the code sections I didn't fully explore. @gnikit has indicated he's willing to devote effort to getting this merged, so there is man-power for cleaning things up if this summary is enough for you to formulate changes.
