-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#2074: support sparse OfflineLB maps #2145
Conversation
744a987
to
09b0807
Compare
Pipelines resultsPR tests (gcc-12, ubuntu, mpich) Build for 30cc7f4 (2024-04-22 15:38:34 UTC)
PR tests (clang-9, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (gcc-10, ubuntu, openmpi, no LB) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (gcc-9, ubuntu, mpich, zoltan, json schema test) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-12, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-11, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-13, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-14, ubuntu, mpich) Build for 30cc7f4 (2024-04-22 15:38:34 UTC)
PR tests (gcc-8, ubuntu, mpich, address sanitizer) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (gcc-11, ubuntu, mpich, trace runtime, coverage) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-10, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (intel icpc, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (nvidia cuda 11.2, gcc-9, ubuntu, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (clang-13, alpine, mpich) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (intel icpx, ubuntu, mpich) Build for 30cc7f4 (2024-04-22 15:38:34 UTC)
PR tests (nvidia cuda 12.2.0, gcc-9, ubuntu, mpich) Build for 30cc7f4 (2024-04-22 15:38:34 UTC)
PR tests (intel icpx, ubuntu, mpich, verbose) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (gcc-12, ubuntu, mpich, verbose) Build for 9b8200f (2024-06-13 09:03:59 UTC)
PR tests (clang-14, ubuntu, mpich, verbose) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (nvidia cuda 12.2.0, gcc-9, ubuntu, mpich, verbose) Build for 61d324a (2024-06-24 11:26:07 UTC)
PR tests (gcc-12, ubuntu, mpich, verbose, kokkos) Build for 61d324a (2024-06-24 11:26:07 UTC)
|
3fa3511
to
5b75e31
Compare
5b75e31
to
f9c5f29
Compare
void OfflineLB::runLB(TimeType) { | ||
auto const& distro = theLBDataReader()->getDistro(phase_ + 1); | ||
for (auto&& elm : distro) { | ||
auto nextPhase = theLBDataReader()->findNextPhase(phase_); | ||
auto const distro = theLBDataReader()->getDistro(nextPhase); | ||
for (auto&& elm : *distro) { | ||
migrateObjectTo(elm, theContext()->getNode()); | ||
} | ||
theLBDataReader()->clearDistro(phase_ + 1); | ||
theLBDataReader()->clearDistro(nextPhase); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If phase_ + 1
was skipped, does this new logic grab phase_ + 2
if it exists (whether explicitly described in the file or listed as identical) as a surrogate?
@lifflander Is this the desired behavior or do we want to disallow running OfflineLB when we're missing information about the phase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, it will grab phase_ + 2
or whatever is the next existing phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nlslatt I changed the behavior to disallow running of OfflineLB when the phase is skipped.
if (local_changed_distro[i]) { | ||
PhaseType curr = 0, next; | ||
for (;curr < num_phases_ - 1;) { | ||
next = findNextPhase(curr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my other question, do we want to allow OfflineLB to operate over data from non-consecutive phases (where one is explicitly listed as skipped as opposed to just identical)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nlslatt I updated that to operate only on consecutive phases.
0d2eb73
to
044d7c4
Compare
ef91bad
to
8b20790
Compare
56592af
to
c9120ba
Compare
c9120ba
to
3f0a613
Compare
@nlslatt, @lifflander - I rebased this PR and fixed all conflicts. PR is ready for review. |
3f0a613
to
b3e07a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The testing still needs work.
tests/unit/lb/test_offlinelb.cc
Outdated
"0 OfflineLB\n" | ||
"1 NoLB\n" | ||
"2 NoLB\n" | ||
"3 NoLB\n" | ||
"4 OfflineLB\n" | ||
"5 OfflineLB\n" | ||
"6 NoLB\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this configuration, coupled with your choice of which phases are identical to previous, means that OfflineLB
by definition has nothing to migrate (for example, phase 1 is said to be identical to phase 0, so nothing should migrate when OfflineLB
runs on phase 0). As such, this does not appear to be a good test of the sparse functionality.
Also, I think the phases that you have defined as identical to previous contradict what was loaded into the LBDataHolder
(e.g., phase 1 is marked as identical to phase 0, but the LBDataHolder
content indicates that they are not identical).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I updated the configuration for this test so the OfflineLB
will be able to do some migrations. Also I removed unnecessary phases from LBDataHolder
.
If you have more suggestions, please let me know.
8a72a9f
to
77911e8
Compare
…fflineLB is not configured properly
9b8200f
to
f9ce741
Compare
f9ce741
to
61d324a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Closes #2074
Depends on #2137Depends on #2250