-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add specific communicator for neighborhood communication #1588
base: index-map-pgm
Are you sure you want to change the base?
Conversation
6acf7c4
to
8aa6ab9
Compare
b42ab92
to
8f104fd
Compare
8aa6ab9
to
77398bd
Compare
77398bd
to
d278cad
Compare
8f104fd
to
a0824a8
Compare
d278cad
to
d6112ef
Compare
a0824a8
to
8ad3f2f
Compare
d6112ef
to
1582673
Compare
1582673
to
db9b48a
Compare
8ad3f2f
to
26678b3
Compare
db9b48a
to
72eafff
Compare
26678b3
to
006d67d
Compare
72eafff
to
3c70106
Compare
006d67d
to
b295b11
Compare
3c70106
to
a1567b8
Compare
So right now, I'm leaning to just deactivate the neighborhood communicator if the openmpi version isn't sufficient. The |
Yes, but the fix is missing in older versions of 4.0.x as well. I am not entirely sure which exact version of OpenMPI our container uses, but it is 4.0.x, and 4.0.7 is still missing the fix. I think the fix was added only in 4.1.x. But in general, I agree. I think we can just disable the neighborhood communicator for older versions of OpenMPI. |
ded4dd3
to
9e52a2c
Compare
8e3e932
to
40cd2c0
Compare
9e52a2c
to
ded4dd3
Compare
40cd2c0
to
fe864bb
Compare
fe864bb
to
db7f6ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. Some missing tests and doc updates needed
ded4dd3
to
3ad5eee
Compare
db7f6ed
to
0ad4ee8
Compare
0ad4ee8
to
1f49b91
Compare
Co-authored-by: Pratik Nayak <[email protected]>
Co-authored-by: Pratik Nayak <[email protected]>
- fix include guards - update docs - implement copy/move constructors/assignment with tests - add equality test for collective communicators (needed for testing) - always enable neighborhood comm, just throw if openmpi is too old Co-authored-by: Pratik Nayak <[email protected]>
1f49b91
to
4db050c
Compare
3ad5eee
to
6395054
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass, mostly interface and implementation
- What is the moved-from state of a Communicator? Should it match that of an MPI communicator wrapper, and be MPI_NULL, or preserve the MPI communicator?
* @return a request handle | ||
*/ | ||
template <typename SendType, typename RecvType> | ||
request i_all_to_all_v(std::shared_ptr<const Executor> exec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably also be [[nodiscard]]
?
virtual request i_all_to_all_v(std::shared_ptr<const Executor> exec, | ||
const void* send_buffer, | ||
MPI_Datatype send_type, void* recv_buffer, | ||
MPI_Datatype recv_type) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this an internal implementation detail like we do with LinOp::apply_impl
? That will make future generic logging of these events easier.
std::fill(other.send_sizes_.begin(), other.send_sizes_.end(), 0); | ||
std::fill(other.send_offsets_.begin(), other.send_offsets_.end(), 0); | ||
std::fill(other.recv_sizes_.begin(), other.recv_sizes_.end(), 0); | ||
std::fill(other.recv_offsets_.begin(), other.recv_offsets_.end(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is move-assigning the base supposed to preserve the communicator? Otherwise we should probably resize them to 0
std::equal(a.send_sizes_.begin(), a.send_sizes_.end(), | ||
b.send_sizes_.begin()) && | ||
std::equal(a.recv_sizes_.begin(), a.recv_sizes_.end(), | ||
b.recv_sizes_.begin()) && | ||
std::equal(a.send_offsets_.begin(), a.send_offsets_.end(), | ||
b.send_offsets_.begin()) && | ||
std::equal(a.recv_offsets_.begin(), a.recv_offsets_.end(), | ||
b.recv_offsets_.begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is std::vector::operator==
sufficient here? Also do you need this somewhere beyond tests?
std::equal(a.send_sizes_.begin(), a.send_sizes_.end(), | ||
b.send_sizes_.begin()) && | ||
std::equal(a.recv_sizes_.begin(), a.recv_sizes_.end(), | ||
b.recv_sizes_.begin()) && | ||
std::equal(a.send_offsets_.begin(), a.send_offsets_.end(), | ||
b.send_offsets_.begin()) && | ||
std::equal(a.recv_offsets_.begin(), a.recv_offsets_.end(), | ||
b.recv_offsets_.begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operator==
?
This PR adds a communicator that only handles neighborhood all-to-all communication. It implements the new interface
collective_communicator
, which provides different implementations for a selected set of collective mpi routines. Currently, this only includes the non-blocking all-to-all.The communication uses a fixed pattern, i.e. the send/recv sizes are fixed when the neighborhood communicator is constructed. I would have liked to decouple that, but this would require some knowledge of how the sizes are stored at the interface level. If someone has an idea for that, please let me know.
This is the first part of splitting up #1546.
The neighborhood all-to-all has a bug in OpenMPI < v4.1.0, which makes it necessary to disable the neighborhood communicator in this case. As replacement, there is also a dense all-to-all communicator.
Todo:
PR Stack: