Add specific communicator for neighborhood communication #1588

MarcelKoch · 2024-04-04T10:40:53Z

This PR adds a communicator that only handles neighborhood all-to-all communication. It implements the new interface collective_communicator, which allows different implementations of a selected set of collective MPI routines. Currently, this set contains only the non-blocking all-to-all.

The communication uses a fixed pattern, i.e. the send/recv sizes are fixed when the neighborhood communicator is constructed. I would have liked to decouple that, but this would require some knowledge of how the sizes are stored at the interface level. If someone has an idea for that, please let me know.
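To illustrate what "fixed pattern" means here, the following is a minimal sketch, not the code from this PR. The type `fixed_pattern` and its members are hypothetical names chosen for illustration; the sketch assumes the sizes are stored per peer and the buffer offsets are derived from them once at construction.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of Ginkgo): one direction of a fixed
// all-to-all-v pattern, i.e. the per-peer sizes plus the derived offsets.
struct fixed_pattern {
    std::vector<int> sizes;
    // offsets[i] is the sum of sizes[0..i); the last entry is the total size
    std::vector<int> offsets;

    explicit fixed_pattern(std::vector<int> sizes_in)
        : sizes(std::move(sizes_in)), offsets(sizes.size() + 1, 0)
    {
        assert(std::all_of(sizes.begin(), sizes.end(),
                           [](int s) { return s >= 0; }));
        // an inclusive prefix sum written one slot to the right gives the
        // exclusive offsets that a v-style collective expects
        std::partial_sum(sizes.begin(), sizes.end(), offsets.begin() + 1);
    }

    int total() const { return offsets.back(); }
};
```

Because sizes and offsets are computed once in the constructor, every later communication call only needs the buffers. That is the coupling described above: the pattern cannot change without building a new object.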

This is the first part of splitting up #1546.

The neighborhood all-to-all has a bug in OpenMPI < v4.1.0, which makes it necessary to disable the neighborhood communicator in this case. As a replacement, there is also a dense all-to-all communicator.

Todo:

documentation

PR Stack:

MarcelKoch · 2024-07-18T14:34:37Z

So right now, I'm leaning toward just deactivating the neighborhood communicator if the OpenMPI version isn't sufficient. The CollectiveCommunicator interface is meant to provide different implementations of the all-to-all call, so using a dense all-to-all here feels odd. I think I would rather have a derived class that only does the dense all-to-all and use that instead.

pratikvn · 2024-07-18T15:02:46Z

Yes, but the fix is missing in older versions of 4.0.x as well. I am not entirely sure which exact version of OpenMPI our container uses, but it is 4.0.x, and 4.0.7 is still missing the fix. I think the fix was added only in 4.1.x.

But in general, I agree. I think we can just disable the neighborhood communicator for older versions of OpenMPI.
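A version gate along these lines could look like the sketch below. It is only an illustration under assumptions: the function name is hypothetical, and the 4.1.0 threshold is taken from this discussion rather than verified against the OpenMPI changelog. In real code the arguments would presumably come from the `OMPI_MAJOR_VERSION` and `OMPI_MINOR_VERSION` macros that OpenMPI's `mpi.h` defines; other MPI implementations do not define them.

```cpp
// Hypothetical helper: true if the given OpenMPI version is at least 4.1.0,
// the first version assumed (per this thread) to contain the fix.
constexpr bool neighborhood_all_to_all_usable(int major, int minor)
{
    return major > 4 || (major == 4 && minor >= 1);
}

// Assumed usage when compiling against OpenMPI:
//   #if defined(OPEN_MPI)
//   static_assert(neighborhood_all_to_all_usable(OMPI_MAJOR_VERSION,
//                                                OMPI_MINOR_VERSION),
//                 "neighborhood communicator disabled for this OpenMPI");
//   #endif
```

Whether the check should fail at compile time, disable the class, or throw at runtime is a separate design choice.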

pratikvn

Mostly looks good. Some tests are missing and doc updates are needed.

core/test/mpi/distributed/dense_communicator.cpp

core/distributed/dense_communicator.cpp

include/ginkgo/core/distributed/dense_communicator.hpp

core/distributed/neighborhood_communicator.cpp

core/test/mpi/distributed/dense_communicator.cpp

include/ginkgo/core/distributed/dense_communicator.hpp

include/ginkgo/core/distributed/neighborhood_communicator.hpp

Co-authored-by: Pratik Nayak <[email protected]>

- fix include guards
- update docs
- implement copy/move constructors/assignment with tests
- add equality test for collective communicators (needed for testing)
- always enable neighborhood comm, just throw if openmpi is too old

Co-authored-by: Pratik Nayak <[email protected]>

upsj

First pass, mostly interface and implementation

What is the moved-from state of a Communicator? Should it match that of an MPI communicator wrapper and be MPI_COMM_NULL, or should it preserve the MPI communicator?

upsj · 2024-10-18T12:53:15Z

include/ginkgo/core/distributed/collective_communicator.hpp

+     * @return  a request handle
+     */
+    template <typename SendType, typename RecvType>
+    request i_all_to_all_v(std::shared_ptr<const Executor> exec,


this should probably also be [[nodiscard]]?
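For context, a small sketch of what the attribute does. The `request` type below is a hypothetical stand-in, not Ginkgo's class; the point is only that discarding the handle of a non-blocking operation is almost always a bug, and `[[nodiscard]]` (C++17) asks the compiler to warn about it.

```cpp
// Hypothetical stand-in for a non-blocking request handle.
struct request {
    bool completed = false;
    void wait() { completed = true; }
};

// With [[nodiscard]], a call such as `start_exchange();` that drops the
// returned handle should trigger a compiler warning.
[[nodiscard]] inline request start_exchange() { return request{}; }
```

An alternative is to put the attribute on the `request` type itself, which covers every function returning it.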

upsj · 2024-10-18T12:53:48Z

include/ginkgo/core/distributed/collective_communicator.hpp

+    virtual request i_all_to_all_v(std::shared_ptr<const Executor> exec,
+                                   const void* send_buffer,
+                                   MPI_Datatype send_type, void* recv_buffer,
+                                   MPI_Datatype recv_type) const = 0;


Can we make this an internal implementation detail like we do with LinOp::apply_impl? That will make future generic logging of these events easier.
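A minimal sketch of the pattern being suggested (public non-virtual entry point, protected virtual `*_impl`). All names here are hypothetical and the signature is simplified; this is not Ginkgo's actual interface, and the string log only stands in for whatever logging mechanism would be used.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified base class illustrating the non-virtual interface.
class collective_base {
public:
    virtual ~collective_base() = default;

    // Public and non-virtual: the single place where generic logging (or
    // argument checks) can be added for all implementations.
    int exchange(const std::vector<int>& send, std::vector<int>& recv) const
    {
        log_.push_back("exchange started");
        const int result = this->exchange_impl(send, recv);
        log_.push_back("exchange completed");
        return result;
    }

    const std::vector<std::string>& log() const { return log_; }

protected:
    // Derived classes customize only this function.
    virtual int exchange_impl(const std::vector<int>& send,
                              std::vector<int>& recv) const = 0;

private:
    mutable std::vector<std::string> log_;
};

// Toy implementation: copies the send buffer to the receive buffer.
class copy_exchange : public collective_base {
protected:
    int exchange_impl(const std::vector<int>& send,
                      std::vector<int>& recv) const override
    {
        recv = send;
        return static_cast<int>(recv.size());
    }
};
```

Callers only see `exchange`, so logging can be added later without touching any derived class.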

upsj · 2024-10-18T12:57:45Z

core/distributed/dense_communicator.cpp

+        std::fill(other.send_sizes_.begin(), other.send_sizes_.end(), 0);
+        std::fill(other.send_offsets_.begin(), other.send_offsets_.end(), 0);
+        std::fill(other.recv_sizes_.begin(), other.recv_sizes_.end(), 0);
+        std::fill(other.recv_offsets_.begin(), other.recv_offsets_.end(), 0);


Is move-assigning the base supposed to preserve the communicator? Otherwise we should probably resize them to 0.
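To make the two options concrete: zero-filling keeps the vectors at their old length, while emptying them leaves a moved-from object that looks like a default-constructed one. The sketch below shows the second variant with a hypothetical `pattern` type; it is an illustration, not the code under review.

```cpp
#include <utility>
#include <vector>

// Hypothetical type holding only the size vectors of a communicator.
struct pattern {
    std::vector<int> send_sizes;
    std::vector<int> recv_sizes;

    pattern() = default;
    pattern(std::vector<int> s, std::vector<int> r)
        : send_sizes(std::move(s)), recv_sizes(std::move(r))
    {}
    pattern(const pattern&) = default;
    pattern& operator=(const pattern&) = default;

    // std::exchange hands over the old contents and leaves an empty vector
    // behind, so the moved-from object has size 0 instead of stale zeros.
    pattern(pattern&& other)
        : send_sizes(std::exchange(other.send_sizes, {})),
          recv_sizes(std::exchange(other.recv_sizes, {}))
    {}
    pattern& operator=(pattern&& other)
    {
        if (this != &other) {
            send_sizes = std::exchange(other.send_sizes, {});
            recv_sizes = std::exchange(other.recv_sizes, {});
        }
        return *this;
    }
};
```

Which variant is right depends on whether the moved-from object keeps a communicator whose size the vectors must match.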

upsj · 2024-10-18T12:59:53Z

core/distributed/dense_communicator.cpp

+           std::equal(a.send_sizes_.begin(), a.send_sizes_.end(),
+                      b.send_sizes_.begin()) &&
+           std::equal(a.recv_sizes_.begin(), a.recv_sizes_.end(),
+                      b.recv_sizes_.begin()) &&
+           std::equal(a.send_offsets_.begin(), a.send_offsets_.end(),
+                      b.send_offsets_.begin()) &&
+           std::equal(a.recv_offsets_.begin(), a.recv_offsets_.end(),
+                      b.recv_offsets_.begin());


Is std::vector::operator== sufficient here? Also, do you need this somewhere beyond tests?
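For reference, the difference between the two forms, shown with hypothetical helper functions: the three-iterator overload of `std::equal` compares only as many elements as the first range has and requires the second range to be at least that long, whereas `operator==` on `std::vector` also compares the sizes.

```cpp
#include <algorithm>
#include <vector>

// Compares only a.size() elements. Precondition: b.size() >= a.size(),
// otherwise the behavior is undefined.
inline bool prefix_equal(const std::vector<int>& a, const std::vector<int>& b)
{
    return std::equal(a.begin(), a.end(), b.begin());
}

// Compares sizes first, then the elements.
inline bool full_equal(const std::vector<int>& a, const std::vector<int>& b)
{
    return a == b;
}
```

If the vectors are guaranteed to have the same length (for example because both communicators have the same size, which is an assumption here), both forms agree; otherwise only `operator==` is safe.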

upsj · 2024-10-18T13:03:08Z

core/distributed/neighborhood_communicator.cpp

+           std::equal(a.send_sizes_.begin(), a.send_sizes_.end(),
+                      b.send_sizes_.begin()) &&
+           std::equal(a.recv_sizes_.begin(), a.recv_sizes_.end(),
+                      b.recv_sizes_.begin()) &&
+           std::equal(a.send_offsets_.begin(), a.send_offsets_.end(),
+                      b.send_offsets_.begin()) &&
+           std::equal(a.recv_offsets_.begin(), a.recv_offsets_.end(),
+                      b.recv_offsets_.begin());


operator==?

MarcelKoch self-assigned this Apr 4, 2024

ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. labels Apr 4, 2024

MarcelKoch requested a review from pratikvn April 4, 2024 10:41

MarcelKoch force-pushed the neighborhood-communicator branch from 6acf7c4 to 8aa6ab9 Compare April 4, 2024 11:00

MarcelKoch force-pushed the read-distributed-with-index-map branch from b42ab92 to 8f104fd Compare April 4, 2024 11:00

MarcelKoch modified the milestone: Ginkgo 1.8.0 Apr 5, 2024

MarcelKoch force-pushed the neighborhood-communicator branch from 8aa6ab9 to 77398bd Compare April 17, 2024 16:28

MarcelKoch requested a review from upsj April 19, 2024 09:19

MarcelKoch mentioned this pull request Apr 19, 2024

Adds sparse communicator class #1546

Closed

7 tasks

MarcelKoch force-pushed the neighborhood-communicator branch from 77398bd to d278cad Compare April 19, 2024 14:39

MarcelKoch force-pushed the read-distributed-with-index-map branch from 8f104fd to a0824a8 Compare April 19, 2024 14:39

This was referenced Apr 22, 2024

Add segmented array type #1545

Merged

Distributed Index Map #1543

Merged

Adds Index map device kernels #1579

Merged

Use index_map in distributed::matrix #1544

Merged

MarcelKoch force-pushed the neighborhood-communicator branch from d278cad to d6112ef Compare April 22, 2024 11:11

MarcelKoch force-pushed the read-distributed-with-index-map branch from a0824a8 to 8ad3f2f Compare April 22, 2024 11:11

MarcelKoch force-pushed the neighborhood-communicator branch from d6112ef to 1582673 Compare April 25, 2024 07:16

MarcelKoch mentioned this pull request Apr 30, 2024

Adds distributed row gatherer #1589

Open

6 tasks

MarcelKoch force-pushed the neighborhood-communicator branch from 1582673 to db9b48a Compare April 30, 2024 13:41

MarcelKoch force-pushed the read-distributed-with-index-map branch from 8ad3f2f to 26678b3 Compare April 30, 2024 13:41

MarcelKoch force-pushed the neighborhood-communicator branch from db9b48a to 72eafff Compare April 30, 2024 15:20

MarcelKoch force-pushed the read-distributed-with-index-map branch from 26678b3 to 006d67d Compare April 30, 2024 15:20

MarcelKoch force-pushed the neighborhood-communicator branch from 72eafff to 3c70106 Compare May 2, 2024 10:04

MarcelKoch force-pushed the read-distributed-with-index-map branch from 006d67d to b295b11 Compare May 2, 2024 10:04

MarcelKoch force-pushed the neighborhood-communicator branch from 3c70106 to a1567b8 Compare May 2, 2024 10:06

MarcelKoch force-pushed the index-map-pgm branch from ded4dd3 to 9e52a2c Compare July 18, 2024 16:39

MarcelKoch force-pushed the neighborhood-communicator branch from 8e3e932 to 40cd2c0 Compare July 18, 2024 16:39

MarcelKoch force-pushed the index-map-pgm branch from 9e52a2c to ded4dd3 Compare July 18, 2024 16:40

MarcelKoch force-pushed the neighborhood-communicator branch from 40cd2c0 to fe864bb Compare July 18, 2024 17:00

MarcelKoch requested a review from pratikvn July 18, 2024 17:11

MarcelKoch force-pushed the neighborhood-communicator branch from fe864bb to db7f6ed Compare July 19, 2024 08:18

pratikvn requested changes Jul 23, 2024


core/test/mpi/distributed/dense_communicator.cpp

core/distributed/dense_communicator.cpp

include/ginkgo/core/distributed/dense_communicator.hpp (outdated)

MarcelKoch force-pushed the index-map-pgm branch from ded4dd3 to 3ad5eee Compare August 9, 2024 13:40

MarcelKoch force-pushed the neighborhood-communicator branch from db7f6ed to 0ad4ee8 Compare August 9, 2024 13:40

MarcelKoch requested a review from pratikvn August 9, 2024 13:41

pratikvn reviewed Aug 16, 2024


MarcelKoch force-pushed the neighborhood-communicator branch from 0ad4ee8 to 1f49b91 Compare August 16, 2024 15:21

pratikvn approved these changes Aug 16, 2024


MarcelKoch requested review from upsj and removed request for upsj August 27, 2024 12:05

MarcelKoch and others added 10 commits October 7, 2024 13:05

Make comm_index_type available in the mpi namespace

56f005b

create owning communicator

bf88f67

[imap] create variant type of supported index maps

13b61d3

[coll-comm] adds interface for collective communicator

87bec08

Co-authored-by: Pratik Nayak <[email protected]>

[coll-comm] add dense communication

f55aeb5

[coll-comm] adds neighborhood implementation of collective communicator

1f897aa

Co-authored-by: Pratik Nayak <[email protected]>

[coll-comm] disable neighbor comm if open-mpi version < 4.1.0

699c24e

[mpi] define moved-from state for communicator

dc808a4

[mpi] add explicit comparison for identical and congruent

5c22649

MarcelKoch force-pushed the neighborhood-communicator branch from 1f49b91 to 4db050c Compare October 7, 2024 13:06

MarcelKoch force-pushed the index-map-pgm branch from 3ad5eee to 6395054 Compare October 7, 2024 13:06

upsj reviewed Oct 18, 2024

