Additive read distributed #1650

fritzgoebel · 2024-07-16T12:18:58Z

This PR adds an option to communicate overlap in the distributed matrix's read_distributed. When the option is used, nonzero entries that are present on multiple ranks are summed on the owning rank instead of being discarded in read_distributed.

This can be useful, for example, in a domain-decomposed finite element setting where each rank assembles its local contribution to the global system matrix, so that the contributions on the subdomain boundaries have to be exchanged when the global matrix is assembled.
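To illustrate the additive semantics, here is a minimal sketch in plain C++ that does not use Ginkgo. The `entry` struct and `assemble_additive` are hypothetical names introduced only for this example; the input stands in for a rank's own entries plus the entries it received from other ranks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one (row, col, value) triplet; not a Ginkgo type.
struct entry {
    std::int64_t row;
    std::int64_t col;
    double value;
};

// Sums all contributions that target the same (row, col) position.
// The result is sorted by row, then by column.
std::vector<entry> assemble_additive(const std::vector<entry>& contributions)
{
    std::map<std::pair<std::int64_t, std::int64_t>, double> sums;
    for (const auto& e : contributions) {
        assert(e.row >= 0 && e.col >= 0);
        sums[{e.row, e.col}] += e.value;
    }
    std::vector<entry> result;
    result.reserve(sums.size());
    for (const auto& kv : sums) {
        result.push_back({kv.first.first, kv.first.second, kv.second});
    }
    return result;
}
```

With this behavior, two ranks that both contribute to the same boundary entry end up with a single entry holding the sum, whereas dropping non-local entries would lose one of the two contributions.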

TODO:

Implement reference kernel tests
Implement device kernels
Implement device kernel tests

MarcelKoch

Mostly minor comments. Only the communication change would be significant, if you decide to do it.

MarcelKoch · 2024-08-08T12:00:49Z

include/ginkgo/core/distributed/matrix.hpp

+ * - local_only does not communicate any overlap but ignores all non-local
+ *   indices.
+ */
+enum class assembly { communicate, local_only };


nit:

Suggested change

-enum class assembly { communicate, local_only };
+enum class assembly_mode { communicate, local_only };

MarcelKoch · 2024-08-08T12:14:05Z

core/distributed/matrix.cpp

+        exec->copy_from(exec, n_recv, overlap_recv_values.get_data(),
+                        all_values.get_data() + num_entries);
+        all_data = device_matrix_data<value_type, global_index_type>{
+            exec, global_dim, all_row_idxs, all_col_idxs, all_values};


Suggested change

-            exec, global_dim, all_row_idxs, all_col_idxs, all_values};
+            exec, global_dim, std::move(all_row_idxs), std::move(all_col_idxs), std::move(all_values)};

otherwise we would have copies.
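As a rough illustration of the difference, the sketch below uses `std::vector` as a stand-in for the arrays and a hypothetical `triplet_store` type whose constructor takes its buffer by value. Whether the real constructor behaves the same way is an assumption here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container that takes its buffer by value and stores it.
struct triplet_store {
    std::vector<double> values;
    explicit triplet_store(std::vector<double> v) : values(std::move(v)) {}
};

// Returns true if `store` holds the allocation that `original_ptr` pointed to,
// i.e. the buffer was transferred instead of copied.
bool reuses_buffer(const triplet_store& store, const double* original_ptr)
{
    assert(original_ptr != nullptr);
    return store.values.data() == original_ptr;
}
```

Passing an lvalue copies the buffer into the by-value parameter, so `reuses_buffer` returns false for it. Passing `std::move(buffer)` hands the existing allocation over, so it returns true.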

MarcelKoch · 2024-08-08T12:29:01Z

core/distributed/matrix.cpp

+        array<global_index_type> overlap_send_row_idxs{exec, n_send};
+        array<global_index_type> overlap_send_col_idxs{exec, n_send};
+        array<value_type> overlap_send_values{exec, n_send};
+        array<global_index_type> overlap_recv_row_idxs{exec, n_recv};
+        array<global_index_type> overlap_recv_col_idxs{exec, n_recv};
+        array<value_type> overlap_recv_values{exec, n_recv};


Suggested change

-        array<global_index_type> overlap_send_row_idxs{exec, n_send};
-        array<global_index_type> overlap_send_col_idxs{exec, n_send};
-        array<value_type> overlap_send_values{exec, n_send};
-        array<global_index_type> overlap_recv_row_idxs{exec, n_recv};
-        array<global_index_type> overlap_recv_col_idxs{exec, n_recv};
-        array<value_type> overlap_recv_values{exec, n_recv};
+        device_matrix_data<value_type, global_index_type> overlap_send_data{exec, n_send};
+        device_matrix_data<value_type, global_index_type> overlap_recv_data{exec, n_recv};

A bit less repetitive.

I think this would mean that we would have to add a set_executor to device_matrix_data for the use_host_buffer check. Having the separate arrays is also consistent with the arrays for local_row_idxs, local_col_idxs, and so on just below. I would prefer leaving these arrays as they are.

MarcelKoch · 2024-08-08T13:54:38Z

core/distributed/matrix.cpp

+            overlap_recv_values.set_executor(exec);
+        }
+
+        array<global_index_type> all_row_idxs{exec, num_entries + n_recv};


num_entries + n_recv means that the full device_matrix_data also contains the elements that were sent to other processes, right? But I realized that we can't modify the input matrix data, so we have to work with it anyway.

Yes, but they get ignored when the local and non_local matrices are created, just as before.

MarcelKoch · 2024-08-08T13:56:49Z

core/distributed/matrix.cpp

+    if (assembly_type == assembly::communicate) {
+        size_type num_entries = data.get_num_stored_elements();
+        size_type num_parts = comm.size();
+        array<comm_index_type> overlap_count{exec, num_parts};


nit:

Suggested change

-        array<comm_index_type> overlap_count{exec, num_parts};
+        array<comm_index_type> send_sizes{exec, num_parts};

IMO using overlap here is not ideal, since that implies a bidirectional communication, but we don't require that. Instead I would just use send and recv respectively.
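For context, a per-part send count of this kind could be computed roughly as follows. This is only a sketch in plain C++, not the kernel from this PR: `owning_part`, `count_send_sizes`, and the contiguous `range_bounds` layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical partition: part p owns rows [range_bounds[p], range_bounds[p + 1]).
int owning_part(const std::vector<std::int64_t>& range_bounds,
                std::int64_t row)
{
    assert(range_bounds.size() >= 2);
    assert(row >= range_bounds.front() && row < range_bounds.back());
    int part = 0;
    while (row >= range_bounds[part + 1]) {
        ++part;
    }
    return part;
}

// Counts, for every part, how many entries this rank holds in rows that the
// part owns. Entries in locally owned rows are not counted, since they do not
// need to be sent.
std::vector<int> count_send_sizes(const std::vector<std::int64_t>& range_bounds,
                                  const std::vector<std::int64_t>& rows,
                                  int local_part)
{
    std::vector<int> send_sizes(range_bounds.size() - 1, 0);
    for (const auto row : rows) {
        const int part = owning_part(range_bounds, row);
        if (part != local_part) {
            ++send_sizes[part];
        }
    }
    return send_sizes;
}
```

The counts are one-directional: a rank may send entries to a part from which it receives nothing, which is why send and recv describe the data better than overlap.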

MarcelKoch · 2024-08-08T14:21:07Z