-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additive read distributed #1650
base: develop
Are you sure you want to change the base?
Conversation
71c1a41
to
7ab6bf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly smaller stuff. Only the communication change would be significant, if that would be done.
* - local_only does not communicate any overlap but ignores all non-local | ||
* indices. | ||
*/ | ||
enum class assembly { communicate, local_only }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
enum class assembly { communicate, local_only }; | |
enum class assembly_mode { communicate, local_only }; |
core/distributed/matrix.cpp
Outdated
exec->copy_from(exec, n_recv, overlap_recv_values.get_data(), | ||
all_values.get_data() + num_entries); | ||
all_data = device_matrix_data<value_type, global_index_type>{ | ||
exec, global_dim, all_row_idxs, all_col_idxs, all_values}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exec, global_dim, all_row_idxs, all_col_idxs, all_values}; | |
exec, global_dim, std::move(all_row_idxs), std::move(all_col_idxs), std::move(all_values)}; |
otherwise we would have copies.
core/distributed/matrix.cpp
Outdated
array<global_index_type> overlap_send_row_idxs{exec, n_send}; | ||
array<global_index_type> overlap_send_col_idxs{exec, n_send}; | ||
array<value_type> overlap_send_values{exec, n_send}; | ||
array<global_index_type> overlap_recv_row_idxs{exec, n_recv}; | ||
array<global_index_type> overlap_recv_col_idxs{exec, n_recv}; | ||
array<value_type> overlap_recv_values{exec, n_recv}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array<global_index_type> overlap_send_row_idxs{exec, n_send}; | |
array<global_index_type> overlap_send_col_idxs{exec, n_send}; | |
array<value_type> overlap_send_values{exec, n_send}; | |
array<global_index_type> overlap_recv_row_idxs{exec, n_recv}; | |
array<global_index_type> overlap_recv_col_idxs{exec, n_recv}; | |
array<value_type> overlap_recv_values{exec, n_recv}; | |
device_matrix_data<value_type, global_index_type> overlap_send_data{exec, n_send}; | |
device_matrix_data<value_type, global_index_type> overlap_recv_data{exec, n_send}; |
A bit less repetitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would mean that we would have to add to add a set_executor
to device_matrix_data
for the use_host_buffer
check, and having the separate arrays is consistent with the arrays for local_row_idxs
, local)col_idxs
and so on just below. I would prefer leaving these arrays.
overlap_recv_values.set_executor(exec); | ||
} | ||
|
||
array<global_index_type> all_row_idxs{exec, num_entries + n_recv}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_entries + n_recv
means that the full device_matrix_data also contains the elements that were sent to other processes right? But I realized, that we can't modify the input matrix data, so we have to work with it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but they get ignored when the local
and non_local
matrices are created, just as before.
core/distributed/matrix.cpp
Outdated
if (assembly_type == assembly::communicate) { | ||
size_type num_entries = data.get_num_stored_elements(); | ||
size_type num_parts = comm.size(); | ||
array<comm_index_type> overlap_count{exec, num_parts}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
array<comm_index_type> overlap_count{exec, num_parts}; | |
array<comm_index_type> send_sizes{exec, num_parts}; |
IMO using overlap
here is not ideal, since that implies a bidirectional communication, but we don't require that. Instead I would just use send
and recv
respectively.
gko::kernels::reference::distributed_matrix::count_overlap_entries( | ||
this->ref, input, partition.get(), i, overlap_count, | ||
overlap_positions, original_positions); | ||
GKO_ASSERT_ARRAY_EQ(overlap_count, overlap_count_ref[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
GKO_ASSERT_ARRAY_EQ(overlap_count, overlap_count_ref[i]); | |
GKO_ASSERT_ARRAY_EQ(overlap_count, overlap_count_ref[i]); |
{ | ||
using lit = typename TestFixture::local_index_type; | ||
using git = typename TestFixture::global_index_type; | ||
using vt = typename TestFixture::value_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using vt = typename TestFixture::value_type; |
using lit = typename TestFixture::local_index_type; | ||
using git = typename TestFixture::global_index_type; | ||
using vt = typename TestFixture::value_type; | ||
using ca = gko::array<comm_index_type>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using ca = gko::array<comm_index_type>; |
this->ref, input, partition.get(), i, overlap_positions[i], | ||
original_positions[i], overlap_row_idxs, overlap_col_idxs, | ||
overlap_values); | ||
GKO_ASSERT_ARRAY_EQ(overlap_row_idxs, overlap_row_idxs_ref[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GKO_ASSERT_ARRAY_EQ(overlap_row_idxs, overlap_row_idxs_ref[i]); | |
GKO_ASSERT_ARRAY_EQ(overlap_row_idxs, overlap_row_idxs_ref[i]); |
overlap_row_idxs.resize_and_reset(num_entries); | ||
overlap_col_idxs.resize_and_reset(num_entries); | ||
overlap_values.resize_and_reset(num_entries); | ||
gko::kernels::reference::distributed_matrix::fill_overlap_send_buffers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gko::kernels::reference::distributed_matrix::fill_overlap_send_buffers( | |
gko::kernels::reference::distributed_matrix::fill_overlap_send_buffers( |
7ab6bf6
to
69d8d86
Compare
This PR adds an option to communicate overlap in the distributed matrix'
read_distributed
. If the option is used, nonzero entries present in multiple ranks are added up on the owning rank rather than thrown away inread_distributed
.This can be useful e.g. if in a domain decomposed finite element setting, each rank assembles their local contribution to a global system matrix and when assembling the global system matrix information on the subdomain boundaries has to be exchanged.
TODO: