-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DistributedTree Nearest query with callback #737
base: master
Are you sure you want to change the base?
Add DistributedTree Nearest query with callback #737
Conversation
00d1ef2
to
c74aa9e
Compare
This should work now replaying the found matches on the remote MPI rank and communicating the results back. Currently, I don't treat the default callback special to validate that this behaves as expected. |
This should be ready for inital review. As said before, the strategy here is to do a regular query that get's us the MPI ranks, and primitive indices for each query, send them back to the MPI rank owning the primitives, execute the callback, and send them back to the MPI rank owning the queries. |
I will still adopt some more tests to use the callback. |
@aprokop ping |
{ | ||
auto data = ArborX::getData(predicate); | ||
auto const &point = points(primitive_index); | ||
printf("Intersection for query %d from MPI rank %d on MPI rank %d for " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to guard it with SYCL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like our saying "intersection". Commenting so we can get back to it.
test/tstDistributedTree.cpp
Outdated
using PairIndexRank = Kokkos::pair<int, int>; | ||
using PairRankIndex = Kokkos::pair<int, int>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely not. Aliasing Kokkos::pair
was (probably me) already being lazy, this does not protect us against bugs and is awfully confusing.
Any good reason to have both PairIndexRank
and PairRankIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, I need to work more on the tests. Here, I wanted the reverse order to make sure that the new test really executes the callback version properly. I would have gotten the same result ignoring the callback otherwise.
c47b028
to
9402639
Compare
1375f7b
to
3f66b82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think the PR is in the right direction. So far, I've only looked inside src/
. I think we need to fix few things in the proposed implementation (most of my comments are superfluous and can be ignored for now):
- We need to support pure callbacks
They would use mostly the same code, except the API would not have offset and output views, and they won't need the extra round of backward communication - For non-pure callbacks, we should allow a callback to output variable number of elements, which would necessitate some version of two-pass code.
src/geometry/ArborX_HyperSphere.hpp
Outdated
struct ArborX::GeometryTraits::tag<ArborX::ExperimentalHyperGeometry::Sphere<DIM>> | ||
struct ArborX::GeometryTraits::tag< | ||
ArborX::ExperimentalHyperGeometry::Sphere<DIM>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an intentional change? I think it should be reverted.
// Execute the callback on the process owning the primitives. | ||
OutputView remote_out( | ||
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | ||
"ArborX::DistributedTree::query::remote_out"), | ||
n_imports); | ||
KokkosExt::reallocWithoutInitializing(space, indices, n_imports); | ||
Kokkos::parallel_for( | ||
"ArborX::DistributedTree::query::execute_callbacks", | ||
Kokkos::RangePolicy<ExecutionSpace>(space, 0, | ||
imported_queries_with_indices.size()), | ||
KOKKOS_LAMBDA(int i) { | ||
callback(imported_queries_with_indices(i).query, | ||
imported_queries_with_indices(i).primitive_index, | ||
[&](typename OutputView::value_type const &value) { | ||
remote_out(i) = value; | ||
indices(i) = imported_queries_with_indices(i).query_id; | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong for a few reasons.
First, it assumes that callback will output a single element for each match. In practice, we don't know how many elements the callback will spit out per query. It could be 0, or it could be more than 1. So we need to do a two pass here, similar to the CrsGraphWrapperImpl
. I'm not sure what is the easiest way to do it without massive code duplication.
Second, it assumes inline callback and not a post one. We could try to support post callback, but its meaning would have to be changed as we cannot execute it on all of the results coming from different ranks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just assume that the callback only returns a single element to get started, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could. The thing is, this condition is undetectable, so the failure for a user would likely be an undefined behavior or a segfault. Which in the distributed setting would be tricky to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an assertion.
typename DeviceType::memory_space memory_space; | ||
Kokkos::View<int *, DeviceType> destinations( | ||
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | ||
"ArborX::DistributedTree::query::destinations"), | ||
dest.size()); | ||
Kokkos::deep_copy(space, destinations, host_destinations); | ||
Kokkos::View<int *, DeviceType> offsets( | ||
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | ||
"ArborX::DistributedTree::query::offsets"), | ||
off.size()); | ||
Kokkos::deep_copy(space, offsets, host_offsets); | ||
auto const n_imports_back = | ||
back_distributor.createFromSends(space, destinations, offsets); | ||
KokkosExt::reallocWithoutInitializing(space, out, n_imports_back); | ||
Kokkos::View<int *, DeviceType> query_ids( | ||
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | ||
"ArborX::DistributedTree::query::nearest::query_ids"), | ||
n_imports_back); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified by using UnmanagedView
from the distributor data pointers, and doing an immediate deep_copoy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what I'm doing already?
That's the same as #733, right? |
No, that was for spatial queries. I need to think a bit about that one. |
The strategy here is to do a regular query that gets us the MPI ranks, and primitive indices for each query, send them back to the MPI rank owning the primitives, execute the callback, and send them back to the MPI rank owning the queries.