Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DistributedTree Nearest query with callback #737

Open
wants to merge 53 commits into
base: master
Choose a base branch
from

Conversation

masterleinad
Copy link
Collaborator

@masterleinad masterleinad commented Sep 9, 2022

The strategy here is to do a regular query that gets us the MPI ranks, and primitive indices for each query, send them back to the MPI rank owning the primitives, execute the callback, and send them back to the MPI rank owning the queries.

@masterleinad masterleinad force-pushed the distributed_knn_callback branch from 00d1ef2 to c74aa9e Compare September 9, 2022 19:55
@masterleinad
Copy link
Collaborator Author

This should work now replaying the found matches on the remote MPI rank and communicating the results back. Currently, I don't treat the default callback special to validate that this behaves as expected.

@masterleinad
Copy link
Collaborator Author

This should be ready for inital review. As said before, the strategy here is to do a regular query that get's us the MPI ranks, and primitive indices for each query, send them back to the MPI rank owning the primitives, execute the callback, and send them back to the MPI rank owning the queries.

@masterleinad masterleinad marked this pull request as ready for review September 15, 2022 15:07
@masterleinad
Copy link
Collaborator Author

I will still adopt some more tests to use the callback.

@masterleinad
Copy link
Collaborator Author

@aprokop ping

@aprokop aprokop added the enhancement New feature or request label Sep 22, 2022
examples/distributed_tree/distributed_knn_callback.cpp Outdated Show resolved Hide resolved
{
auto data = ArborX::getData(predicate);
auto const &point = points(primitive_index);
printf("Intersection for query %d from MPI rank %d on MPI rank %d for "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to guard it with SYCL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like our saying "intersection". Commenting so we can get back to it.

examples/distributed_tree/distributed_knn_callback.cpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
src/geometry/ArborX_HyperSphere.hpp Outdated Show resolved Hide resolved
Comment on lines 30 to 31
using PairIndexRank = Kokkos::pair<int, int>;
using PairRankIndex = Kokkos::pair<int, int>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely not. Aliasing Kokkos::pair was (probably me) already being lazy, this does not protect us against bugs and is awfully confusing.
Any good reason to have both PairIndexRank and PairRankIndex?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, I need to work more on the tests. Here, I wanted the reverse order to make sure that the new test really executes the callback version properly. I would have gotten the same result ignoring the callback otherwise.

src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
@masterleinad masterleinad force-pushed the distributed_knn_callback branch from c47b028 to 9402639 Compare October 11, 2022 17:37
@masterleinad masterleinad force-pushed the distributed_knn_callback branch from 1375f7b to 3f66b82 Compare October 12, 2022 17:02
Copy link
Contributor

@aprokop aprokop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think the PR is in the right direction. So far, I've only looked inside src/. I think we need to fix few things in the proposed implementation (most of my comments are superfluous and can be ignored for now):

  • We need to support pure callbacks
    They would use mostly the same code, except the API would not have offset and output views, and they won't need the extra round of backward communication
  • For non-pure callbacks, we should allow a callback to output variable number of elements, which would necessitate some version of two-pass code.

Comment on lines 59 to 77
struct ArborX::GeometryTraits::tag<ArborX::ExperimentalHyperGeometry::Sphere<DIM>>
struct ArborX::GeometryTraits::tag<
ArborX::ExperimentalHyperGeometry::Sphere<DIM>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an intentional change? I think it should be reverted.

.jenkins/continuous.groovy Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsDistributedTreeImpl.hpp Outdated Show resolved Hide resolved
Comment on lines 759 to 776
// Execute the callback on the process owning the primitives.
OutputView remote_out(
Kokkos::view_alloc(space, Kokkos::WithoutInitializing,
"ArborX::DistributedTree::query::remote_out"),
n_imports);
KokkosExt::reallocWithoutInitializing(space, indices, n_imports);
Kokkos::parallel_for(
"ArborX::DistributedTree::query::execute_callbacks",
Kokkos::RangePolicy<ExecutionSpace>(space, 0,
imported_queries_with_indices.size()),
KOKKOS_LAMBDA(int i) {
callback(imported_queries_with_indices(i).query,
imported_queries_with_indices(i).primitive_index,
[&](typename OutputView::value_type const &value) {
remote_out(i) = value;
indices(i) = imported_queries_with_indices(i).query_id;
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong for a few reasons.

First, it assumes that callback will output a single element for each match. In practice, we don't know how many elements the callback will spit out per query. It could be 0, or it could be more than 1. So we need to do a two pass here, similar to the CrsGraphWrapperImpl. I'm not sure what is the easiest way to do it without massive code duplication.

Second, it assumes inline callback and not a post one. We could try to support post callback, but its meaning would have to be changed as we cannot execute it on all of the results coming from different ranks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just assume that the callback only returns a single element to get started, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. The thing is, this condition is undetectable, so the failure for a user would likely be an undefined behavior or a segfault. Which in the distributed setting would be tricky to debug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an assertion.

Comment on lines +787 to +805
typename DeviceType::memory_space memory_space;
Kokkos::View<int *, DeviceType> destinations(
Kokkos::view_alloc(space, Kokkos::WithoutInitializing,
"ArborX::DistributedTree::query::destinations"),
dest.size());
Kokkos::deep_copy(space, destinations, host_destinations);
Kokkos::View<int *, DeviceType> offsets(
Kokkos::view_alloc(space, Kokkos::WithoutInitializing,
"ArborX::DistributedTree::query::offsets"),
off.size());
Kokkos::deep_copy(space, offsets, host_offsets);
auto const n_imports_back =
back_distributor.createFromSends(space, destinations, offsets);
KokkosExt::reallocWithoutInitializing(space, out, n_imports_back);
Kokkos::View<int *, DeviceType> query_ids(
Kokkos::view_alloc(space, Kokkos::WithoutInitializing,
"ArborX::DistributedTree::query::nearest::query_ids"),
n_imports_back);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified by using UnmanagedView from the distributor data pointers, and doing an immediate deep_copoy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what I'm doing already?

src/details/ArborX_DetailsDistributor.hpp Outdated Show resolved Hide resolved
@masterleinad
Copy link
Collaborator Author

We need to support pure callbacks
They would use mostly the same code, except the API would not have offset and output views, and they won't need the extra round of backward communication

That's the same as #733, right?

@aprokop
Copy link
Contributor

aprokop commented Dec 16, 2022

That's the same as #733, right?

No, that was for spatial queries. I need to think a bit about that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants