Add ray tracing benchmark #842

Rombur (Collaborator) commented Mar 13, 2023

This benchmark is based on an adamantine use case. An IR camera produces a file of rays and temperatures, and we need to find the first intersection of each ray with the mesh. In adamantine we are interested in the exact coordinates of the intersections, but I've removed that computation here since it doesn't use ArborX. The image contains about 300,000 rays, some of which miss the mesh. Increasing the number of processors increases the number of cells, while the number of rays stays constant. Every processor reads the camera file.
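
To make the "first intersection" query concrete, here is a minimal brute-force sketch in plain C++. It is not the ArborX implementation: the `Ray` and `Box` structs and the `entry_distance` and `first_hit` helpers are hypothetical stand-ins, and the linear scan over boxes is what a tree-based nearest query would replace.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;

// Hypothetical stand-ins, not ArborX::Experimental::Ray or ArborX::Box.
struct Ray
{
  Point origin;
  Point direction;
};
struct Box
{
  Point min_corner;
  Point max_corner;
};

// Slab test: distance along the ray at which it enters the box, if it hits.
std::optional<double> entry_distance(Ray const &ray, Box const &box)
{
  assert(ray.direction[0] != 0.0 || ray.direction[1] != 0.0 ||
         ray.direction[2] != 0.0);
  double t_min = 0.0;
  double t_max = std::numeric_limits<double>::infinity();
  for (int d = 0; d < 3; ++d)
  {
    if (ray.direction[d] == 0.0)
    {
      // Ray is parallel to this slab, so it must start inside it.
      if (ray.origin[d] < box.min_corner[d] ||
          ray.origin[d] > box.max_corner[d])
        return std::nullopt;
      continue;
    }
    double t0 = (box.min_corner[d] - ray.origin[d]) / ray.direction[d];
    double t1 = (box.max_corner[d] - ray.origin[d]) / ray.direction[d];
    if (t0 > t1)
      std::swap(t0, t1);
    t_min = std::max(t_min, t0);
    t_max = std::min(t_max, t1);
    if (t_min > t_max)
      return std::nullopt;
  }
  return t_min;
}

// Index of the first box hit along the ray, or -1 if the ray misses them all.
int first_hit(Ray const &ray, std::vector<Box> const &boxes)
{
  int best = -1;
  double best_t = std::numeric_limits<double>::infinity();
  for (int i = 0; i < static_cast<int>(boxes.size()); ++i)
  {
    auto const t = entry_distance(ray, boxes[i]);
    if (t && *t < best_t)
    {
      best_t = *t;
      best = i;
    }
  }
  return best;
}
```

A ray that misses every box yields -1 here, which mirrors the rays in the image that miss the mesh.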

Here is part of the profiling output (bottom-up and top-down trees) using 4 processors:

BOTTOM-UP TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <number of calls> <name> [type]
===================
|-> 3.52e+00 sec 14.7% 0.0% 0.0% 0.0% 0.00e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::exports) [region]
|   |-> 3.52e+00 sec 14.7% 0.0% 0.0% 0.0% 0.00e+00 2 ArborX::DistributedTree::forwardQueries [region]
|       |-> 3.52e+00 sec 14.7% 0.0% 0.0% 0.0% 0.00e+00 2 ArborX::DistributedTree::query::nearest [region] 
|           |-> 3.52e+00 sec 14.7% 0.0% 0.0% 0.0% 0.00e+00 2 Ray::Query [region]
|-> 3.27e+00 sec 13.7% 100.0% 0.0% ------ 3 ArborX::TreeTraversal::nearest [for]
|   |-> 3.27e+00 sec 13.7% 100.0% 0.0% 0.0% 0.00e+00 3 ArborX::BVH::query::nearest [region]
|       |-> 3.27e+00 sec 13.7% 100.0% 0.0% 0.0% 0.00e+00 3 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|           |-> 3.27e+00 sec 13.7% 100.0% 0.0% 0.0% 0.00e+00 3 ArborX::CrsGraphWrapper::two_pass [region] 
|               |-> 3.27e+00 sec 13.7% 100.0% 0.0% 0.0% 0.00e+00 3 ArborX::CrsGraphWrapper::query::nearest [region]
|                   |-> 3.27e+00 sec 13.7% 100.0% 0.0% 0.0% 0.00e+00 3 ArborX::query [region]
|                       |-> 3.03e+00 sec 12.7% 100.0% 0.0% 0.0% 0.00e+00 2 ArborX::DistributedTree::query::nearest [region]
|                       |   |-> 3.03e+00 sec 12.7% 100.0% 0.0% 0.0% 0.00e+00 2 Ray::Query [region]
|                       |-> 2.45e-01 sec 1.0% 100.0% 0.0% 0.0% 0.00e+00 1 ArborX::DistributedTree::deviseStrategy [region]
|                           |-> 2.45e-01 sec 1.0% 100.0% 0.0% 0.0% 0.00e+00 1 ArborX::DistributedTree::query::nearest [region]
|                               |-> 2.45e-01 sec 1.0% 100.0% 0.0% 0.0% 0.00e+00 1 Ray::Query [region]

TOP-DOWN TIME TREE:                                                                                                                                                                                                
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]                                                           
===================                                                                                                                                                                                                
|-> 3.47e+00 sec 58.0% 30.4% 0.5% 0.0% 4.53e+01 1 Ray::Query [region]                                                                                                                                              
|   |-> 3.47e+00 sec 58.0% 30.4% 0.5% 0.0% 4.47e+01 1 ArborX::DistributedTree::query::nearest [region]                                                                                                             
|   |   |-> 1.39e+00 sec 23.2% 0.8% 0.7% 7.2% 2.59e+01 2 ArborX::DistributedTree::forwardQueries [region]                                                                                                          
|   |   |   |-> 8.86e-01 sec 14.8% 0.6% 0.9% 99.4% 6.77e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::exports) [region]                                        
|   |   |   |-> 2.06e-01 sec 3.4% 0.1% 8.7% 99.9% 9.71e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::export_ranks) [region]                                    
|   |   |   |-> 1.91e-01 sec 3.2% 0.2% 0.0% 99.8% 1.05e+01 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::export_ids) [region]                                      
|   |   |-> 1.02e+00 sec 17.1% 0.1% 13.0% 22.1% 3.91e+00 2 ArborX::DistributedTree::communicateResultsBack [region]                                                                                                
|   |   |   |-> 2.16e-01 sec 3.6% 0.0% 7.4% 100.0% 0.00e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::nearest::indices) [region]                                               
|   |   |   |-> 2.08e-01 sec 3.5% 0.0% 7.7% 100.0% 0.00e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::nearest::distances) [region]                                             
|   |   |   |-> 1.92e-01 sec 3.2% 0.0% 0.0% 100.0% 0.00e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::import_ids) [region]                                     
|   |   |   |-> 1.82e-01 sec 3.0% 0.0% 5.7% 100.0% 0.00e+00 2 ArborX::DistributedTree::sendAcrossNetwork (ArborX::DistributedTree::query::forwardQueries::import_ranks) [region]                                   
|   |   |-> 7.63e-01 sec 12.8% 100.0% 18.9% 0.0% 3.93e+01 2 ArborX::query [region]                                                                                                                                 
|   |   |   |-> 7.63e-01 sec 12.8% 100.0% 18.9% 0.0% 3.93e+01 2 ArborX::CrsGraphWrapper::query::nearest [region]                                                                                                   
|   |   |       |-> 7.59e-01 sec 12.7% 100.0% 18.8% 0.0% 2.37e+01 2 ArborX::CrsGraphWrapper::two_pass [region]                                                                                                     
|   |   |       |   |-> 7.58e-01 sec 12.7% 100.0% 18.8% 0.0% 1.58e+01 2 ArborX::CrsGraphWrapper::two_pass::first_pass [region]                                                                                     
|   |   |       |   |   |-> 7.58e-01 sec 12.7% 100.0% 18.8% 0.0% 1.06e+01 2 ArborX::BVH::query::nearest [region]                                                                                                   
|   |   |       |   |   |   |-> 7.57e-01 sec 12.7% 100.0% 18.9% ------ 2 ArborX::TreeTraversal::nearest [for]  
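
As a rough reading of the top-down tree above, the two communication regions account for most of the time in Ray::Query. The sketch below only redoes that arithmetic with the average times printed above; the helper name `share` and the constant names are illustrative and not part of the benchmark.

```cpp
#include <cassert>

// Fraction of a parent region's time spent in one child region.
double share(double child_seconds, double parent_seconds)
{
  assert(parent_seconds > 0.0);
  return child_seconds / parent_seconds;
}

// Average times in seconds, copied from the top-down tree.
constexpr double ray_query = 3.47;                // Ray::Query
constexpr double forward_queries = 1.39;          // forwardQueries
constexpr double communicate_results_back = 1.02; // communicateResultsBack
constexpr double local_query = 0.763;             // ArborX::query
```

With these numbers, `share(forward_queries + communicate_results_back, ray_query)` comes to about 69% of Ray::Query, against roughly 22% for the local ArborX::query.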

aprokop (Contributor) left a comment

The 33 MB CSV file is too large to merge.

benchmarks/CMakeLists.txt (outdated, resolved)
benchmarks/rays/rays.cpp (outdated, resolved)
benchmarks/rays/rays.cpp (outdated, resolved)
Comment on lines 28 to 29
std::pair<Kokkos::View<ArborX::Experimental::Ray[n_rays], MemorySpace>,
          Kokkos::View<double[n_rays], MemorySpace>>

Is there a reason to use non-dynamic views?

Comment on lines 120 to 128
Kokkos::View<ArborX::Nearest<ArborX::Experimental::Ray> *, MemorySpace>
    queries(Kokkos::view_alloc("queries", Kokkos::WithoutInitializing),
            rays.size());
Kokkos::parallel_for(
    "Ray::Build_queries",
    Kokkos::RangePolicy<ExecutionSpace>(0, rays.extent(0)),
    KOKKOS_LAMBDA(int i) {
      queries(i) = ArborX::nearest<ArborX::Experimental::Ray>(rays(i), 1);
    });

Maybe use access traits instead?

Kokkos::View<ArborX::Box[n_divisions_x * n_divisions_y * n_divisions_z],
             MemorySpace>
    bounding_boxes(
        Kokkos::view_alloc("bounding_boxes", Kokkos::WithoutInitializing));

You would want to use execution space here too, I think.

Comment on lines 137 to 139
constexpr int n_divisions_x = 100;
constexpr int n_divisions_y = 100;
constexpr int n_divisions_z = 100;

Why not simply nx, ny, nz?
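
For illustration, a sketch of how the shorter names might read when building the grid of cell boxes. This is plain C++ rather than Kokkos code, and several things are assumptions not taken from the PR: the unit-cube domain, the x-fastest ordering, and the `Box`, `cell_index`, and `make_unit_cube_boxes` names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bounding box type.
struct Box
{
  std::array<double, 3> min_corner;
  std::array<double, 3> max_corner;
};

// Linear index of cell (i, j, k), with x varying fastest (assumed ordering).
int cell_index(int i, int j, int k, int nx, int ny)
{
  return i + nx * (j + ny * k);
}

// One box per cell of an nx x ny x nz grid covering the unit cube (assumed).
std::vector<Box> make_unit_cube_boxes(int nx, int ny, int nz)
{
  assert(nx > 0 && ny > 0 && nz > 0);
  std::vector<Box> boxes(static_cast<std::size_t>(nx) * ny * nz);
  double const hx = 1.0 / nx;
  double const hy = 1.0 / ny;
  double const hz = 1.0 / nz;
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i)
        boxes[cell_index(i, j, k, nx, ny)] =
            Box{{i * hx, j * hy, k * hz},
                {(i + 1) * hx, (j + 1) * hy, (k + 1) * hz}};
  return boxes;
}
```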

MPI_Barrier(comm);

// Build the distributed tree
Kokkos::Profiling::pushRegion("Ray::Setup");

Maybe prefix labels in the file with Example::?

Comment on lines 38 to 39
std::ifstream file;
file.open(filename);

Suggested change
- std::ifstream file;
- file.open(filename);
+ std::ifstream file(filename);

You would want to have a check for failure.
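
One possible shape for that check, as a minimal sketch: the helper name `open_for_reading` is hypothetical, and throwing an exception is just one option (aborting with an error message would work as well).

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// Open a file for reading and fail loudly if it cannot be opened.
std::ifstream open_for_reading(std::string const &filename)
{
  assert(!filename.empty());
  std::ifstream file(filename);
  if (!file)
    throw std::runtime_error("Could not open file: " + filename);
  return file;
}
```

Since every processor reads the camera file, a missing or mistyped path would otherwise show up only later, as an empty set of rays.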

std::ifstream file;
file.open(filename);
std::string line;
std::getline(file, line);

Add a comment about why the first line is ignored.
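
A sketch of what such a comment could look like, assuming the first line is a header row naming the columns. The PR does not show the camera file layout, so the comma-separated numeric format and the `read_csv_rows` name are assumptions.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read comma-separated numeric rows from an already opened stream.
std::vector<std::vector<double>> read_csv_rows(std::istream &input)
{
  assert(input.good());
  std::string line;
  // The first line is assumed to be a header naming the columns rather than
  // data, so it is read and discarded.
  std::getline(input, line);

  std::vector<std::vector<double>> rows;
  while (std::getline(input, line))
  {
    if (line.empty())
      continue;
    std::vector<double> row;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))
      row.push_back(std::stod(field));
    rows.push_back(row);
  }
  return rows;
}
```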
