Use 64-bit Morton indices by default in the BVH construction #637
Conversation
Currently the tests fail because some of them have a point in the ... There are two approaches to fix this.
--- a/src/details/ArborX_DetailsTreeConstruction.hpp
+++ b/src/details/ArborX_DetailsTreeConstruction.hpp
@@ -206,7 +206,15 @@ public:
// Morton comparison. Thus, we add INT_MIN to it.
// We also avoid if/else statement by doing a "x + !x*<blah>" trick.
auto x = _sorted_morton_codes(i) ^ _sorted_morton_codes(i + 1);
- return x + (!x) * (LLONG_MIN + (i ^ (i + 1)));
+ if (x != 0)
+ {
+ // When using 63 bits for Morton codes, the LLONG_MAX is actually a valid
+ // code. As we want the return statement above to return a value always
+ // greater than anything here, we downshift by 1.
+ return x - 1;
+ }
+
+ return LLONG_MIN + (i ^ (i + 1));
}
  KOKKOS_FUNCTION Node *getNodePtr(int i) const

Because I don't think this fundamentally affects any of the posted results.
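For context, here is a minimal standalone sketch of the tie-breaking idea in the patch above (the function name and the plain-pointer interface are mine, not ArborX's): the XOR of two adjacent sorted Morton codes measures how different they are, and equal codes fall back to comparing the indices, offset so that the result stays below that of any pair of distinct codes.

#include <climits>

// Hypothetical standalone version of the comparison in the patch above.
long long compareAdjacentCodes(unsigned long long const *sorted_morton_codes, int i)
{
  // XOR of adjacent sorted codes: zero if and only if the codes are duplicates.
  long long x = (long long)(sorted_morton_codes[i] ^ sorted_morton_codes[i + 1]);
  if (x != 0)
  {
    // With 63-bit codes the XOR can be as large as LLONG_MAX, so downshift by 1
    // to stay below whatever the earlier return statement (not shown in the hunk)
    // returns.
    return x - 1;
  }
  // Duplicate codes: break the tie using the indices, offset by LLONG_MIN so the
  // result is smaller than the result for any pair of distinct codes.
  return LLONG_MIN + (i ^ (i + 1));
}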
Force-pushed from 2b69f24 to 22c53c4.
Partial review. Haven't had a chance to look into enabling both the use of unsigned int and unsigned long long yet.
Force-pushed from 22c53c4 to b2f84ee.
@dalg24 It does not have to be a part of this PR. The default behavior will still be changed to use 64-bit.
Force-pushed from b2f84ee to 82f9f5f.
Force-pushed from 82f9f5f to 31192a8.
Force-rebased on master due to conflicts with #642.
Original motivation
ArborX is unable to run the GeoLife dataset (https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/) in any reasonable manner (i.e., it takes hundreds of seconds to find the closest neighbor for this 24M-point problem). The problem is that hundreds of thousands of points are assigned the same Morton index. As we just "randomly" combine such pairs during the hierarchy construction, this leads to dramatic overlap in the bounding volumes at the lowest levels.
Statistics for some datasets
I have checked the number of duplicate codes and their distribution for two 3D datasets: HACC and GeoLife (those are the ones I had readily available).
"# M codes (>3 reps)": number of Morton codes that have more than 3 duplicates, i.e. number of Morton grid cells with more than 3 points in them
"# points with duplicate M code": number of points that share their Morton code with another point
" max # duplicate": highest number of points with the same Morton code
Number of hierarchy levels
Because of the way Karras' LBVH construction works, increasing the resolution of the Morton indices will typically result in a deeper hierarchy. This is particularly true for a hierarchy where many duplicate Morton codes corresponded to a single Morton box.
Despite this, as I will show, the queries actually run faster.
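As background on the resolution change, here is a minimal sketch of a 63-bit Morton encoder using the standard "magic numbers" bit interleaving, giving 2^21 grid cells per dimension instead of 2^10 for a 32-bit code. It assumes coordinates already normalized to [0, 1), and the function names are mine rather than ArborX's.

#include <cstdint>

// Spread the lower 21 bits of v so that there are two zero bits between
// consecutive original bits (standard interleaving step for 3D Morton codes).
inline std::uint64_t expandBits21(std::uint64_t v)
{
  v &= 0x1fffff;
  v = (v | v << 32) & 0x1f00000000ffff;
  v = (v | v << 16) & 0x1f0000ff0000ff;
  v = (v | v << 8) & 0x100f00f00f00f00f;
  v = (v | v << 4) & 0x10c30c30c30c30c3;
  v = (v | v << 2) & 0x1249249249249249;
  return v;
}

// 63-bit Morton code for a point with coordinates normalized to [0, 1).
inline std::uint64_t morton63(double x, double y, double z)
{
  auto quantize = [](double t) { return (std::uint64_t)(t * 2097152.0); }; // 2^21 cells
  return (expandBits21(quantize(x)) << 2) | (expandBits21(quantize(y)) << 1) |
         expandBits21(quantize(z));
}

With only 2^10 = 1024 cells per dimension in the 32-bit case, dense datasets such as GeoLife end up with huge numbers of points sharing a single cell, which is exactly the duplicate-code problem described above.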
Results (master 707a5b3 vs branch c5852e9 (5407512 after rebase on master))
Standard Benchmark
morton64_results.zip
Summary: expected penalty in the construction, which is < 20% everywhere except HIP, where it is 50% (HIP Thrust not optimized for long long?).
CUDA V100
10-20% penalty in construction. No difference in any of the searches.
HIP MI100
Up to 50% penalty in construction (which is significantly higher than CUDA; possibly HIP Thrust is less optimized for sorting long long than CUDA Thrust). No difference in radius and knn (except for the smallest size).
Serial (Power9)
Somehow, the construction is faster on Power9 for small sizes, and there is no difference for larger ones. No difference in radius and knn (except in knn for the smallest size).
Serial (AMD EPYC)
10% penalty in construction and 3 random slowdowns in radius search (all for very small sizes).
V100 (with also using 64-bit for sorting queries)
This is just to record why I opted out of using 64-bit for sorting queries (using eb43475 for the branch):
Algorithms on actual datasets
Compared to the benchmark, where we use uniform distributions (whether in a volume or on a surface), real datasets exhibit different characteristics. The data is typically more localized, with density increasing towards interesting features (clusters in HACC, roads in GPS tracking datasets like GeoLife, etc.).
DBSCAN
Run with minpts = 2 to stress just a single traversal.
HACC
Run with ./ArborX_DBSCAN.exe --core_min_size 2 --eps [eps] --binary --filename [filename]
GeoLife
I only ran with 5M samples (out of the 24M total points) because master becomes unbelievably slow with increasing sample sizes (as the density increases).
Nearest neighbor
Here we use the time for the first iteration of Boruvka in MST (with --core-min-size 1 --algorithm mst) as our time for the nearest neighbor. Note that it is an approximation, as we already set the radius to something pretty tight before the traversal.
HACC
GeoLife
Further thoughts
While the situation seems to be fully resolved for the HACC dataset, GeoLife may still be problematic. It does seem that the overall approach is not really suitable for this kind of dataset, as it is very far from the ray-tracing workloads that BVH was developed for. We may want to explore other approaches, like recursive partitioning of the data (e.g., kd-trees). Something like this paper. But it would be a whole new area to explore, and it's low priority for us given the lack of interested applications.