Failing clustering helpers test #1186

Open
aprokop opened this issue Oct 24, 2024 · 1 comment · May be fixed by #1205
Labels
bug Something isn't working

Comments

@aprokop
Contributor

aprokop commented Oct 24, 2024

In #1034, an update to HIP 6.2 resulted in two failed tests, ArborX_Test_DetailsClusteringHelpers.exe and ArborX_Test_Clustering.

I tried to reproduce the failure on Frontier, and I actually can't make the helper test pass at all. I tried these combinations:

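| ArborX version | Kokkos version | HIP version |
|----------------|----------------|-------------|
| master         | 4.4.1          | 6.2.3       |
| master         | 4.4.1          | 6.1.3       |
| master         | 4.3.1          | 6.1.3       |
| master         | 4.3.1          | 5.7.1       |
| 1.7            | 4.3.1          | 5.7.1       |
| 1.6            | 4.3.1          | 5.7.1       |
| 1.5            | 4.3.1          | 5.7.1       |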

No idea what's going on at the moment. It results in a memory violation within the Boruvka loop.

@aprokop aprokop added the bug Something isn't working label Nov 13, 2024
@aprokop
Contributor Author

aprokop commented Jan 16, 2025

There's something extremely bizarre going on with the Serial backend on Frontier. After spending hours tracking things down, I stumbled upon the following incomprehensible stuff:

  Kokkos::parallel_for(
      "ArborX::MST::reset_shared_radii", Kokkos::RangePolicy(space, 0, n - 1),
      KOKKOS_LAMBDA(int i) {
        int const j = i + 1;
        auto const label_i = labels(i);
        auto const label_j = labels(j);
        if (label_i == label_j)
            return;

        auto const r =
            metric(HappyTreeFriends::getValue(bvh, i).index,
                   HappyTreeFriends::getValue(bvh, j).index,
                   distance(HappyTreeFriends::getIndexable(bvh, i),
                            HappyTreeFriends::getIndexable(bvh, j)));

        auto const d1 = distance(HappyTreeFriends::getIndexable(bvh, i),
                                 HappyTreeFriends::getIndexable(bvh, j));

        if (i == 257) {
            auto const d2 = distance(HappyTreeFriends::getIndexable(bvh, i),
                                     HappyTreeFriends::getIndexable(bvh, j));
            printf("[%d] RRR(r): %.9g\n", i, r);
            printf("[%d] RRR(d1): %.9g\n", i, d1);
            printf("[%d] RRR(d2): %.9g\n", i, d2);
            printf("[%d] RRR(i): %.9g\n", i,
                   distance(HappyTreeFriends::getIndexable(bvh, i),
                            HappyTreeFriends::getIndexable(bvh, j)));
        }

        Kokkos::atomic_min(&radii(label_i), r);
        Kokkos::atomic_min(&radii(label_j), r);
      });

It prints the following (for the golden test):

[257] RRR(r): 0.0358776711
[257] RRR(d1): 0.0358776711
[257] RRR(d2): 0.0358776748
[257] RRR(i): 0.0358776748

What The Actual F.

This results in the mutual reachability distance (r) being smaller than the actual distance, which is completely wrong. Later on, some points then fail to find any candidates with a different label within that radius, which messes everything up.
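
To make the broken invariant concrete, here is a minimal sketch. It assumes the usual definition of mutual reachability distance (the maximum of the two core distances and the pairwise distance). The function names and the core-distance arguments are hypothetical and are not ArborX identifiers.

```cpp
#include <algorithm>

// Mutual reachability distance under the usual definition:
// max(core distance of i, core distance of j, pairwise distance).
// core_i and core_j are hypothetical inputs for illustration only.
inline float mutual_reachability(float core_i, float core_j, float d)
{
  return std::max({core_i, core_j, d});
}

// The invariant every consumer relies on: r is never smaller than d.
inline bool respects_invariant(float r, float d)
{
  return r >= d;
}
```

With the two values printed above, an r computed from 0.0358776711f fails the check against a distance later recomputed as 0.0358776748f, provided both core distances are smaller than the pairwise distance. The two values differ by about one unit in the last place of a float.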

This does not make any sense. The compiler seems to be doing something bizarre that I don't understand.

The problem disappears when compiling with -O0. So it must be some kind of optimization that completely messes things up.
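
One mechanism that can make the same floating-point expression produce results one unit in the last place apart at different call sites is fused multiply-add contraction. This is an assumption, not something confirmed in this issue. The sketch below only demonstrates the effect in isolation; the helper names are hypothetical, and `std::fma` stands in for what a compiler might emit when it contracts a multiply and an add.

```cpp
#include <cmath>

// Product rounded to float. The volatile store forces the rounding and
// keeps the compiler from fusing the multiply with a later subtraction.
inline float rounded_product(float a, float b)
{
  volatile float p = a * b;
  return p;
}

// a*a - p where both products are rounded to float first.
// The rounding error cancels, so the result is exactly zero.
inline float residual_separate(float a)
{
  float const p = rounded_product(a, a);
  return rounded_product(a, a) - p;
}

// Same expression with a fused multiply-add: a*a is kept exact inside
// std::fma, so the rounding error of p becomes visible.
inline float residual_fused(float a)
{
  float const p = rounded_product(a, a);
  return std::fma(a, a, -p);
}
```

For a = 1 + 2^-12 the separately rounded version returns 0 while the fused version returns 2^-24, even though both evaluate the same mathematical expression. If contraction were involved here, rebuilding with `-ffp-contract=off` would be one way to test that hypothesis.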

@aprokop aprokop linked a pull request Jan 16, 2025 that will close this issue