Failing CUDA tests #861

Open
wants to merge 10 commits into
base: master

Conversation

luraess
Contributor

@luraess luraess commented Aug 12, 2024

Also, bump AMDGPU to 1.0 release (addresses #860 also for tests)
[EDIT] AMDGPU compat is handled in #869, to keep the focus on CUDA tests here.

@luraess
Contributor Author

luraess commented Aug 12, 2024

  • It seems that the ubuntu-latest tests for test_datatype.jl fail on nightly for both default and openmpi-jll.
  • CUDA tests on Julia >= v1.9 struggle with test_allreduce, test_allgather, and other collectives (I tried excluding them for now).

@luraess
Contributor Author

luraess commented Aug 13, 2024

On CPU, we get failing test in Ubuntu-latest, [EDIT] Julia 1.9 and 1.10, for PrimitiveType = Primitive80 on:

Any hint what could go wrong there?

@giordano
Member

#853

@giordano
Member

On CPU, we get failing test in Ubuntu-latest, Julia 1.9 and 1.10

Where do you see failures with julia 1.9 and 1.10? It looks to me only Julia nightly is failing

@luraess
Contributor Author

luraess commented Aug 13, 2024

Where do you see failures with julia 1.9 and 1.10? It looks to me only Julia nightly is failing

Correct, only nightly is failing on CPU. Julia 1.9 and 1.10 fail with CUDA MPI.

@giordano giordano requested a review from vchuravy August 13, 2024 10:09
@luraess
Contributor Author

luraess commented Aug 13, 2024

Now the CUDA tests segfault on test_basic.jl (https://buildkite.com/julialang/mpi-dot-jl/builds/1520#01914b09-c528-4d9b-9c31-d8273912270d/286-489), which suggests the problem is not related to the collectives but to something else that breaks CUDA-aware MPI in CI. I will revert the excluded tests; someone will need to dig further.
