
MPISpace: element operators with atomic trait aren't atomic #99

Closed
brian-kelley opened this issue May 6, 2024 · 7 comments
brian-kelley (Contributor) commented May 6, 2024

The operator overloads for the proxy type MPIDataElement with the atomic trait have the same implementation as the non-atomic versions, for example:

  KOKKOS_INLINE_FUNCTION
  void inc() const {
    T val = T();
    mpi_type_g(val, offset, pe, *win);  // get the remote value
    val++;                              // modify it locally
    mpi_type_p(val, offset, pe, *win);  // put it back
    // This get/modify/put sequence is not a single atomic RMA operation,
    // so concurrent increments from other processes can be lost.
  }

These operators should use the one-sided atomic functions: MPI_Accumulate, MPI_Get_accumulate, or MPI_Fetch_and_op.
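
For instance, inc() could become a single remote fetch-and-add. The sketch below is not the actual KRS code: the helper name, the hard-coded int/MPI_INT element type, and the assumption that a passive-target access epoch is already open on the window are all just for illustration.

  #include <mpi.h>

  // Hypothetical helper, not the MPIDataElement implementation: atomically
  // increments one int held in `win` at displacement `disp` on rank `pe`
  // with a single RMA operation instead of the get/modify/put above.
  inline int atomic_inc_remote(int pe, MPI_Aint disp, MPI_Win win) {
    const int one = 1;
    int old = 0;  // value held at the target before the increment
    MPI_Fetch_and_op(&one, &old, MPI_INT, pe, disp, MPI_SUM, win);
    MPI_Win_flush(pe, win);  // complete the operation at the target
    return old;
  }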

MPI_Compare_and_swap in a loop could be used for non-builtin operations like in Desul.
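
A sketch of that compare-and-swap fallback is below. Again hypothetical: the helper, its parameters, and the int element type are assumptions, and it likewise assumes a passive-target epoch is open on the window.

  #include <mpi.h>

  // Hypothetical helper, not KRS code: applies an arbitrary update function
  // to a remote int, retrying until the compare-and-swap succeeds.
  template <class UpdateFn>
  inline int atomic_update_remote(int pe, MPI_Aint disp, MPI_Win win,
                                  UpdateFn update) {
    int dummy = 0;
    int expected = 0;
    // Atomic read of the current value (MPI_NO_OP fetch) to seed the loop.
    MPI_Fetch_and_op(&dummy, &expected, MPI_INT, pe, disp, MPI_NO_OP, win);
    MPI_Win_flush(pe, win);
    while (true) {
      const int desired = update(expected);
      int observed = 0;
      // Install `desired` only if the element still holds `expected`.
      MPI_Compare_and_swap(&desired, &expected, &observed, MPI_INT, pe, disp,
                           win);
      MPI_Win_flush(pe, win);
      if (observed == expected) return desired;  // CAS took effect
      expected = observed;  // another process won; retry with the fresh value
    }
  }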

vmiheer (Contributor) commented May 6, 2024

Thanks for opening the issue, Brian! Is there a trait which says that associative reordering is okay? Although in a distributed scenario, I wonder how it could not be okay.

brian-kelley (Contributor, Author) commented May 6, 2024

@vmiheer It should be safe to assume that it's always OK, since without that assumption Kokkos couldn't do parallel reduce or scan. Do the MPI one-sided atomics give you a choice in that?

vmiheer (Contributor) commented May 6, 2024

I was looking at the semantics at https://docs.nvidia.com/nvshmem/archives/nvshmem-113/api/docs/gen/mem-model.html#differences-between-nvshmem-and-openshmem and was wondering.
Although that is a question for later; for now I am going with MPI_Accumulate.
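
To illustrate the MPI_Accumulate route, here is a minimal sketch for the non-fetching case where the old value isn't needed; the helper name and the int element type are assumptions, not KRS code.

  #include <mpi.h>

  // Hypothetical helper, not KRS code: remote atomic `+= v` with no fetch,
  // which is what MPI_Accumulate provides.
  inline void atomic_add_remote(int pe, MPI_Aint disp, MPI_Win win, int v) {
    MPI_Accumulate(&v, 1, MPI_INT, pe, disp, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_flush(pe, win);  // complete the update at the target
  }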

devreal (Collaborator) commented May 6, 2024

MPI RMA accumulate operations are only defined on operator/type pairs that are associative.

brian-kelley (Contributor, Author):

@vmiheer I see, so this is about how atomic operations are ordered. I wasn't familiar with this detail, but the nvshmem behavior is actually different from Kokkos core, where if you do two atomic fetch-adds from the same thread, they will always execute in order.

But KRS is supposed to be fully portable, so it only makes guarantees that all its backends make. And KRS doesn't add fences to all the nvshmem atomics (I assume this would be horrible for performance). So for the MPISpace backend, you don't have to worry about how atomics are ordered.

janciesko (Contributor):

Related to #25

brian-kelley (Contributor, Author):

@janciesko Sorry, I didn't notice this was already an open issue!

brian-kelley closed this as not planned on May 12, 2024