
bug: MPIX_Stream + MPIX_Irecv/Isend_enqueue() Multiple SEGV #6528

Closed · Jacobfaib opened this issue May 17, 2023 · 9 comments · Fixed by #6561

@Jacobfaib commented May 17, 2023

The Problem(s)

Running the reproducer below on 2 ranks produces several distinct failures. Two representative gdb backtraces, followed by the UCX warnings/assertions and the resulting abort:

Thread 1 "mpich_segv" received signal SIGSEGV, Segmentation fault.
0x00007f8c91432874 in MPIR_Waitall_enqueue_impl (count=1, array_of_requests=0x61c00017eedc, array_of_statuses=0x1) at src/mpi/stream/stream_enqueue.c:497
497	            if (p2->host_buf) {
(gdb) bt
#0  0x00007f8c91432874 in MPIR_Waitall_enqueue_impl (count=1,
    array_of_requests=0x61c00017eedc, array_of_statuses=0x1)
    at src/mpi/stream/stream_enqueue.c:497
#1  0x00007f8c914b3a7e in MPID_Waitall_enqueue (count=1,
    array_of_requests=0x61c00017eedc, array_of_statuses=0x1)
    at src/mpid/ch4/src/ch4_stream_enqueue.c:401
#2  0x00007f8c9121cba9 in internalX_Waitall_enqueue (count=1,
    array_of_requests=0x61c00017eedc, array_of_statuses=0x1)
    at src/binding/c/c_binding.c:76246
#3  0x00007f8c9121ccaa in PMPIX_Waitall_enqueue (count=1,
    array_of_requests=0x61c00017eedc, array_of_statuses=0x1)
    at src/binding/c/c_binding.c:76296
Thread 1 "mpich_segv" received signal SIGSEGV, Segmentation fault.
MPIR_Handle_obj_alloc_unsafe (objmem=0x7f0e56b7cce8 <MPIR_Request_mem+136>, max_blocks=256, max_indices=1024) at ./src/include/mpir_handlemem.h:279
279	        objmem->avail = objmem->avail->next;
(gdb) bt
#0  MPIR_Handle_obj_alloc_unsafe (
    objmem=0x7f0e56b7cce8 <MPIR_Request_mem+136>, max_blocks=256,
    max_indices=1024) at ./src/include/mpir_handlemem.h:279
#1  0x00007f0e4863422f in MPIR_Request_create_from_pool (
    kind=MPIR_REQUEST_KIND__ENQUEUE, pool=1, ref_count=1)
    at ./src/include/mpir_request.h:380
#2  0x00007f0e486347f7 in MPIR_Request_create_from_pool_safe (
    kind=MPIR_REQUEST_KIND__ENQUEUE, pool=1, ref_count=1)
    at ./src/include/mpir_request.h:449
#3  0x00007f0e48635bec in MPIR_allocate_enqueue_request (
    comm_ptr=0x7f0ba804bda0, req=0x7ffd09cff308)
    at src/mpi/stream/stream_util.c:36
#4  0x00007f0e48631420 in MPIR_Irecv_enqueue_impl (buf=0x4202000400, count=1,
    datatype=1275070475, source=1, tag=268435425, comm_ptr=0x7f0ba804bda0,
    req=0x7ffd09cff308) at src/mpi/stream/stream_enqueue.c:306
#5  0x00007f0e486b336a in MPID_Irecv_enqueue (buf=0x4202000400, count=1,
    datatype=1275070475, source=1, tag=268435425, comm_ptr=0x7f0ba804bda0,
    req=0x7ffd09cff308) at src/mpid/ch4/src/ch4_stream_enqueue.c:299
#6  0x00007f0e4841b4b6 in internalX_Irecv_enqueue (buf=0x4202000400, count=1,
    datatype=1275070475, source=1, tag=268435425, comm=-1006632959,
    request=0x61c00017eefc) at src/binding/c/c_binding.c:75847
#7  0x00007f0e4841b649 in PMPIX_Irecv_enqueue (buf=0x4202000400, count=1,
    datatype=1275070475, source=1, tag=268435425, comm=-1006632959,
    request=0x61c00017eefc) at src/binding/c/c_binding.c:75904
[1684336515.914697] [petsc-gpu-01:3422596:1]           debug.c:1289 UCX  WARN  ucs_debug_disable_signal: signal 8 was not set in ucs
[1684336515.914705] [petsc-gpu-01:3422596:1]           debug.c:1289 UCX  WARN  ucs_debug_disable_signal: signal 1 was not set in ucs
[1684336515.914708] [petsc-gpu-01:3422596:1]           debug.c:1289 UCX  WARN  ucs_debug_disable_signal: signal 11 was not set in ucs
[1684336515.914710] [petsc-gpu-01:3422596:1]           debug.c:1289 UCX  WARN  ucs_debug_disable_signal: signal 7 was not set in ucs
[petsc-gpu-01:3422596:1:3422917]  ucp_worker.c:2786 Assertion `--worker->inprogress == 0' failed
[petsc-gpu-01:3422596:0:3422596]  ucp_worker.c:2781 Assertion `worker->inprogress++ == 0' failed
[petsc-gpu-01:3422597:0:3422919] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x200498)
==== backtrace (tid:3422596) ====
 0  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7f19545f4604]
 1  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(ucs_fatal_error_message+0xca) [0x7f19545f155a]
 2  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(ucs_fatal_error_format+0x122) [0x7f19545f1682]
 3  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucp.so.0(ucp_worker_progress+0x98) [0x7f1954685348]
 4  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x922a69) [0x7f1956b7ca69]
 5  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x923ede) [0x7f1956b7dede]
 6  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x91e548) [0x7f1956b78548]
 7  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x94d625) [0x7f1956ba7625]
 8  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(MPI_Finalize+0x17) [0x7f1956a61987]
 9  ./mpich_segv(main+0x502) [0x560f1efc2a02]
10  /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1955c01d90]
11  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f1955c01e40]
12  ./mpich_segv(_start+0x25) [0x560f1efc3885]
=================================
==== backtrace (tid:3422919) ====
 0  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7fe932dd8604]
 1  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(+0x34807) [0x7fe932dd8807]
 2  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucs.so.0(+0x34aee) [0x7fe932dd8aee]
 3  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fe93441a520]
 4  /home/ac.jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libucp.so.0(ucp_tag_send_nbx+0x5f) [0x7fe932ef473f]
 5  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x9114d5) [0x7fe93531b4d5]
 6  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x913db7) [0x7fe93531ddb7]
 7  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x874c28) [0x7fe93527ec28]
 8  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x900820) [0x7fe93530a820]
 9  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x900937) [0x7fe93530a937]
10  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x902453) [0x7fe93530c453]
11  /scratch/jfaibussowitsch/petsc/main-arch-cuda-opt/lib/libmpi.so.0(+0x96ec6a) [0x7fe935378c6a]
12  /lib/x86_64-linux-gnu/libcuda.so.1(+0x262808) [0x7fe9331d6808]
13  /lib/x86_64-linux-gnu/libcuda.so.1(+0x272958) [0x7fe9331e6958]
14  /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fe93446cb43]
15  /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fe9344fea00]
=================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3422596 RUNNING AT petsc-gpu-01
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

To Reproduce

// mpicxx mpich_segv.cpp -o mpich_segv -lcudart
// /path/to/mpiexec -n 2 ./mpich_segv
#include <mpi.h>
#include <cuda_runtime.h>

#include <cstdlib>
#include <iostream>
#include <memory>

#define MPI_SAFE_CALL(...) do {                                                            \
    int mpi_errno_ = __VA_ARGS__;                                                          \
    if (mpi_errno_ != MPI_SUCCESS) {                                                       \
      int len_;                                                                            \
      char mpi_errorstring_[MPI_MAX_ERROR_STRING];                                         \
      MPI_Error_string(mpi_errno_, mpi_errorstring_, &len_);                               \
      std::cerr << "MPI Error: " << mpi_errno_ << " (" << mpi_errorstring_ << ")\n";       \
      MPI_Abort(MPI_COMM_WORLD, mpi_errno_);                                               \
    }                                                                                      \
  } while (0)

#define CUDA_SAFE_CALL(...) do {                                                           \
    cudaError_t cuda_errno_ = __VA_ARGS__;                                                 \
    if (cuda_errno_ != cudaSuccess) {                                                      \
      std::cerr << "CUDA Error: " << cuda_errno_ << " (" << cudaGetErrorName(cuda_errno_)  \
                << "): " << cudaGetErrorString(cuda_errno_) << '\n';                       \
      MPI_Abort(MPI_COMM_WORLD, static_cast<int>(cuda_errno_));                            \
    }                                                                                      \
  } while (0)

int main(int argc, char *argv[])
{
  (void)setenv("MPIR_CVAR_CH4_RESERVE_VCIS", "1", 0);
  MPI_SAFE_CALL(MPI_Init(&argc, &argv));

  cudaStream_t stream;

  CUDA_SAFE_CALL(cudaStreamCreate(&stream));

  MPI_Info    info;
  MPIX_Stream mpi_stream;

  MPI_SAFE_CALL(MPI_Info_create(&info));
  MPI_SAFE_CALL(MPI_Info_set(info, "type",  "cudaStream_t"));
  MPI_SAFE_CALL(MPIX_Info_set_hex(info, "value", &stream, sizeof(stream)));
  MPI_SAFE_CALL(MPIX_Stream_create(info, &mpi_stream));
  MPI_SAFE_CALL(MPI_Info_free(&info));

  const auto comm = MPI_COMM_WORLD;
  MPI_Comm   scomm;

  MPI_SAFE_CALL(MPIX_Stream_comm_create(comm, mpi_stream, &scomm));

  int rank, size;

  MPI_SAFE_CALL(MPI_Comm_rank(comm, &rank));
  MPI_SAFE_CALL(MPI_Comm_size(comm, &size));

  double         *array;
  constexpr auto  cnt         = 100;
  constexpr auto  array_bytes = cnt * sizeof(*array);

  CUDA_SAFE_CALL(cudaMallocAsync((void **)&array, array_bytes, stream));
  CUDA_SAFE_CALL(cudaMemsetAsync(array, 0, array_bytes, stream));

  // round-robin, send to next rank (or wrap to 0), receive from previous rank (or wrap to max)
  const auto send_rank = (rank + 1) % size;
  const auto recv_rank = rank ? rank - 1 : size - 1;
  const auto reqs      = new MPI_Request[2];

  for (auto i = 0; i < 100; ++i) {
    MPI_SAFE_CALL(MPIX_Irecv_enqueue(array, cnt, MPI_DOUBLE, recv_rank, i, scomm, reqs));
    MPI_SAFE_CALL(MPIX_Isend_enqueue(array, cnt, MPI_DOUBLE, send_rank, i, scomm, reqs + 1));

    MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, reqs, MPI_STATUSES_IGNORE));
    MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, reqs + 1, MPI_STATUSES_IGNORE));
  }

  delete[] reqs;

  MPI_SAFE_CALL(MPI_Comm_free(&scomm));
  MPI_SAFE_CALL(MPIX_Stream_free(&mpi_stream));
  CUDA_SAFE_CALL(cudaFreeAsync(array, stream));
  CUDA_SAFE_CALL(cudaStreamDestroy(stream));
  MPI_SAFE_CALL(MPI_Finalize());
  return 0;
}
@Jacobfaib (Author)

Attached: mpich_config.log

@Jacobfaib (Author)

Note the problem is exacerbated if the MPI_Request objects are loop-local:

for (auto i = 0; i < 100; ++i) {
  MPI_Request recv_req, send_req;

  MPI_SAFE_CALL(MPIX_Irecv_enqueue(array, cnt, MPI_DOUBLE, recv_rank, i, scomm, &recv_req));
  MPI_SAFE_CALL(MPIX_Isend_enqueue(array, cnt, MPI_DOUBLE, send_rank, i, scomm, &send_req));

  MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, &recv_req, MPI_STATUSES_IGNORE));
  MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, &send_req, MPI_STATUSES_IGNORE));
}

in which case things blow up on the very first iteration of the loop. But perhaps there is an unwritten rule that the requests must live until the communication actually completes, which is why I dynamically allocate them in the reproducer above...
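
For reference, here is a sketch of that workaround with per-iteration heap-allocated requests, released only after the stream has drained. It assumes the request storage must stay valid until the enqueued waits have actually executed on the CUDA stream (not documented behavior, as far as I can tell) and additionally needs #include <vector>; all other names come from the reproducer above:

// Keep each iteration's MPI_Request storage alive until the enqueued waits
// have actually run on the CUDA stream.
std::vector<MPI_Request *> live_reqs;

for (auto i = 0; i < 100; ++i) {
  const auto it_reqs = new MPI_Request[2];

  MPI_SAFE_CALL(MPIX_Irecv_enqueue(array, cnt, MPI_DOUBLE, recv_rank, i, scomm, it_reqs));
  MPI_SAFE_CALL(MPIX_Isend_enqueue(array, cnt, MPI_DOUBLE, send_rank, i, scomm, it_reqs + 1));

  MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, it_reqs, MPI_STATUSES_IGNORE));
  MPI_SAFE_CALL(MPIX_Waitall_enqueue(1, it_reqs + 1, MPI_STATUSES_IGNORE));

  live_reqs.push_back(it_reqs); // keep the storage alive past this iteration
}

// Only after the stream has drained is it clearly safe to release the storage.
CUDA_SAFE_CALL(cudaStreamSynchronize(stream));
for (auto r : live_reqs) delete[] r;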

@Jacobfaib (Author)

Also possibly related to pmodels/yaksa#245. Where there's smoke...

@raffenet (Contributor)

Does your MPICH build include #6454? I know @jczhang07 ran into an issue like this that led to that PR.

@Jacobfaib (Author)

> Does your MPICH build include #6454? I know @jczhang07 ran into an issue like this that led to that PR.

The MPICH build was a fresh build of main from the morning this issue was opened, so yes.

@raffenet (Contributor)

> > Does your MPICH build include #6454? I know @jczhang07 ran into an issue like this that led to that PR.
>
> The MPICH build was a fresh build of main from the morning this issue was opened, so yes.

OK, we will take a look.

@hzhou (Contributor) commented Jun 13, 2023

@Jacobfaib Please try PR #6561. In addition, please note the following (a sketch of both changes applied to the reproducer is below):

  1. Use MPI_Init_thread with MPI_THREAD_MULTIPLE. The enqueue functions invoke background threads that themselves call MPI.
  2. Call cudaStreamSynchronize before MPI_Finalize. MPI requires all communication to be complete before MPI_Finalize.
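
For concreteness, a minimal sketch of those two suggestions applied to the reproducer above; the MPIX_Stream setup and the communication loop are unchanged and elided, and provided is the only new variable:

// Sketch only: request full thread support, since the enqueue path runs a
// background thread that itself makes MPI calls.
int provided;
MPI_SAFE_CALL(MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided));
if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);

// ... MPIX_Stream / stream-communicator setup and the MPIX_*_enqueue loop,
//     exactly as in the reproducer ...

// Drain the CUDA stream so every enqueued operation has completed before any
// MPI object is freed and before MPI_Finalize.
CUDA_SAFE_CALL(cudaStreamSynchronize(stream));

MPI_SAFE_CALL(MPI_Comm_free(&scomm));
MPI_SAFE_CALL(MPIX_Stream_free(&mpi_stream));
CUDA_SAFE_CALL(cudaFreeAsync(array, stream));
CUDA_SAFE_CALL(cudaStreamSynchronize(stream)); // let the async free finish as well
CUDA_SAFE_CALL(cudaStreamDestroy(stream));
MPI_SAFE_CALL(MPI_Finalize());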

@hzhou changed the title from "MPIX_Stream + MPIX_Irecv/Isend_enqueue() Multiple SEGV" to "bug: MPIX_Stream + MPIX_Irecv/Isend_enqueue() Multiple SEGV" on Jun 13, 2023
@Jacobfaib (Author)

Ok will do

@Jacobfaib (Author)

@hzhou sorry for the long hiatus, just wanted to confirm this is indeed fixed on my end!
