AddressSanitizer: SEGV src/util/yaksu_handle_pool.c:181 in yaksu_handle_pool_elem_get() #245

Open
Jacobfaib opened this issue May 12, 2023 · 1 comment

Jacobfaib commented May 12, 2023

The error

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x170)
==== backtrace (tid:1229674) ====
 0  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(ucs_debug_print_backtrace+0x33) [0x7fea6f92bcad]
 1  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(ucs_handle_error+0x77) [0x7fea6f92ce0f]
 2  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(+0x37bca) [0x7fea6f92cbca]
 3  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(+0x37d2c) [0x7fea6f92cd2c]
 4  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xbc3d12) [0x7fea2a9a1d12]
 5  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xbd6a69) [0x7fea2a9b4a69]
 6  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xbd376e) [0x7fea2a9b176e]
 7  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xa1084b) [0x7fea2a7ee84b]
 8  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xa4ab68) [0x7fea2a828b68]
 9  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0xa4a8bb) [0x7fea2a8288bb]
10  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(+0x84fad5) [0x7fea2a62dad5]
11  /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libmpi.so.0(PMPI_Init+0x27) [0x7fea2a62db72]
12  ./yaksa_test(+0x125f) [0x559bf2c2325f]
13  /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fea283a6d90]
14  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fea283a6e40]
15  ./yaksa_test(_start+0x25) [0x559bf2c230e5]
=================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1229674==ERROR: AddressSanitizer: SEGV on unknown address 0x25320012c36a (pc 0x7fea2a9a1d12 bp 0x7fff52366090 sp 0x7fff52366040 T0)
==1229674==The signal is caused by a READ memory access.
    #0 0x7fea2a9a1d12 in yaksu_handle_pool_elem_get src/util/yaksu_handle_pool.c:181
    #1 0x7fea2a9b4a68 in yaksi_type_get src/frontend/types/yaksi_type.c:49
    #2 0x7fea2a9b176d in yaksa_type_create_contig src/frontend/types/yaksa_contig.c:76
    #3 0x7fea2a7ee84a in MPIR_Typerep_init src/mpi/datatype/typerep/src/typerep_yaksa_init.c:420
    #4 0x7fea2a828b67 in MPII_Init_thread src/mpi/init/mpir_init.c:165
    #5 0x7fea2a8288ba in MPIR_Init_impl src/mpi/init/mpir_init.c:102
    #6 0x7fea2a62dad4 in internal_Init src/binding/c/c_binding.c:45678
    #7 0x7fea2a62db71 in PMPI_Init src/binding/c/c_binding.c:45730
    #8 0x559bf2c2325e in main /home/ac.jfaibussowitsch/petsc/src/ksp/ksp/tests/yaksa_test.c:8
    #9 0x7fea283a6d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7fea283a6e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #11 0x559bf2c230e4 in _start (/scratch/jfaibussowitsch/petsc/src/ksp/ksp/tests/yaksa_test+0x10e4)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/util/yaksu_handle_pool.c:181 in yaksu_handle_pool_elem_get
==1229674==ABORTING

To reproduce

// mpicc -fsanitize=address yaksa_segv.c -o yaksa_segv
#include <mpi.h>

int main(int argc, char *argv[])
{
  // The crash happens inside MPI_Init, before any user code runs.
  MPI_Init(&argc, &argv);
  MPI_Finalize();
  return 0;
}

The problem

$ gdb ./yaksa_segv
...
Thread 1 "bench_debug" received signal SIGSEGV, Segmentation fault.
0x00007fff973a1d12 in yaksu_handle_pool_elem_get (pool=0x0, handle=38, data=0x7fffffffbf28) at src/util/yaksu_handle_pool.c:181
181	        assert(handle_pool->handle_cache[handle]);
(gdb) p handle_pool
$1 = (handle_pool_s *) 0x0

yaksa_config.log
mpich_config.log


Jacobfaib commented May 25, 2023

I have traced this problem to ASAN interfering with the CUDA runtime (google/sanitizers#629). The interference causes CUDA allocation functions to fail for no obvious reason, and yaksa apparently does not check for those failures, so the problem only surfaces later as the segfault above. Yaksa should check the result of every allocation call and fail with a clear error when one does not succeed.
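To illustrate the kind of checking meant here, below is a minimal sketch in plain C. It is not yaksa's code: every `demo_*` name is hypothetical, and `malloc` stands in for the CUDA allocation calls. The point is only that a failed allocation is reported at creation time, and that a lookup on a pool that was never created returns an error instead of dereferencing a null pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not yaksa's real API. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOMEM = 1, DEMO_ERR_BAD_ARG = 2 };

typedef void *(*demo_alloc_fn)(size_t);

typedef struct {
    void **slots;    /* slots[i] holds the object for handle i, or NULL */
    size_t capacity;
} demo_pool;

/* Allocator that always fails, simulating the unexplained failure. */
static void *demo_failing_alloc(size_t nbytes)
{
    (void) nbytes;
    return NULL;
}

/* Create a pool. Every allocation is checked and reported to the caller.
 * The allocator must be malloc-compatible because cleanup uses free(). */
static int demo_pool_create(demo_alloc_fn alloc, size_t capacity, demo_pool **out)
{
    if (alloc == NULL || out == NULL || capacity == 0)
        return DEMO_ERR_BAD_ARG;
    *out = NULL;

    demo_pool *pool = alloc(sizeof *pool);
    if (pool == NULL)
        return DEMO_ERR_NOMEM;

    pool->slots = alloc(capacity * sizeof *pool->slots);
    if (pool->slots == NULL) {
        free(pool);
        return DEMO_ERR_NOMEM;
    }
    for (size_t i = 0; i < capacity; i++)
        pool->slots[i] = NULL;
    pool->capacity = capacity;

    *out = pool;
    return DEMO_SUCCESS;
}

/* Store an object under a handle. */
static int demo_pool_set(demo_pool *pool, size_t handle, void *data)
{
    if (pool == NULL || data == NULL || handle >= pool->capacity)
        return DEMO_ERR_BAD_ARG;
    pool->slots[handle] = data;
    return DEMO_SUCCESS;
}

/* Look up a handle. A NULL pool, an out-of-range handle, or an empty
 * slot yields an error code rather than a crash. */
static int demo_pool_get(const demo_pool *pool, size_t handle, void **data)
{
    if (pool == NULL || data == NULL || handle >= pool->capacity ||
        pool->slots[handle] == NULL)
        return DEMO_ERR_BAD_ARG;
    *data = pool->slots[handle];
    return DEMO_SUCCESS;
}

/* Release the pool; safe to call with NULL. */
static void demo_pool_free(demo_pool *pool)
{
    if (pool != NULL) {
        free(pool->slots);
        free(pool);
    }
}
```

With `demo_failing_alloc`, `demo_pool_create` returns `DEMO_ERR_NOMEM` and leaves the output pointer NULL, and a later `demo_pool_get` on that pointer returns `DEMO_ERR_BAD_ARG`. That is the behavior I would expect in place of the SEGV at `yaksu_handle_pool.c:181`.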

The fix for users is to globally set the ASAN option protect_shadow_gap=0 via

$ ASAN_OPTIONS=protect_shadow_gap=0 ./user_app

or in source (C++ shown; in C, omit the extern "C") via

extern "C" const char *__asan_default_options() { return "protect_shadow_gap=0"; }

Perhaps yaksa could set this option itself when it is built with CUDA support and detects ASAN. ASAN can be detected at compile time via

#ifndef __has_feature
  #define __has_feature(x) 0
#endif

#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
  // ASAN active
#endif
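
Putting the two pieces together, a rough C sketch of what this could look like follows. The `DEMO_ASAN_ACTIVE` macro and `demo_asan_active()` helper are names I made up for illustration; only `__asan_default_options` is the actual ASAN hook. A real version in yaksa would presumably also be guarded by whatever macro its build uses to signal CUDA support, which I have left out here.

```c
#include <stdio.h>

/* Compilers without __has_feature (e.g. GCC) get a fallback that says "no". */
#ifndef __has_feature
#  define __has_feature(x) 0
#endif

/* Clang reports ASAN through __has_feature, GCC through __SANITIZE_ADDRESS__. */
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#  define DEMO_ASAN_ACTIVE 1
/* The ASAN runtime calls this hook at startup to obtain default options. */
const char *__asan_default_options(void)
{
    return "protect_shadow_gap=0";
}
#else
#  define DEMO_ASAN_ACTIVE 0
#endif

/* Hypothetical helper: report at run time what was detected at compile time. */
static int demo_asan_active(void)
{
    return DEMO_ASAN_ACTIVE;
}

/* Hypothetical helper: human-readable status for logging. */
static const char *demo_asan_status(void)
{
    return demo_asan_active() ? "ASAN active, protect_shadow_gap=0 requested"
                              : "ASAN not active";
}
```

When the file is compiled without `-fsanitize=address`, the hook is not defined at all and `demo_asan_active()` returns 0, so nothing changes for non-ASAN builds.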
