
Segfault in GEOS-GCM with Intel on AWS ParallelCluster #988

Closed
climbfuji opened this issue Feb 6, 2024 · 16 comments · Fixed by #1108
Labels: bug (Something is not working), INFRA (JEDI Infrastructure)

@climbfuji (Collaborator):

Describe the bug
I am getting this error when I try to run GEOS-GCM (or other trivial executables that are part of GEOS-GCM) compiled with the Intel oneAPI compilers:

> ./GEOSgcm.x
./GEOSgcm.x: Relink `/opt/intel/oneapi/compiler/2024.0/lib/libirc.so' with `/lib/x86_64-linux-gnu/libc.so.6' for IFUNC symbol `memmove'
Segmentation fault (core dumped)

I am getting this on two AWS ParallelCluster systems:

  • Ubuntu 20.04 with Intel 2021.6.0 compilers (icc, icpc, ifort)
  • Ubuntu 22.04 with Intel 2024.0.2 compilers (icx, icpx, ifort)

I searched for similar errors on the web and found only a few pages, none of which had any helpful information for my problem.

I am only getting this problem with GEOS-GCM. All other applications (JEDI-UFS, JEDI-MPAS, ...) are working just fine on the same system with the same spack-stack.

To Reproduce
Build spack-stack develop (or, until merged, PR #977) on AWS ParallelCluster (tried versions 3.2.0 and 3.8.0) using Ubuntu (tried 20.04 and 22.04) with Intel oneAPI compilers (see above), then build and set up GEOS using @mathomp4's TinyBCs tarball/scripts/input data.

Expected behavior
It works (as it does on Discover).

System:
See above

Additional context
The same spack-stack, GEOS code and TinyBCs tarball work fine with [email protected] on AWS ParallelCluster.

I created a ticket for Intel to look at this, too: https://community.intel.com/t5/Intel-C-Compiler/Relink-path-to-libirc-so-with-lib-x86-64-linux-gnu-libc-so-6-for/m-p/1568066/emcs_t/S2h8ZW1haWx8bWVudGlvbl9zdWJzY3JpcHRpb258TFMyQzhDWEMzMk5VRUF8MTU2ODA2NnxBVF9NRU5USU9OU3xoSw#M41713

@climbfuji climbfuji added the bug Something is not working label Feb 6, 2024
@climbfuji climbfuji changed the title GEOS-GCM with Intel on AWS ParallelCluster Segfault in GEOS-GCM with Intel on AWS ParallelCluster Feb 6, 2024
@climbfuji climbfuji self-assigned this Feb 6, 2024
@climbfuji climbfuji added the INFRA JEDI Infrastructure label Feb 6, 2024
@mathomp4 (Collaborator) commented Feb 6, 2024:

Well, I have never seen that error.

From my looking online, I thought perhaps it was due to LD_PRELOAD, but we only do that with coupled runs.

What does ldd GEOSgcm.x show?

Also, just as a trial, can you try running GEOSgcm.x directly from the install dir? That is, edit gcm_run.j and instead of:

 setenv GEOSEXE $SCRDIR/GEOSgcm.x

do:

 setenv GEOSEXE $GEOSBIN/GEOSgcm.x

Not sure it will help, but it's another clue/test.

@mathomp4 (Collaborator) commented Feb 6, 2024:

Hmm. I don't see the relink error in that last one. Only the fargparse one that I still think might be related to the GCC version used.

@climbfuji (Collaborator, Author) commented:

> Hmm. I don't see the relink error in that last one. Only the fargparse one that I still think might be related to the GCC version used.

I am sorry, I got the two issues messed up. No, the relink error is still there for Intel. I will delete the GNU segfault comments above and repost them in that issue.

@climbfuji (Collaborator, Author) commented:

I need to recompile again on AWS, will take a bit. Will give you the ldd output as soon as I have it.

@climbfuji (Collaborator, Author) commented:

@mathomp4 Finally got to this. Interestingly, the same happens for rs_numtiles.x. It may have something to do with the order of linking in GEOS? I see that libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 appears much higher up (i.e. later in loading order) than libirc.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libirc.so:

/mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/rs_numtiles.x: Relink `/opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libirc.so' with `/lib/x86_64-linux-gnu/libc.so.6' for IFUNC symbol `memmove'
>> Error << /opt/intel/mpi/2021.6.0/bin/mpirun  -np 1 /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/rs_numtiles.x openwater_internal_rst: status = 255; at /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/esma_mpirun line 377.
Error! Found  tiles in openwater. Expect to find 56625 tiles.
Your restarts are probably for a different ocean.

dom.heinzeller@ip-10-0-1-144:/mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/experiments/test-c12-20240207$ ldd /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/rs_numtiles.x
	linux-vdso.so.1 (0x00007ffcbeb18000)
	libMAPL.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.so (0x000014b393419000)
	libMAPL.gridcomps.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.gridcomps.so (0x000014b393414000)
	libMAPL.cap.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.cap.so (0x000014b3931f5000)
	libMAPL.history.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.history.so (0x000014b392f46000)
	libMAPL.ExtData.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.ExtData.so (0x000014b392db1000)
	libMAPL.ExtData2G.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.ExtData2G.so (0x000014b392944000)
	libMAPL.orbit.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.orbit.so (0x000014b392927000)
	libMAPL.generic.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.generic.so (0x000014b39219e000)
	libMAPL.oomph.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.oomph.so (0x000014b392190000)
	libMAPL.griddedio.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.griddedio.so (0x000014b391f9a000)
	libMAPL.base.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.base.so (0x000014b3914eb000)
	libMAPL.pfio.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.pfio.so (0x000014b3909bf000)
	libMAPL.profiler.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.profiler.so (0x000014b390920000)
	libMAPL_cfio_r4.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL_cfio_r4.so (0x000014b390843000)
	libMAPL.field_utils.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.field_utils.so (0x000014b39075c000)
	libMAPL.shared.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.shared.so (0x000014b39004c000)
	libMAPL.constants.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/../lib/libMAPL.constants.so (0x000014b390047000)
	/mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/esmf-8.6.0-eebztba/lib/libesmf.so (0x000014b38da3f000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000014b38da18000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014b38da12000)
	libnetcdf.so.19 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/netcdf-c-4.9.2-imgtbig/lib/libnetcdf.so.19 (0x000014b38d789000)
	libnetcdff.so.7 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/netcdf-fortran-4.6.1-szaq577/lib/libnetcdff.so.7 (0x000014b38d4e0000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000014b38d391000)
	libpioc.so => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/parallelio-2.6.2-27r4vm4/lib/libpioc.so (0x000014b38d32f000)
	libmpifort.so.12 => /opt/intel/mpi/2021.6.0/lib/libmpifort.so.12 (0x000014b38cf7b000)
	libmpi.so.12 => /opt/intel/mpi/2021.6.0/lib/release/libmpi.so.12 (0x000014b38b733000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014b38b710000)
	libiomp5.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x000014b38b2d7000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000014b38b0f5000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000014b38b0d8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000014b38aee6000)
	/lib64/ld-linux-x86-64.so.2 (0x000014b393422000)
	libifport.so.5 => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libifport.so.5 (0x000014b38acb8000)
	libifcoremt.so.5 => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libifcoremt.so.5 (0x000014b38ab1a000)
	libimf.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libimf.so (0x000014b38a48c000)
	libsvml.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libsvml.so (0x000014b38842a000)
	libintlc.so.5 => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x000014b3881b2000)
	libirng.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libirng.so (0x000014b387e48000)
	libcilkrts.so.5 => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libcilkrts.so.5 (0x000014b387c0b000)
	libmpicxx.so.12 => /opt/intel/mpi/2021.6.0/lib/libmpicxx.so.12 (0x000014b3879eb000)
	libhdf5_hl.so.310 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/hdf5-1.14.3-325mtto/lib/libhdf5_hl.so.310 (0x000014b3879c2000)
	libhdf5.so.310 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/hdf5-1.14.3-325mtto/lib/libhdf5.so.310 (0x000014b387464000)
	libz.so.1 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/zlib-1.2.13-ykgtk6v/lib/libz.so.1 (0x000014b387443000)
	libbz2.so.1.0 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/bzip2-1.0.8-k7harc6/lib/libbz2.so.1.0 (0x000014b38742c000)
	libzstd.so.1 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/zstd-1.5.2-dxxgszk/lib/libzstd.so.1 (0x000014b387235000)
	libblosc.so.1 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/c-blosc-1.21.5-ypsa3ts/lib/libblosc.so.1 (0x000014b38721a000)
	libxml2.so.2 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/libxml2-2.10.3-uxkkozt/lib/libxml2.so.2 (0x000014b386fe8000)
	libcurl.so.4 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/curl-8.4.0-5iqbwrf/lib/libcurl.so.4 (0x000014b386f06000)
	libpnetcdf.so.4 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/parallel-netcdf-1.12.3-f4irtcy/lib/libpnetcdf.so.4 (0x000014b3865c0000)
	liblz4.so.1 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/lz4-1.9.4-7vbdch7/lib/liblz4.so.1 (0x000014b386585000)
	liblzma.so.5 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/xz-5.4.1-w22j46m/lib/liblzma.so.5 (0x000014b386543000)
	libiconv.so.2 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/libiconv-1.17-apisgoq/lib/libiconv.so.2 (0x000014b3863df000)
	libnghttp2.so.14 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/nghttp2-1.57.0-7eoexqx/lib/libnghttp2.so.14 (0x000014b38639a000)
	libssl.so.3 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/openssl-3.1.3-vfm4fbv/lib64/libssl.so.3 (0x000014b3862e0000)
	libcrypto.so.3 => /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/spack-stack-geos-20240204/envs/ue-intel-2021.4.0/install/intel/2022.1.0/openssl-3.1.3-vfm4fbv/lib64/libcrypto.so.3 (0x000014b385d17000)
	libirc.so => /opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libirc.so (0x000014b385a9f000)
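
For what it's worth, a quick way to inspect the dependency order actually recorded in the binary (rather than the resolved list that ldd prints) is readelf; this is just a diagnostic sketch using the same executable path as above:

# List the DT_NEEDED entries of the executable in the order the linker recorded them
readelf -d /mnt/experiments-zfs/dom.heinzeller/GEOS_20240204_INTEL/GEOSgcm/install/bin/rs_numtiles.x | grep NEEDED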

@climbfuji (Collaborator, Author) commented:

@mathomp4 I got the same problem on your very own Discover (Milan):

/gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/GEOSgcm/install-intel-scu17/bin/rs_numtiles.x: Relink `/gpfsm/dulocal15/sles15/intel/oneapi/2021/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libirc.so' with `/lib64/libc.so.6' for IFUNC symbol `memmove'
>> Error << /usr/local/intel/oneapi/2021/mpi/2021.10.0/bin/mpirun  -np 1 /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/GEOSgcm/install-intel-scu17/bin/rs_numtiles.x openwater_internal_rst: status = 255; at /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/GEOSgcm/install-intel-scu17/bin/esma_mpirun line 377.
Error! Found  tiles in openwater. Expect to find 56625 tiles.
Your restarts are probably for a different ocean.

The experiment is in

/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/experiments/test-c12-intel-scu17-20240222

the build directory is

/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/build-intel-scu17/

and the install directory is

/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/install-intel-scu17/

Finally, the modules that I load are in

> cat /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/setup-intel-scu17.sh
#!/bin/bash

module purge
module use /discover/swdev/gmao_SIteam/modulefiles-SLES15
module use /discover/swdev/jcsda/spack-stack/scu17/modulefiles
module load ecflow/5.11.4

module use /gpfsm/dnb55/projects/p01/s2127/dheinzel/spstmil/envs/unified-env-intel-2021.10.0/install/modulefiles/Core
module load stack-intel/2021.10.0
module load stack-intel-oneapi-mpi/2021.10.0
module load stack-python/3.10.13

module load geos-gcm-env

I wonder if the Intel version is a problem? On AWS ParallelCluster, it's [email protected], on SCU17 it's [email protected] - on SCU16, where we don't run into this problem, it's [email protected].

@mathomp4 (Collaborator) commented:

Well, we can use that version of Intel Fortran and Intel MPI. I think the only issue we've had with it is with MAPL3.

But let me be doubly sure. I'm building some Baselibs today anyway, I'll do a "latest with everything" one.

Also, what's the backing GCC for this? In my tests with latest ifort I've been using:

comp/gcc/11.4.0
comp/intel/2024.0.0 
mpi/impi/2021.11 
python/GEOSpyD/Min23.5.2-0_py3.11

@climbfuji (Collaborator, Author) commented:

This is the compiler config:

- compiler:
    spec: intel@=2021.10.0
    paths:
      cc: /usr/local/intel/oneapi/2021/compiler/2023.2.1/linux/bin/intel64/icc
      cxx: /usr/local/intel/oneapi/2021/compiler/2023.2.1/linux/bin/intel64/icpc
      f77: /usr/local/intel/oneapi/2021/compiler/2023.2.1/linux/bin/intel64/ifort
      fc: /usr/local/intel/oneapi/2021/compiler/2023.2.1/linux/bin/intel64/ifort
    flags:
      cflags: -diag-disable=10441
      cxxflags: -diag-disable=10441
      fflags: -diag-disable=10448
    operating_system: sles15
    target: x86_64
    modules:
    - comp/intel/2023.2.1
    environment:
      prepend_path:
        PATH: '/usr/local/other/gcc/11.4.0/bin'
        CPATH: '/usr/local/other/gcc/11.4.0/include'
        LD_LIBRARY_PATH: '/usr/local/intel/oneapi/2021/compiler/2023.2.1/linux/compiler/lib/intel64_lin:/usr/local/other/gcc/11.4.0/lib64'
      #set:
      #  I_MPI_ROOT: '/usr/local/intel/oneapi/2021/mpi/2021.5.0'
    extra_rpaths: []

This is the impi config:

  mpi:
    buildable: False
  intel-oneapi-mpi:
    externals:
    - spec: [email protected]%intel@=2021.10.0
      prefix: /usr/local/intel/oneapi/2021
      modules:
      - mpi/impi/2021.10.0

Finally, Python is built by Spack (3.10.13). To "drive" spack, we use the default/OS Python3.

@mathomp4 (Collaborator) commented:

Welp, my Intel 2021.11 + Intel MPI 2021.11 worked:

/discover/nobackup/mathomp4/Experiments/intel2021.11-2024Feb23-1day-c24

Mine is C24, yours is C12, but that shouldn't matter. It looks like you are using a few older subrepos than me, but all those updates were pretty minor. Hmm.

@mathomp4 (Collaborator) commented:

Doing an ldd rs_numtiles.x on your executable and mine shows that yours has this at the end:

	libirc.so => /gpfsm/dulocal15/sles15/intel/oneapi/2021/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libirc.so (0x0000148124fb9000)

and mine doesn't. Why is that showing up there??

@mathomp4 (Collaborator) commented:

Indeed it's only in your executable:

❯ rg irc dom_ldd.x my_ldd.x
dom_ldd.x
57:	libirc.so => /gpfsm/dulocal15/sles15/intel/oneapi/2021/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libirc.so (0x0000148124fb9000)

So our Baselibs build doesn't add it, but something in the spack-stack build seems to trigger its addition, and at the very end of the list!

@climbfuji (Collaborator, Author) commented:

This is so puzzling. There's nothing special in the spack-stack build that I can think of.

@mathomp4 (Collaborator) commented:

Well, I am seeing something interesting which points to... something. Namely, Baselibs (for historic reasons) builds pretty much everything as a static library. ESMF and zlib seem to be built as both, but for things like HDF5, netcdf-c and netcdf-fortran, it's static-only all the way.

I've never changed that in Baselibs because, well, what we have works and I don't want to rock the boat.

So, if you run ldd GEOSgcm.x on, say, one of my builds, the output is much shorter than for yours, because your builds link against the shared libraries rather than the static ones.
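
As a rough illustration of that difference (the paths here are hypothetical placeholders, not the actual experiment directories), one could simply compare the length of the two dependency lists:

# Baselibs build: HDF5/netCDF are linked statically, so the shared-library list is short
ldd /path/to/baselibs-build/bin/GEOSgcm.x | wc -l

# spack-stack build: the same packages show up as shared objects, so the list is much longer
ldd /path/to/spack-stack-build/bin/GEOSgcm.x | wc -l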

@mathomp4 (Collaborator) commented:

Also, when I built Baselibs with 2021.11 I was using icx and icpx, not icc and icpc. The main reason being that Intel 2024.0.0 doesn't have icc and icpc.

But as you said above, you got this error on AWS as well with icx and icpx.

Something in the link chain must be bringing in libirc in this way...

@climbfuji (Collaborator, Author) commented:

@mathomp4 I did identical builds on SCU16 and SCU17 with Intel. SCU16 runs fine, SCU17 has the above relink issue. I then compared the log files one by one, and the only difference that is persistent across the build process is that make for SCU16 links the pthread library as -lpthread, while for SCU17 it uses /usr/lib64/libpthread.so. This is in the same place for every single build/link command.

(Screenshot from 2024-03-08: side-by-side comparison of the SCU16 and SCU17 link commands.)
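
A minimal way to surface that difference without eyeballing the logs, assuming the two build logs were saved as build-scu16.log and build-scu17.log (hypothetical file names), might be:

# Count how each build's link commands refer to pthread
grep -c -- '-lpthread' build-scu16.log build-scu17.log
grep -c 'libpthread\.so' build-scu16.log build-scu17.log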

@climbfuji (Collaborator, Author) commented:

@mathomp4 @srherbener @AlexanderRichert-NOAA @RatkoVasic-NOAA - this is something for the spack-stack known issues.

The error described above, which occurs on AWS ParallelCluster and Discover SCU17 with [email protected], is resolved by this suggestion from Intel:

Re: Relink `/path/to/libirc.so' with `/lib/x86_64-linux-gnu/libc.so.6' for IFUNC symbol `memmove'

First, I would have you run the command

ldd /opt/intel/oneapi/compiler/2024.0/lib/libirc.so

My guess is you will find that ldd reports that it is statically linked, even though it is not, as would be indicated by the file command. If this is the case, and you have patchelf installed on your system, run the command:

patchelf --add-needed libc.so.6 /opt/intel/oneapi/compiler/2024.0/lib/libirc.so

and then rerun your test.

Now, the problem is that on most systems the compilers are provided by the sysadmins, which means we'd have to ask them to fix that in the authoritative installation, OR install the compilers ourselves, provided that there's enough space. It's easy, after all.

Note that I did not really check later versions of oneAPI - but this is a strong indicator that the problem still exists in oneAPI 2024.1:

[dheinzel@hercules-login-1 bin]$ ldd /work2/noaa/jcsda/dheinzel/oneapi/compiler/2024.1/lib/libirc.so
	statically linked
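
For the record, here is a sketch of the full check-and-workaround sequence based on Intel's suggestion above, assuming a user-writable oneAPI installation (the libirc.so path is the one from the original report; adjust it to the installation at hand):

# ldd reports "statically linked" here, even though file(1) shows a dynamically linked shared object
ldd  /opt/intel/oneapi/compiler/2024.0/lib/libirc.so
file /opt/intel/oneapi/compiler/2024.0/lib/libirc.so

# Check whether a DT_NEEDED entry for libc.so.6 is recorded (on affected installations there is none)
readelf -d /opt/intel/oneapi/compiler/2024.0/lib/libirc.so | grep NEEDED

# Add the missing dependency so the IFUNC symbol (memmove) can be resolved, then rerun the executable
patchelf --add-needed libc.so.6 /opt/intel/oneapi/compiler/2024.0/lib/libirc.so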
