fail to link op/avx component with nvhpc compilers #9444

Closed
ggouaillardet opened this issue Sep 30, 2021 · 22 comments · Fixed by #11605

@ggouaillardet
Contributor

ggouaillardet commented Sep 30, 2021

As reported by Ray Muno on the user mailing list (https://www.mail-archive.com/[email protected]/msg34594.html), the op/avx component cannot be linked when the nvhpc compilers are used:

  CCLD     liblocal_ops_avx512.la
  CCLD     mca_op_avx.la
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o): In function `ompi_op_avx_2buff_min_uint16_t_avx2':
/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: multiple definition of `ompi_op_avx_3buff_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: first defined here
make[2]: *** [mca_op_avx.la] Error 2

The root cause is that Open MPI assumes the following macros are defined whenever the compiler supports the AVX512 features we expect:

__AVX512BW__
__AVX512F__
__AVX512VL__

At least one of these macros is not defined by the nvhpc compilers.

From ompi/mca/op/avx/op_avx_functions.c:

#if defined(GENERATE_AVX512_CODE)
#  if defined(__AVX512BW__) && defined(__AVX512F__) && defined(__AVX512VL__)
#    define PREPEND _avx512
#  else
#    undef GENERATE_AVX512_CODE
#  endif  /* defined(__AVX512BW__) && defined(__AVX512F__) && defined(__AVX512VL__) */
#endif  /* defined(GENERATE_AVX512_CODE) */

I am wondering whether we really need to re-test these macros. If GENERATE_AVX512_CODE is defined, should we not be ready to go (and hence have no need to test the AVX512-related macros again)?
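For readers following along, here is a minimal sketch of how that guard can end up producing the `_avx2` symbol names in the object that was meant to be the AVX512 one. The first guard is the one quoted above; the fallback chain, the PASTE/STR helpers and the two functions are assumptions made for illustration and are not copied from op_avx_functions.c. The two #define lines at the top simulate an object compiled with both GENERATE_AVX2_CODE and GENERATE_AVX512_CODE.

```c
#include <assert.h>
#include <string.h>

/* Simulate the -D flags used for the object meant to hold the AVX512 code. */
#define GENERATE_AVX2_CODE 1
#define GENERATE_AVX512_CODE 1

/* Guard quoted from ompi/mca/op/avx/op_avx_functions.c. */
#if defined(GENERATE_AVX512_CODE)
#  if defined(__AVX512BW__) && defined(__AVX512F__) && defined(__AVX512VL__)
#    define PREPEND _avx512
#  else
#    undef GENERATE_AVX512_CODE
#  endif
#endif

/* ASSUMPTION: simplified fallback chain, written for this sketch only. */
#if !defined(PREPEND) && defined(GENERATE_AVX2_CODE)
#  define PREPEND _avx2
#endif
#if !defined(PREPEND)
#  define PREPEND _avx
#endif

/* Two-level helpers so PREPEND is expanded before pasting/stringifying. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)
#define STR2(x) #x
#define STR(x) STR2(x)

/* Name of the function table this translation unit would export. */
const char *op_table_symbol_name(void)
{
    return STR(PASTE(ompi_op_avx_functions, PREPEND));
}

/* Returns 1 if this translation unit would export the given symbol name. */
int exports_symbol(const char *name)
{
    assert(name != NULL);
    return strcmp(name, op_table_symbol_name()) == 0;
}
```

With a compiler that does not define all three macros, op_table_symbol_name() yields the `_avx2` name, which is the same name the genuine AVX2 object exports, hence the "multiple definition" error at link time.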

@jsquyres
Member

@ggouaillardet Is this related to #8919?

@bosilca
Member

bosilca commented Sep 30, 2021

There is a lengthy comment explaining why we need to test these macros in the .c file. Most compilers converge toward a well-defined, almost portable, support for immintrin.h, apparently with the exception of nvc. Until they fix the compiler, I propose to completely drop the generation of AVX code. Can you please try the following patch?

diff --git a/ompi/mca/op/avx/configure.m4 b/ompi/mca/op/avx/configure.m4
index 44e834301b..1e45624abb 100644
--- a/ompi/mca/op/avx/configure.m4
+++ b/ompi/mca/op/avx/configure.m4
@@ -123,6 +123,27 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                               MCA_BUILD_OP_AVX512_FLAGS=""
                               AC_MSG_RESULT([no])])
                          CFLAGS="$op_avx_cflags_save"
+                        ])
+                  #
+                  # Detect and drop AVX support for compilers that do not indicate
+                  # explicit AVX capabilities via defines (aka nvc at least before 21.9)
+                  #
+                  AS_IF([test $op_avx512_support -eq 1],
+                        [AC_MSG_CHECKING([if AVX512 defines are available])
+                         op_avx_cflags_save="$CFLAGS"
+                         CFLAGS="$CFLAGS_WITHOUT_OPTFLAGS -O0 $MCA_BUILD_OP_AVX512_FLAGS"
+                         AC_LINK_IFELSE(
+                             [AC_LANG_PROGRAM([[#include <immintrin.h>]],
+                                      [[
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "This compiler claims support for AVX512 but lacks the necessary #define"
+#endif
+                                      ]])],
+                             [AC_MSG_RESULT([yes])],
+                             [op_avx512_support=0
+                              MCA_BUILD_OP_AVX512_FLAGS=""
+                              AC_MSG_RESULT([no])])
+                         CFLAGS="$op_avx_cflags_save"
                         ])])
            #
            # Check support for AVX2

@cparrott73

cparrott73 commented Oct 14, 2021

The nvhpc compilers team is aware of this issue, but there are two separate things going on here.

nvhpc compilers by default generate code targeted for the host CPU, which is different behavior than gcc, clang, and possibly other compilers - which all target a base x86_64 instruction set by default, and only enable enhanced code generation via additional flags (e.g. -march, -mavx, etc.). This means that the Open MPI ./configure test to check "does the compiler support AVX-512 intrinsics without passing any flags" will get different results depending on whether you are running ./configure on a Sandy Bridge (AVX), Haswell/Broadwell (AVX2), or Xeon Skylake (AVX-512). I would suggest that this is not a particularly reliable test to use with nvhpc compilers, for this reason.

My suggestion would be to disable this test when nvhpc (nvc/nvfortran) compilers are used, and follow the same path that gcc takes once this test returns a negative answer for gcc: go on to test the -mavx, -mavx2, -mavx512* flags individually. nvhpc compilers support these same flags as of 21.9, and you can test on those to determine the correct flags to pass to the compiler to enable the appropriate AVX intrinsics for compiling each version of the op-avx module. This will ensure that these files get built with the correct symbols when using nvhpc compilers.

Related to this, there is also a bug in 21.9 where the compiler detects that the host supports AVX-512 (e.g. Xeon Skylake), but not all of the AVX-512 macros were being passed by default. This led directly to the linking error reported initially with this issue.

Hopefully this helps.

@bosilca
Member

bosilca commented Oct 15, 2021

I'm not sure I agree with your analysis of the check for additional compiler flags. What this part does is figure out how to convince the compiler to generate code for particular flavors of the x86 ISA. In fact, we need to compile the same file 3 times, for AVX, AVX2 and AVX512 support, and then we merge everything together and decide at runtime which version to use based on cpuid. So the code you are pointing to only detects which of the 3 cases is natively generated by the selected compiler and flags (because this generation includes the CFLAGS set by the user).

In any case, the dealbreaker is that at compile time we will miss the AVX512 #define, and as a result we will compile the same file twice with the same renaming scheme. Thus the safe approach is to disable AVX512 based not only on which ISA the compiler promises to generate but also on the existence of the #defines that drive our renaming.
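The "compile three times, pick one at runtime" scheme described here could be sketched roughly as follows. This is an illustration under stated assumptions, not the op/avx component's code: the feature bits, struct flavor, select_flavor and the sum_* stand-ins are all hypothetical names, plain loops replace the intrinsics, and the CPU feature mask is passed in by the caller instead of being read from cpuid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real build would derive `cpu` from cpuid. */
enum { FLAVOR_AVX = 1, FLAVOR_AVX2 = 2, FLAVOR_AVX512 = 4 };

typedef void (*sum_fn)(const int *in, int *inout, int count);

/* Stand-ins for the three compilations of the same source file.
 * Each would use different intrinsics; plain loops keep this portable. */
static void sum_avx(const int *in, int *inout, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) inout[i] += in[i];
}
static void sum_avx2(const int *in, int *inout, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) inout[i] += in[i];
}
static void sum_avx512(const int *in, int *inout, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) inout[i] += in[i];
}

struct flavor {
    unsigned bit;      /* feature bit this flavor requires */
    const char *name;  /* suffix playing the role of PREPEND */
    sum_fn sum;        /* one entry of the per-flavor function table */
};

/* Ordered from most to least capable. */
static const struct flavor flavors[] = {
    { FLAVOR_AVX512, "avx512", sum_avx512 },
    { FLAVOR_AVX2,   "avx2",   sum_avx2 },
    { FLAVOR_AVX,    "avx",    sum_avx },
};

/* Return the best flavor that was both built and is supported by the CPU,
 * or NULL when no flavor is usable. */
const struct flavor *select_flavor(unsigned built, unsigned cpu)
{
    unsigned usable = built & cpu;
    for (size_t i = 0; i < sizeof(flavors) / sizeof(flavors[0]); i++) {
        if (usable & flavors[i].bit) {
            return &flavors[i];
        }
    }
    return NULL;
}
```

In this sketch, dropping AVX512 at configure time simply clears one bit in `built`, so a Skylake host falls back to the AVX2 flavor at runtime. The scheme only works if every flavor exports distinct names, which is why the missing macros matter.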

@cparrott73

Fair point, I may have misinterpreted what this particular test is trying to accomplish. I do normally test build libraries such as Open MPI with a "-tp px" flag for maximum portability, so we don't run into issues with illegal instructions on systems that don't match our build host (a Skylake Xeon system). Now that the bug has been fixed in the development tree of our compilers, I will double check that compiling Open MPI with this flag still works when performing the op/avx tests.

@cparrott73

FYI I decided to test "-tp px" internally with our current development compiler build, but I have run into an unrelated issue. I have filed a bug on it. I will let you know when I am able to make progress again.

mpbelhorn added a commit to mpbelhorn/olcf-spack that referenced this issue Nov 19, 2021
See open-mpi/ompi#9444 for details. The patch
changes an m4 file so running a reconf stage is necessary even for the
distribution releases.

Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x,
unfortunately. So the default v1.16.x available in Spack must be
over-ridden in the spec and possibly dependency builds if all
OMPI-dependent builds done with a given nvhpc toolchain will share a
single build of a given OMPI release.

It is also necessary to build some components with `-fPIC` but it was
not clear which toolchain language drivers were missing the flag (which
is enabled by default). The PIC flag modifications in this commit are
probably excessive but seem to work.
@bartoldeman

bartoldeman commented Nov 30, 2021

I think this is what makes the difference: NVHPC's immintrin.h has this:

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    defined(__AVX512F__)
#include <avx512fintrin.h>
#endif

but GCC simply includes <avx512fintrin.h> which then does

#ifndef __AVX512F__
#pragma GCC push_options
#pragma GCC target("avx512f")
#define __DISABLE_AVX512F__
#endif /* __AVX512F__ */

NVHPC has no logic like this, so it just compiles any AVX512 intrinsics without issue (even if you use -tp px), e.g. this test blurb:

#include <immintrin.h>

int main(void) {
    __m512 vA, vB;
    _mm512_add_ps(vA, vB);
    }

With -D__SCE__ it works OK (-D_MSC_VER does not; I guess that does other things, pretending to be MSVC. I have no idea what SCE means).

$ nvc -tp skylake -D__SCE__ test.c
"test.c", line 8: warning: variable "vA" is used before its value is set
      _mm512_add_ps(vA, vB);
                    ^

"test.c", line 8: warning: variable "vB" is used before its value is set
      _mm512_add_ps(vA, vB);
                        ^
$ nvc -tp haswell -D__SCE__ test.c
"test.c", line 7: error: identifier "__m512" is undefined
      __m512 vA, vB;
      ^

"test.c", line 8: warning: function "_mm512_add_ps" declared implicitly
      _mm512_add_ps(vA, vB);
      ^

1 error detected in the compilation of "test.c".

it's an undocumented workaround so beware of demons...

@bosilca
Member

bosilca commented Nov 30, 2021

The real issue is not that the compiler always generates AVX512 code, but that it fails to define one of the AVX512BW, AVX512F, or AVX512VL macros to let us know it will generate the code. And that's exactly what my patch above addresses. Can you try it and let me know if it works?

@bartoldeman

bartoldeman commented Nov 30, 2021

> The real issue is not that the compiler always generates AVX512 code, but that it fails to define one of the AVX512BW or AVX512F or AVX512VL defines to let us know it will generate the code. And that's exactly what my patch above addresses. Can you try and let me know if it works ?

Your patch does what it's intended to do. But what puzzled me is that it's not necessary for NVHPC 21.7, even though both 21.7 & 21.9 have the strange AVX512 flag behaviour.

Now diffing immintrin.h for 21.7 and 21.9 I see this:

-#if defined(__AVX512F__)
+#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
+    defined(__AVX512F__)
 #include <avx512fintrin.h>
 #endif

This explains why it compiles fine with 21.7, where the checks go as follows:

checking for AVX512 support... yes
checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -mavx512f -mavx512bw -mavx512vl -mavx512dq)... yes
checking if _mm512_loadu_si512 generates code that can be compiled... yes
checking if _mm512_mullo_epi64 generates code that can be compiled... yes

but with 21.9 the first "no" turns into a "yes" because of the above header change.

21.7 and 21.9 BOTH do the following:

  1. nvc -tp skylake: defines AVX512BW and AVX512F and AVX512VL
  2. nvc -mavx512f -mavx512bw -mavx512vl -mavx512dq: defines all the above as well
  3. nvc without -tp on a skylake CPU: does not define AVX512BW, but does define AVX512F. And that's the weird situation you're encountering here (and I believe this is a bug in NVHPC)
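The three cases above can be checked on any given compiler and flag combination with a small diagnostic along these lines. It is only a sketch: the HAVE_* constants and the function names are made up for this example, and only the three macros named in this thread are inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit values for the three macros Open MPI tests. */
enum { HAVE_F = 1, HAVE_BW = 2, HAVE_VL = 4 };

/* Bitmask of the AVX512 feature macros the compiler defined for this build. */
int avx512_macro_mask(void)
{
    int mask = 0;
#if defined(__AVX512F__)
    mask |= HAVE_F;
#endif
#if defined(__AVX512BW__)
    mask |= HAVE_BW;
#endif
#if defined(__AVX512VL__)
    mask |= HAVE_VL;
#endif
    return mask;
}

/* 1 only when all three macros are present, i.e. when the guard in
 * op_avx_functions.c would select the _avx512 names. */
int renaming_is_safe(void)
{
    return avx512_macro_mask() == (HAVE_F | HAVE_BW | HAVE_VL);
}

/* Write a one-line summary into buf; returns what snprintf returns. */
int format_avx512_report(char *buf, size_t len)
{
    int m = avx512_macro_mask();
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "F=%d BW=%d VL=%d safe=%d",
                    !!(m & HAVE_F), !!(m & HAVE_BW), !!(m & HAVE_VL),
                    renaming_is_safe());
}
```

Calling format_avx512_report from a small main and building it once per flag set (for example with -tp skylake, with the four -mavx512* flags, and with no flags at all) should make the difference between cases 1, 2 and 3 visible without going through configure.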

Now I also have a proposal to adjust your patch: instead of giving up, set -mavx512f -mavx512bw -mavx512vl -mavx512dq. This can probably be done by changing this check, earlier in that file, to be more general:

#if defined(__ICC) && !defined(__AVX512F__)
#error "icc needs the -m flags to provide the AVX* detection macros"
#endif

could be generalized to:

#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
#error "compiler needs the -m flags to provide the AVX* detection macros"
#endif

@bosilca
Member

bosilca commented Nov 30, 2021

Interesting suggestion, it should indeed address all issues we were covering in this discussion. How about the following patch ?

diff --git a/ompi/mca/op/avx/configure.m4 b/ompi/mca/op/avx/configure.m4
index 44e834301b..223dd8207e 100644
--- a/ompi/mca/op/avx/configure.m4
+++ b/ompi/mca/op/avx/configure.m4
@@ -50,8 +50,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                   AC_LINK_IFELSE(
                       [AC_LANG_PROGRAM([[#include <immintrin.h>]],
                                        [[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
 #endif
     __m512 vA, vB;
     _mm512_add_ps(vA, vB)
@@ -67,8 +67,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                          AC_LINK_IFELSE(
                              [AC_LANG_PROGRAM([[#include <immintrin.h>]],
                                               [[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
 #endif
     __m512 vA, vB;
     _mm512_add_ps(vA, vB)
@@ -90,8 +90,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                          AC_LINK_IFELSE(
                              [AC_LANG_PROGRAM([[#include <immintrin.h>]],
                                       [[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
 #endif
     int A[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
     __m512i vA = _mm512_loadu_si512((__m512i*)&(A[1]))
@@ -112,8 +112,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                          AC_LINK_IFELSE(
                              [AC_LANG_PROGRAM([[#include <immintrin.h>]],
                                       [[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
 #endif
     __m512i vA, vB;
     _mm512_mullo_epi64(vA, vB)

@bartoldeman

Finally got back to this. Bad news, with the above patch it still doesn't compile; it fails here:

  CCLD     ompi_info
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pmaxu.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pmins.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pmaxs.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pmaxu.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pminu.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pminu.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `llvm.x86.avx512.pmins.q'

I reduced it to an example with these 8 AVX512 functions:

#include <immintrin.h>

int main(int argc, char **argv)
{
    __m512i vecA = _mm512_loadu_si512((__m512 *)argv[0]);
    __m512i vecB = _mm512_loadu_si512((__m512 *)argv[1]);
    vecB = _mm512_max_epi64(vecA, vecB);
    vecB = _mm512_min_epi64(vecA, vecB);
    vecB = _mm512_max_epu64(vecA, vecB);
    vecB = _mm512_min_epu64(vecA, vecB);
    vecB = _mm512_max_epi32(vecA, vecB);
    vecB = _mm512_min_epi32(vecA, vecB);
    vecB = _mm512_max_epu32(vecA, vecB);
    vecB = _mm512_min_epu32(vecA, vecB);
    return (int)((char *)&vecB)[0];
}
$ nvc -tp skylake op_avx.c 
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_max_epu32':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1111: undefined reference to `llvm.x86.avx512.pmaxu.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_max_epi64':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1133: undefined reference to `llvm.x86.avx512.pmaxs.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_max_epu64':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1155: undefined reference to `llvm.x86.avx512.pmaxu.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_min_epi32':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1324: undefined reference to `llvm.x86.avx512.pmins.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_min_epu32':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1346: undefined reference to `llvm.x86.avx512.pminu.d'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_min_epi64':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1368: undefined reference to `llvm.x86.avx512.pmins.q'
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld: /tmp/nvclwCcHEoai04S.o: in function `_mm512_min_epu64':
/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/nvhpc/22.1/Linux_x86_64/22.1/compilers/include/avx512fintrin.h:1390: undefined reference to `llvm.x86.avx512.pminu.q'

Bug reported to NVIDIA: https://forums.developer.nvidia.com/t/issue-with-some-avx512-intrinsics-min-max-ep-i-u-32-64/200868

@bosilca
Member

bosilca commented Jan 18, 2022

Awesome, they generate the AVX512 code but fail at the linking stage due to missing intrinsics. Bottom line: yet another broken compiler that we need to handle. Honestly, I have zero interest or incentive in fixing their mess; if anybody at Nvidia wants to fix this, I will find the time to review their patch. Until then, people using nvc to compile Open MPI should disable any support for AVX*.

@jsquyres
Member

@janjust Can you find the right people at Nvidia to look into this? Thanks!

@cparrott73

@bosilca @jsquyres

Thanks so much for this report. We have seen this issue pop up from a couple of other sources recently, and it is under investigation. I am hopeful we can push a fix out in the next release of the HPC SDK.

Should you encounter any issues with the HPC SDK compilers, copy me on them, and I'll see to it that they get in front of the appropriate eyeballs. Thanks!

@cparrott73

FYI - it appears this issue has been fixed internally, and the fix should be available to all when the 22.2 release drops soon.

@bwbarrett bwbarrett modified the milestones: v4.1.3, v4.1.4 Mar 31, 2022
@bwbarrett bwbarrett modified the milestones: v4.1.4, v4.1.5 May 25, 2022
mpbelhorn added a commit to mpbelhorn/olcf-spack that referenced this issue May 31, 2022
mpbelhorn added a commit to mpbelhorn/olcf-spack that referenced this issue May 31, 2022
mpbelhorn added a commit to mpbelhorn/olcf-spack that referenced this issue Jun 29, 2022
sethrj pushed a commit to sethrj/spack that referenced this issue Jan 12, 2023
@bwbarrett bwbarrett modified the milestones: v4.1.5, v4.1.6 Feb 23, 2023
@zzzoom
Contributor

zzzoom commented Mar 9, 2023

I'm facing a similar error when using the intel compiler on old KNL nodes (configure assumes AVX512BW and AVX512VL support is present).

As far as I understand, the problem is that the AVX512 flavor of ompi_op_avx_functions falls back to AVX2 during compilation because it won't be able to compile AVX512 code, and those symbols clash with the properly detected AVX2 flavor of ompi_op_avx_functions. So my question is, why is the AVX512 to AVX2 fallback being compiled at all when we know that vanilla AVX2 is being built anyway?

@bosilca
Member

bosilca commented Mar 13, 2023

@zzzoom I'm not sure I understand your comment here. Each backend generates properly name-spaced functions (i.e. the PREPEND #define identifies the level of AVX support). Looking at the macro definitions and Makefile.am, I don't see a way to generate collisions in the op function names. Can you provide the list of name collisions, the configure output related to the AVX detection, and the AVX compilation output (from a verbose make)?

@zzzoom
Contributor

zzzoom commented Mar 13, 2023

@bosilca

mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x1340): multiple definition of `ompi_op_avx_3buff_functions_avx2'
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x1340): first defined here
make[2]: *** [Makefile:3291: libmpi.la] Error 1
configure:5979: +++ Configuring MCA framework op
configure:337668: checking for no configure components in framework op
configure:337670: result: 
configure:337672: checking for m4 configure components in framework op
configure:337674: result: avx
configure:5994: --- MCA component op:avx (m4 configuration macro)
configure:337781: checking for MCA component op:avx compile mode
configure:337787: result: static
configure:337861: checking for AVX512 support
configure:337868: result: yes
configure:337871: checking for AVX512 support (no additional flags)
configure:337890: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  conftest.c -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads >&5
configure:337890: $? = 0
configure:337892: result: yes
configure:337942: checking if _mm512_loadu_si512 generates code that can be compiled
configure:337963: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest  -O0    -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  conftest.c -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads >&5
configure:337963: $? = 0
configure:337964: result: yes
configure:337981: checking if _mm512_mullo_epi64 generates code that can be compiled
configure:338002: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest  -O0    -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  conftest.c -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads >&5
configure:338002: $? = 0
configure:338003: result: yes
configure:338020: checking for AVX2 support
configure:338027: result: yes
configure:338030: checking for AVX2 support (no additional flags)
configure:338049: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  conftest.c -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads >&5
configure:338049: $? = 0
configure:338051: result: yes
configure:338100: checking if _mm256_loadu_si256 generates code that can be compiled
configure:338121: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest  -O0    -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  conftest.c -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads >&5
configure:338121: $? = 0
configure:338122: result: yes

ICC's arch-related CFLAGS injected through Spack's wrapper were -xCOMMON-AVX512.

@bosilca
Member

bosilca commented Mar 14, 2023

Apparently, it does not complain about functions being defined twice but about the arrays of functions being defined twice. And for each of the two arrays of functions (ompi_op_avx_functions_avx2 and ompi_op_avx_3buff_functions_avx2), the multiple definitions are in what were supposed to be complementary compiled files. Extremely weird, as if somehow the PREPEND macro was redefined to point to the same text without affecting the generation of the functions themselves.

I need to see how the different compiled versions of the op_avx_functions.c file are generated, basically what are the compile flags and defines that are used. Could you provide the output of make V=1 in the ompi/mca/op/avx directory?

@zzzoom
Contributor

zzzoom commented Mar 14, 2023

Making all in mca/op/avx
make[2]: Entering directory '/tmp/bc/spack-stage/spack-stage-openmpi-4.1.5-uynnpmk3a2d2kvjbu3ae7x4hgtu42wfc/spack-src/ompi/mca/op/avx'
/bin/sh ../../../../libtool  --tag=CC   --mode=compile /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DGENERATE_AVX_CODE -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -I../../../.. -I../../../../orte/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx_la-op_avx_functions.Tpo -c -o liblocal_ops_avx_la-op_avx_functions.lo `test -f 'op_avx_functions.c' || echo './'`op_avx_functions.c
/bin/sh ../../../../libtool  --tag=CC   --mode=compile /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -I../../../.. -I../../../../orte/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx2_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx2_la-op_avx_functions.Tpo -c -o liblocal_ops_avx2_la-op_avx_functions.lo `test -f 'op_avx_functions.c' || echo './'`op_avx_functions.c
/bin/sh ../../../../libtool  --tag=CC   --mode=compile /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c  -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -DGENERATE_AVX512_CODE -I../../../.. -I../../../../orte/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx512_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo -c -o liblocal_ops_avx512_la-op_avx_functions.lo `test -f 'op_avx_functions.c' || echo './'`op_avx_functions.c
depbase=`echo op_avx_component.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../../libtool  --tag=CC   --mode=compile /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c   -I../../../.. -I../../../../orte/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include   -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include  -I/usr/local/include -I/usr/local/include  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT op_avx_component.lo -MD -MP -MF $depbase.Tpo -c -o op_avx_component.lo op_avx_component.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx2_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx2_la-op_avx_functions.Tpo -c op_avx_functions.c  -fPIC -DPIC -o .libs/liblocal_ops_avx2_la-op_avx_functions.o
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT op_avx_component.lo -MD -MP -MF .deps/op_avx_component.Tpo -c op_avx_component.c  -fPIC -DPIC -o .libs/op_avx_component.o
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_AVX_CODE -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx_la-op_avx_functions.Tpo -c op_avx_functions.c  -fPIC -DPIC -o .libs/liblocal_ops_avx_la-op_avx_functions.o
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -DGENERATE_AVX512_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx512_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo -c op_avx_functions.c  -fPIC -DPIC -o .libs/liblocal_ops_avx512_la-op_avx_functions.o
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT op_avx_component.lo -MD -MP -MF .deps/op_avx_component.Tpo -c op_avx_component.c -o op_avx_component.o >/dev/null 2>&1
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_AVX_CODE -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx_la-op_avx_functions.Tpo -c op_avx_functions.c -o liblocal_ops_avx_la-op_avx_functions.o >/dev/null 2>&1
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx2_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx2_la-op_avx_functions.Tpo -c op_avx_functions.c -o liblocal_ops_avx2_la-op_avx_functions.o >/dev/null 2>&1
libtool: compile:  /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -DGENERATE_AVX512_CODE -I../../../.. -I../../../../orte/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT liblocal_ops_avx512_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo -c op_avx_functions.c -o liblocal_ops_avx512_la-op_avx_functions.o >/dev/null 2>&1
mv -f .deps/liblocal_ops_avx_la-op_avx_functions.Tpo .deps/liblocal_ops_avx_la-op_avx_functions.Plo
/bin/sh ../../../../libtool  --tag=CC   --mode=link /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread  -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  -o liblocal_ops_avx.la  liblocal_ops_avx_la-op_avx_functions.lo  -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads
libtool: link: ar cru .libs/liblocal_ops_avx.a .libs/liblocal_ops_avx_la-op_avx_functions.o
libtool: link: ranlib .libs/liblocal_ops_avx.a
libtool: link: ( cd ".libs" && rm -f "liblocal_ops_avx.la" && ln -s "../liblocal_ops_avx.la" "liblocal_ops_avx.la" )
mv -f .deps/liblocal_ops_avx2_la-op_avx_functions.Tpo .deps/liblocal_ops_avx2_la-op_avx_functions.Plo
/bin/sh ../../../../libtool  --tag=CC   --mode=link /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread  -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  -o liblocal_ops_avx2.la  liblocal_ops_avx2_la-op_avx_functions.lo  -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads
libtool: link: ar cru .libs/liblocal_ops_avx2.a .libs/liblocal_ops_avx2_la-op_avx_functions.o
libtool: link: ranlib .libs/liblocal_ops_avx2.a
libtool: link: ( cd ".libs" && rm -f "liblocal_ops_avx2.la" && ln -s "../liblocal_ops_avx2.la" "liblocal_ops_avx2.la" )
mv -f .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo .deps/liblocal_ops_avx512_la-op_avx_functions.Plo
/bin/sh ../../../../libtool  --tag=CC   --mode=link /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread  -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  -o liblocal_ops_avx512.la  liblocal_ops_avx512_la-op_avx_functions.lo  -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads
libtool: link: ar cru .libs/liblocal_ops_avx512.a .libs/liblocal_ops_avx512_la-op_avx_functions.o
libtool: link: ranlib .libs/liblocal_ops_avx512.a
libtool: link: ( cd ".libs" && rm -f "liblocal_ops_avx512.la" && ln -s "../liblocal_ops_avx512.la" "liblocal_ops_avx512.la" )
/bin/sh ../../../../libtool  --tag=CC   --mode=link /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -module -avoid-version -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib   -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib  -o libmca_op_avx.la  op_avx_component.lo liblocal_ops_avx.la liblocal_ops_avx2.la liblocal_ops_avx512.la -lrt -lutil  -lz  -lhwloc  -levent_core -levent_pthreads
libtool: link: (cd .libs/libmca_op_avx.lax/liblocal_ops_avx.a && ar x "/tmp/bc/spack-stage/spack-stage-openmpi-4.1.5-uynnpmk3a2d2kvjbu3ae7x4hgtu42wfc/spack-src/ompi/mca/op/avx/./.libs/liblocal_ops_avx.a")
libtool: link: (cd .libs/libmca_op_avx.lax/liblocal_ops_avx2.a && ar x "/tmp/bc/spack-stage/spack-stage-openmpi-4.1.5-uynnpmk3a2d2kvjbu3ae7x4hgtu42wfc/spack-src/ompi/mca/op/avx/./.libs/liblocal_ops_avx2.a")
libtool: link: (cd .libs/libmca_op_avx.lax/liblocal_ops_avx512.a && ar x "/tmp/bc/spack-stage/spack-stage-openmpi-4.1.5-uynnpmk3a2d2kvjbu3ae7x4hgtu42wfc/spack-src/ompi/mca/op/avx/./.libs/liblocal_ops_avx512.a")
libtool: link: ar cru .libs/libmca_op_avx.a .libs/op_avx_component.o   .libs/libmca_op_avx.lax/liblocal_ops_avx.a/liblocal_ops_avx_la-op_avx_functions.o  .libs/libmca_op_avx.lax/liblocal_ops_avx2.a/liblocal_ops_avx2_la-op_avx_functions.o  .libs/libmca_op_avx.lax/liblocal_ops_avx512.a/liblocal_ops_avx512_la-op_avx_functions.o
libtool: link: ranlib .libs/libmca_op_avx.a
libtool: link: rm -fr .libs/libmca_op_avx.lax
libtool: link: ( cd ".libs" && rm -f "libmca_op_avx.la" && ln -s "../libmca_op_avx.la" "libmca_op_avx.la" )
make[2]: Leaving directory '/tmp/bc/spack-stage/spack-stage-openmpi-4.1.5-uynnpmk3a2d2kvjbu3ae7x4hgtu42wfc/spack-src/ompi/mca/op/avx'

@bosilca
Member

bosilca commented Mar 14, 2023

ok, I see what's going on. There are too many cases to test, so we cut some corners between the configure and the .c functions file.

The best solution would be to protect each function instance with all the necessary checks, but the possible combinations are humongous, and this solution, while optimal, is not practical (at least I do not intend to spend the time required to make it work).

The second-best approach is to skip the AVX512 code generation entirely if any of the necessary features is missing. We can do this in two steps: prevent the AVX512 code generation at the configure.m4 level, and avoid name collisions in all other cases.

Please test the patch below.

diff --git a/ompi/mca/op/avx/configure.m4 b/ompi/mca/op/avx/configure.m4
index 73c8048603..44ce505581 100644
--- a/ompi/mca/op/avx/configure.m4
+++ b/ompi/mca/op/avx/configure.m4
@@ -123,6 +123,28 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
                               MCA_BUILD_OP_AVX512_FLAGS=""
                               AC_MSG_RESULT([no])])
                          CFLAGS="$op_avx_cflags_save"
+                        ])
+                  #
+                  # Check for combination of AVX512F + AVX512VL
+                  #
+                  AS_IF([test $op_avx512_support -eq 1],
+                        [AC_MSG_CHECKING([if _mm_max_epi64 generates code that can be compiled])
+                         op_avx_cflags_save="$CFLAGS"
+                         CFLAGS="$CFLAGS_WITHOUT_OPTFLAGS -O0 $MCA_BUILD_OP_AVX512_FLAGS"
+                         AC_LINK_IFELSE(
+                             [AC_LANG_PROGRAM([[#include <immintrin.h>]],
+                                      [[
+#if !defined(__AVX512F__) || !defined(__AVX512VL__) || !defined(__AVX512BW__)
+#error "icc needs the -m flags to provide the AVX* detection macros"
+#endif
+    __m128i vA, vB;
+    _mm_max_epi64(vA, vB)
+                                      ]])],
+                             [AC_MSG_RESULT([yes])],
+                             [op_avx512_support=0
+                              MCA_BUILD_OP_AVX512_FLAGS=""
+                              AC_MSG_RESULT([no])])
+                         CFLAGS="$op_avx_cflags_save"
                         ])])
            #
            # Check support for AVX2
diff --git a/ompi/mca/op/avx/op_avx_functions.c b/ompi/mca/op/avx/op_avx_functions.c
index bcfd3eb56f..d97d2d9633 100644
--- a/ompi/mca/op/avx/op_avx_functions.c
+++ b/ompi/mca/op/avx/op_avx_functions.c
@@ -32,16 +32,18 @@
  * to a lesser support (AVX512 -> AVX2, AVX2 -> AVX, AVX -> error out).
  */
 #if defined(GENERATE_AVX512_CODE)
+#  define PREPEND _avx512
 #  if defined(__AVX512BW__) && defined(__AVX512F__) && defined(__AVX512VL__)
-#    define PREPEND _avx512
+/* all good */
 #  else
 #    undef GENERATE_AVX512_CODE
 #  endif  /* defined(__AVX512BW__) && defined(__AVX512F__) && defined(__AVX512VL__) */
 #endif  /* defined(GENERATE_AVX512_CODE) */
 
 #if !defined(PREPEND) && defined(GENERATE_AVX2_CODE)
+#  define PREPEND _avx2
 #  if defined(__AVX2__)
-#    define PREPEND _avx2
+/* all good */
 #  else
 #    undef GENERATE_AVX2_CODE
 #  endif  /* defined(__AVX2__) */

@zzzoom
Contributor

zzzoom commented Apr 19, 2023

Sorry for the delay, it works. Tested with GCC 12.2.0 and ICC 2021.2.0 on KNL.

8 participants