-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fail to link op/avx component with nvhpc compilers #9444
Comments
@ggouaillardet Is this related to #8919? |
There is a lengthy comment explaining why we need to test these macros in the .c file. Most compilers converge toward a well-defined, almost portable, support for immintrin.h, apparently with the exception of nvc. Until they fix the compiler, I propose to completely drop the generation of AVX code. Can you please try to following patch. diff --git a/ompi/mca/op/avx/configure.m4 b/ompi/mca/op/avx/configure.m4
index 44e834301b..1e45624abb 100644
--- a/ompi/mca/op/avx/configure.m4
+++ b/ompi/mca/op/avx/configure.m4
@@ -123,6 +123,27 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
MCA_BUILD_OP_AVX512_FLAGS=""
AC_MSG_RESULT([no])])
CFLAGS="$op_avx_cflags_save"
+ ])
+ #
+ # Detect and drop AVX support for compilers that do not indicate
+ # explicit AVX capabilities via defines (aka nvc at least before 21.9)
+ #
+ AS_IF([test $op_avx512_support -eq 1],
+ [AC_MSG_CHECKING([if AVX512 defines are available])
+ op_avx_cflags_save="$CFLAGS"
+ CFLAGS="$CFLAGS_WITHOUT_OPTFLAGS -O0 $MCA_BUILD_OP_AVX512_FLAGS"
+ AC_LINK_IFELSE(
+ [AC_LANG_PROGRAM([[#include <immintrin.h>]],
+ [[
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "This compiler claims support for AVX512 but lacks the necessary #define
+#endif
+ ]])],
+ [AC_MSG_RESULT([yes])],
+ [op_avx512_support=0
+ MCA_BUILD_OP_AVX512_FLAGS=""
+ AC_MSG_RESULT([no])])
+ CFLAGS="$op_avx_cflags_save"
])])
#
# Check support for AVX2 |
The nvhpc compilers team is aware of this issue, but there are two separate things going on here. nvhpc compilers by default generate code targeted for the host CPU, which is different behavior than gcc, clang, and possibly other compilers - which all target a base x86_64 instruction set by default, and only enable enhanced code generation via additional flags (e.g. -march, -mavx, etc.). This means that the Open MPI ./configure test to check "does the compiler support AVX-512 intrinsics without passing any flags" will get different results depending on whether you are running ./configure on a Sandy Bridge (AVX), Haswell/Broadwell (AVX2), or Xeon Skylake (AVX-512). I would suggest that this is not a particularly reliable test to use with nvhpc compilers, for this reason. My suggestion would be to disable this test when nvhpc (nvc/nvfortran) compilers are used, and follow the same path that gcc takes once this test returns a negative answer for gcc: go on to test the -mavx, -mavx2, -mavx512* flags individually. nvhpc compilers support these same flags as of 21.9, and you can test on those to determine the correct flags to pass to the compiler to enable the appropriate AVX intrinsics for compiling each version of the op-avx module. This will ensure that these files get built with the correct symbols when using nvhpc compilers. Related to this, there is also a bug in 21.9 where the compiler detects that the host supports AVX-512 (e.g. Xeon Skylake), but not all of the AVX-512 macros were being passed by default. This led directly to the linking error reported initially with this issue. Hopefully this helps. |
I'm not sure I agree with your analysis about the check for additional compiler flags. What this part is doing is trying to figure out how to convince the compiler to generate code for particular flavors of the X86 ISA. In fact, we need to compile the same file 3 times, for AVX, AVX2 and AVX512 support, and then we merge everything together and decide at runtime which version to use based on cpuid. So the code you are pointing to only detect which one of the 3 cases is natively generated by the selected compiler and flags (because this generation includes the CFLAGS set by the user). In any case the dealbreaker is that at compile time we will miss the AVX512 #define, and as a result we will compile the same file twice but with the same renaming scheme. Thus the safe approach is to disable AVX512 not only on what ISA the compiler promises to generate but also depending on the existence of the #define that drive our renaming. |
Fair point, I may have misinterpreted what this particular test is trying to accomplish. I do normally test build libraries such as Open MPI with a "-tp px" flag for maximum portability, so we don't run into issues with illegal instructions on systems that don't match our build host (a Skylake Xeon system). Now that the bug has been fixed in the development tree of our compilers, I will double check that compiling Open MPI with this flag still works when performing the op/avx tests. |
FYI I decided to test "-tp px" internally with our current development compiler build, but I have run into an unrelated issue. I have filed a bug on it. I will let you know when I am able to make progress again. |
See open-mpi/ompi#9444 for details. The patch changes an m4 file so running a reconf stage is necessary even for the distribution releases. Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x, unfortunately. So the default v1.16.x available in Spack must be over-ridden in the spec and possibly dependency builds if all OMPI-dependent builds done with a given nvhpc toolchain will share a single build of a given OMPI release. It is also necessary to build some components with `-fPIC` but it was not clear which toolchain language drivers were missing the flag (which is enabled by default). The PIC flag modifications in this commit are probably excessive but seem to work.
I think this is what makes the difference: NVHPC's
but GCC simply includes
NVHPC has no logic like this so it just compiles any avx512 intrinsics without issue (even if you use
with
it's an undocumented workaround so beware of demons... |
The real issue is not that the compiler always generates AVX512 code, but that it fails to define one of the AVX512BW or AVX512F or AVX512VL defines to let us know it will generate the code. And that's exactly what my patch above addresses. Can you try and let me know if it works ? |
Your patch does what it's intended to do. But what puzzled me is that it's not necessary for NVHPC 21.7, even though both 21.7 & 21.9 have the strange AVX512 flag behaviour. Now diffing
this explains why it compiles ok with 21.7, this checks are as follows:
but with 21.9 the first "no" turns into a "yes" because of the above header change. 21.7 and 21.9 BOTH do the following:
Now I also have a proposal to adjust your patch: instead of giving up, set
could be generalized to:
|
Interesting suggestion, it should indeed address all issues we were covering in this discussion. How about the following patch ? diff --git a/ompi/mca/op/avx/configure.m4 b/ompi/mca/op/avx/configure.m4
index 44e834301b..223dd8207e 100644
--- a/ompi/mca/op/avx/configure.m4
+++ b/ompi/mca/op/avx/configure.m4
@@ -50,8 +50,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#include <immintrin.h>]],
[[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
#endif
__m512 vA, vB;
_mm512_add_ps(vA, vB)
@@ -67,8 +67,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#include <immintrin.h>]],
[[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
#endif
__m512 vA, vB;
_mm512_add_ps(vA, vB)
@@ -90,8 +90,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#include <immintrin.h>]],
[[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
#endif
int A[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
__m512i vA = _mm512_loadu_si512((__m512i*)&(A[1]))
@@ -112,8 +112,8 @@ AC_DEFUN([MCA_ompi_op_avx_CONFIG],[
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([[#include <immintrin.h>]],
[[
-#if defined(__ICC) && !defined(__AVX512F__)
-#error "icc needs the -m flags to provide the AVX* detection macros"
+#if !defined(__AVX512BW__) || !defined(__AVX512F__) || !defined(__AVX512VL__)
+#error "compiler needs the -m flags to provide the AVX* detection macros"
#endif
__m512i vA, vB;
_mm512_mullo_epi64(vA, vB) |
Finally got back to this. Bad news, with the above patch it still doesn't compile; it fails here:
I reduced it to an example with these 8 AVX512 functions:
Bug reported to NVIDIA: https://forums.developer.nvidia.com/t/issue-with-some-avx512-intrinsics-min-max-ep-i-u-32-64/200868 |
Awesome, they generate the AVX512 code, but fail at the linking stage due to missing intrinsics. Bottom line, yet another broken compiler, that we need to handle. Honestly I have 0 interest or incentive in fixing their mess, if anybody at Nvidia want's to fix this I will find the time to review their patch. Until then, people using nvc to compile Open MPI should disable any support for AVX*. |
@janjust Can you find the right people at Nvidia to look into this? Thanks! |
Thanks so much for this report. We have seen this issue pop up from a couple of other sources recently, and it is under investigation. I am hopeful we can push a fix out in the next release of the HPC SDK. Should you encounter any issues with the HPC SDK compilers, copy me on them, and I'll see to it that they get in front of the appropriate eyeballs. Thanks! |
FYI - it appears this issue has been fixed internally, and the fix should be available to all when the 22.2 release drops soon. |
See open-mpi/ompi#9444 for details. The patch changes an m4 file so running a reconf stage is necessary even for the distribution releases. Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x, unfortunately. So the default v1.16.x available in Spack must be over-ridden in the spec and possibly dependency builds if all OMPI-dependent builds done with a given nvhpc toolchain will share a single build of a given OMPI release. It is also necessary to build some components with `-fPIC` but it was not clear which toolchain language drivers were missing the flag (which is enabled by default). The PIC flag modifications in this commit are probably excessive but seem to work.
See open-mpi/ompi#9444 for details. The patch changes an m4 file so running a reconf stage is necessary even for the distribution releases. Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x, unfortunately. So the default v1.16.x available in Spack must be over-ridden in the spec and possibly dependency builds if all OMPI-dependent builds done with a given nvhpc toolchain will share a single build of a given OMPI release. It is also necessary to build some components with `-fPIC` but it was not clear which toolchain language drivers were missing the flag (which is enabled by default). The PIC flag modifications in this commit are probably excessive but seem to work.
See open-mpi/ompi#9444 for details. The patch changes an m4 file so running a reconf stage is necessary even for the distribution releases. Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x, unfortunately. So the default v1.16.x available in Spack must be over-ridden in the spec and possibly dependency builds if all OMPI-dependent builds done with a given nvhpc toolchain will share a single build of a given OMPI release. It is also necessary to build some components with `-fPIC` but it was not clear which toolchain language drivers were missing the flag (which is enabled by default). The PIC flag modifications in this commit are probably excessive but seem to work.
See open-mpi/ompi#9444 for details. The patch changes an m4 file so running a reconf stage is necessary even for the distribution releases. Autoreconf of ompi up to v4.1.1 *requires* automake v1.15.x, unfortunately. So the default v1.16.x available in Spack must be over-ridden in the spec and possibly dependency builds if all OMPI-dependent builds done with a given nvhpc toolchain will share a single build of a given OMPI release. It is also necessary to build some components with `-fPIC` but it was not clear which toolchain language drivers were missing the flag (which is enabled by default). The PIC flag modifications in this commit are probably excessive but seem to work.
I'm facing a similar error when using the intel compiler on old KNL nodes (configure assumes AVX512BW and AVX512VL support is present). As far as I understand, the problem is that the AVX512 flavor of |
@zzzoom I'm not sure I understand your comment here. Each backend generates functions properly name-spaced (aka. the PREPEND #define identify the level of AVX support). Looking at the macro definitions and Makefile.am I don't see a way to generate collisions in the op function names. Can you provide the list of name collisions; the output of the configure related to the QVX detection and Make the AVX compilation output (from a verbose make). |
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x1340): multiple definition of `ompi_op_avx_3buff_functions_avx2'
mca/op/avx/.libs/libmca_op_avx.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x1340): first defined here
make[2]: *** [Makefile:3291: libmpi.la] Error 1 configure:5979: +++ Configuring MCA framework op
configure:337668: checking for no configure components in framework op
configure:337670: result:
configure:337672: checking for m4 configure components in framework op
configure:337674: result: avx
configure:5994: --- MCA component op:avx (m4 configuration macro)
configure:337781: checking for MCA component op:avx compile mode
configure:337787: result: static
configure:337861: checking for AVX512 support
configure:337868: result: yes
configure:337871: checking for AVX512 support (no additional flags)
configure:337890: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -I/home/bc/ccad
/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5m
m3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib -L/home/bc/ccad/stack/23.0
2/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib conftest.c -lrt -lutil -lz -lhwloc -levent_core -levent_pthreads >&5
configure:337890: $? = 0
configure:337892: result: yes
configure:337942: checking if _mm512_loadu_si512 generates code that can be compiled
configure:337963: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O0 -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/inc
lude -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -L/home/bc/ccad/stack/23.02/base/linu
x-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib conftest.
c -lrt -lutil -lz -lhwloc -levent_core -levent_pthreads >&5
configure:337963: $? = 0
configure:337964: result: yes
configure:337981: checking if _mm512_mullo_epi64 generates code that can be compiled
configure:338002: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O0 -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/inc
lude -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -L/home/bc/ccad/stack/23.02/base/linu
x-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib conftest.
c -lrt -lutil -lz -lhwloc -levent_core -levent_pthreads >&5
configure:338002: $? = 0
configure:338003: result: yes
configure:338020: checking for AVX2 support
configure:338027: result: yes
configure:338030: checking for AVX2 support (no additional flags)
configure:338049: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -I/home/bc/ccad
/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5m
m3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib -L/home/bc/ccad/stack/23.0
2/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib conftest.c -lrt -lutil -lz -lhwloc -levent_core -levent_pthreads >&5
configure:338049: $? = 0
configure:338051: result: yes
configure:338100: checking if _mm256_loadu_si256 generates code that can be compiled
configure:338121: /home/bc/ccad/stack/23.02/spack/lib/spack/env/intel/icc -o conftest -O0 -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/include -I/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/include -I/usr/local/include -I/usr/local/include -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/zlib-1.2.13-pvq57ala447g5nnboh7zinbquuo744bw/lib -L/home/bc/ccad/stack/23.02/base/linux-rocky8-eulogia/intel-2021.2.0/hwloc-2.8.0-6uoivskxtmt67y7q6hh5mm3yytsqpuaa/lib conftest.c -lrt -lutil -lz -lhwloc -levent_core -levent_pthreads >&5
configure:338121: $? = 0
configure:338122: result: yes ICC's arch-related CFLAGS injected through Spack's wrapper were |
Apparently, it does not complain about functions being defined twice but about the array of functions being defined twice. And for each of the two arrays of functions ( I need to see how the different compiled versions of the op_avx_functions.c file are generated, basically what are the compile flags and defines that are used. Could you provide the output of |
|
ok, I see what's going on. There are too many cases to test, so we cut some corners between the configure and the .c functions file. The best solution would be to protect each function instance with all the necessary checks, but the possible combinations are humongous, and this solution, while optimal, is not practical (at least I do not intend to spend the time required to make it work). The second best approach is to cut short the AVX512 code generation completely, if any of the necessary features are missing. We can do this in two steps, prevent the AVX512 code generation at the configure.m4 level, and avoid name collision in all other cases. Please test the patch below.
|
Sorry for the delay, it works. Tested with GCC 12.2.0 and ICC 2021.2.0 on KNL. |
As reported by Ray Muno on the user mailing list (https://www.mail-archive.com/[email protected]/msg34594.html), op/avx component cannot be linked when the nvhpc compilers are used
The root cause is Open MPI assumes the following macros are defined if the compiler supports the AVX512 features we expect
and at least one of these macros is not defined by nvhpc compilers
From
ompi/mca/op/avx/op_avx_functions.c
I am wondering whether we really need to re-test these macros. if
GENERATE_AVX512_CODE
is defined, should we not be ready to go (and hence no more need to test the avx512 related macros)?The text was updated successfully, but these errors were encountered: