
Bump up GCC version to 13.x or later to support SIMD AVX512 fp16 instructions #5226

Open
naveentatikonda opened this issue Jan 6, 2025 · 15 comments

@naveentatikonda
Member

naveentatikonda commented Jan 6, 2025

Description

We currently use GCC 10.x to compile the native libraries in the k-NN plugin. We are working with Intel on a new feature, targeted for release in 2.19, that adds avx512_fp16 instruction support to the Faiss Scalar Quantizer fp16 to improve performance. Invoking these instructions requires compiling the native libraries with GCC 13.x or later. The AL2 package manager does not provide these versions, so GCC has to be installed from source.
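To confirm that a build host meets this requirement, one can check the compiler's major version and whether it accepts `-mavx512fp16`. The sketch below is illustrative only: it assumes the compiler is reachable as `$CC` (defaulting to `gcc`), and the helper names `version_at_least` and `supports_avx512fp16` are hypothetical. It relies on GCC defining the `__AVX512FP16__` preprocessor macro when `-mavx512fp16` is in effect.

```shell
#!/usr/bin/env bash
# Sketch: check that the compiler used for the k-NN native libraries is GCC 13+
# and can target AVX512-FP16. CC defaulting to "gcc" is an assumption.

# Succeeds if the major component of version string $1 (e.g. "13.2.0") is >= $2.
version_at_least() {
  local major="${1%%.*}"
  [ "$major" -ge "$2" ] 2>/dev/null
}

# Succeeds if compiler $1 defines __AVX512FP16__ when invoked with -mavx512fp16.
supports_avx512fp16() {
  echo | "$1" -mavx512fp16 -dM -E - 2>/dev/null | grep -q '__AVX512FP16__'
}

cc_bin="${CC:-gcc}"
if command -v "$cc_bin" >/dev/null 2>&1; then
  # -dumpfullversion prints the full version on GCC 7+; -dumpversion is the fallback.
  cc_version="$("$cc_bin" -dumpfullversion -dumpversion)"
  if version_at_least "$cc_version" 13; then
    echo "$cc_bin $cc_version: version OK"
  else
    echo "$cc_bin $cc_version: older than 13.x"
  fi
  if supports_avx512fp16 "$cc_bin"; then
    echo "$cc_bin accepts -mavx512fp16"
  else
    echo "$cc_bin does not accept -mavx512fp16"
  fi
else
  echo "$cc_bin not found on PATH"
fi
```

Checking both conditions matters because a compiler that understands the flag is not necessarily the one a build script will pick up by default.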

OpenSearch Version - 2.19

cc: @peterzhuamazon

@github-actions github-actions bot added the untriaged Issues that have not yet been triaged label Jan 6, 2025
@naveentatikonda naveentatikonda moved this from 🆕 New to 🏗 In progress in Engineering Effectiveness Board Jan 6, 2025
@peterzhuamazon peterzhuamazon added release v2.19.0 and removed untriaged Issues that have not yet been triaged labels Jan 6, 2025
@peterzhuamazon peterzhuamazon self-assigned this Jan 6, 2025
@peterzhuamazon
Member

Taking a look at changing this on AL2. Note that we might need to do the same for AlmaLinux once AL2 is deprecated.

@peterzhuamazon
Member

peterzhuamazon commented Jan 8, 2025

With GCC 13, OpenBLAS does not build correctly; the test stage fails with a segfault:


Warning: 'REALPART_EXPR <bets>' may be used uninitialized [-Wmaybe-uninitialized]
zblat3_3m.f:1276:47:

 1276 |       COMPLEX*16         ALPHA, ALS, BETA, BETS
      |                                               ^
note: 'REALPART_EXPR <bets>' was declared here
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o cblat2 cblat2.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o zblat2 zblat2.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o sblat3 sblat3.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o dblat3 dblat3.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o cblat3 cblat3.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o cblat3_3m cblat3_3m.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -fno-tree-vectorize  -o zblat3 zblat3.o ../libopenblasp-r0.3.27.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6
#0  0x7f5110367eb3
#1  0x7f5110367179
#2  0x7f510fa75d0f
#3  0x7f51104c8bde
#4  0x7f51104d76b0
#5  0x409c38
#6  0x40804e
#7  0x7f510fa63139
#8  0x4080c9
#9  0xffffffffffffffff
make[1]: *** [level1] Segmentation fault (core dumped)
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/OpenBLAS/test'
make: *** [tests] Error 2

@peterzhuamazon
Member

This might be an issue that was fixed in version 0.3.28:
https://github.com/OpenMathLib/OpenBLAS/releases/tag/v0.3.28

@peterzhuamazon
Member

A similar segfault was reported on 0.3.27 on RISC-V with GCC 13:
OpenMathLib/OpenBLAS#4719 (comment)

@peterzhuamazon
Member

This seems to be related to using USE_OPENMP=1 with 0.3.27 specifically.

@peterzhuamazon
Member

peterzhuamazon commented Jan 8, 2025

Reported upstream:

@peterzhuamazon
Member

Will try a more verbose build and the develop branch tomorrow:
OpenMathLib/OpenBLAS#4719 (comment)

@peterzhuamazon
Member

It failed at the same place again with the develop branch:

OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1
Core: SkylakeX

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6
#0  0x7f6fc7a4deb3
#1  0x7f6fc7a4d179
#2  0x7f6fc715bd0f
#3  0x7f6fc7baebde
#4  0x7f6fc7bbd6b0
#5  0x409c38
#6  0x40804e
#7  0x7f6fc7149139
#8  0x4080c9
#9  0xffffffffffffffff
make[1]: *** [level1] Segmentation fault (core dumped)
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/OpenBLAS/test'
make: *** [tests] Error 2

@martin-frbg

Not sure what to make of this. What build options did you use for OpenBLAS, please?

@martin-frbg

("Historically", this kind of crash happened when one mixed one version of gcc with a different version of gfortran at build time, or when the test executables loaded an older version of libgfortran.so at runtime. The former can happen when your PATH resolves "gcc" and "gfortran" to the newer versions, but "/usr/bin/cc" is still linked to the distribution's default, older gcc.)
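One way to look for the build-time mismatch described here is to print where `cc`, `gcc`, and `gfortran` actually resolve and which version each reports. This is a minimal sketch: the helper names `driver_version`, `same_version`, and `report_toolchain` are hypothetical, and it assumes GCC-style drivers that understand `-dumpfullversion`.

```shell
#!/usr/bin/env bash
# Sketch: show which compiler each driver name resolves to, then flag a
# gcc/gfortran version mismatch. Helper names are illustrative only.

# Prints the version reported by driver $1; fails if the driver is missing.
driver_version() {
  command -v "$1" >/dev/null 2>&1 || return 1
  "$1" -dumpfullversion -dumpversion 2>/dev/null
}

# Succeeds if two non-empty version strings are identical.
same_version() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

report_toolchain() {
  local tool path
  for tool in cc gcc gfortran; do
    if path="$(command -v "$tool")"; then
      # readlink -f follows symlinks such as /usr/bin/cc -> gcc.
      printf '%-9s %-45s %s\n' "$tool" "$(readlink -f "$path")" "$(driver_version "$tool")"
    else
      printf '%-9s not found\n' "$tool"
    fi
  done
}

report_toolchain
gcc_v="$(driver_version gcc)"
fc_v="$(driver_version gfortran)"
if same_version "$gcc_v" "$fc_v"; then
  echo "gcc and gfortran both report $gcc_v"
else
  echo "WARNING: gcc reports '${gcc_v}' but gfortran reports '${fc_v}'"
fi
```

The `cc` line matters as much as the other two, since a build that does not set CC may fall back to `/usr/bin/cc`.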

@peterzhuamazon
Member

peterzhuamazon commented Jan 9, 2025

Hi @martin-frbg,

When compiling OpenBLAS, we set:

CXX=g++
FC=gfortran
USE_OPENMP=1
DYNAMIC_ARCH=1

We then run make with the environment variables above.

We used GCC 13.2, compiled manually from source and configured with these flags:

../configure --enable-languages=all --prefix=/usr/local --disable-multilib --disable-bootstrap

We also had an older gfortran (version 4.x, I believe) installed alongside 13.2. That could be the cause; let me take a look.

Thanks.

@peterzhuamazon
Member

After upgrading binutils to 2.42.90:

../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/7/../../..  -lpthread -lc
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
At line 170 of file sblat2.f (unit = 6, file = 'stdout')
Fortran runtime error: Bad STATUS parameter in OPEN statement

Error termination. Backtrace:
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

@peterzhuamazon
Member

See a similar issue from 2019 here:

@peterzhuamazon
Member

It also seems to be incorrectly linked:

bash-4.2# ldd ./test/sblat1
	linux-vdso.so.1 (0x00007ffc277f9000)
	libgfortran.so.4 => /lib64/libgfortran.so.4 (0x00007fd279b09000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fd2797c9000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd2795ab000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fd2791fe000)
	libgomp.so.1 => /usr/local/lib64/libgomp.so.1 (0x00007fd278fb2000)
	libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007fd278d8f000)
	libquadmath.so.0 => /usr/local/lib64/libquadmath.so.0 (0x00007fd278b4a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd279ed8000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fd278946000)
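The `ldd` listing above can be checked mechanically. GCC 8 and later ship `libgfortran.so.5`, so a test binary built by the GCC 13.2 gfortran would be expected to depend on that soname rather than the `libgfortran.so.4` resolved from `/lib64` here. The sketch below pulls the libgfortran entry out of `ldd` output; the function names are hypothetical, and the `./test/sblat1` path is taken from the listing above.

```shell
#!/usr/bin/env bash
# Sketch: report which libgfortran a binary resolves to at runtime.

# Reads ldd-style lines on stdin and prints "<soname> <resolved path>" for libgfortran.
libgfortran_from_ldd() {
  awk '$1 ~ /^libgfortran\.so/ { print $1, $3 }'
}

# Classifies the libgfortran dependency of binary $1.
check_libgfortran() {
  local entry
  entry="$(ldd "$1" 2>/dev/null | libgfortran_from_ldd)"
  case "$entry" in
    "") echo "$1: no libgfortran dependency found" ;;
    libgfortran.so.5*) echo "$1: OK ($entry)" ;;
    *) echo "$1: unexpected libgfortran ($entry); a GCC 8+ gfortran links libgfortran.so.5" ;;
  esac
}

# Only inspect the test binary if it exists (run from the OpenBLAS source root).
if [ -x ./test/sblat1 ]; then
  check_libgfortran ./test/sblat1
fi
```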

@martin-frbg

The "bad status in open" error reeks of an ancient gfortran, and indeed a GCC 13 gfortran should link libgfortran.so.5. This looks a lot like the "new gcc, old gfortran" (or vice versa) ABI mismatch I mentioned. Please check that you are actually invoking the versions of gcc and gfortran that you intended to use. (BTW, CXX=g++ is irrelevant for everything except the "thread safety tests", which need to be explicitly enabled. CC is what you want to set up correctly, unless you rely on the automatic default of /usr/bin/cc, which could be a link to whatever "old but stable" system compiler your installation was delivered with.)
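Following this advice, a build invocation that pins both the C and the Fortran compiler to the same GCC install might look like the sketch below. It assumes the GCC 13.2 build was installed under `/usr/local` (the `--prefix` used earlier in the thread) and that the script is run from the top of an OpenBLAS source checkout; `openblas_make_args` and `TOOLCHAIN_PREFIX` are hypothetical names. The options are passed as make variables on the command line instead of relying on the environment.

```shell
#!/usr/bin/env bash
# Sketch: build OpenBLAS with C and Fortran compilers from the same GCC install.
# TOOLCHAIN_PREFIX=/usr/local is an assumption based on the configure --prefix.
TOOLCHAIN_PREFIX="${TOOLCHAIN_PREFIX:-/usr/local}"

# Prints one make variable assignment per line for the given toolchain prefix.
openblas_make_args() {
  printf '%s\n' \
    "CC=$1/bin/gcc" \
    "FC=$1/bin/gfortran" \
    "USE_OPENMP=1" \
    "DYNAMIC_ARCH=1"
}

# Makefile.rule sits at the top of the OpenBLAS source tree; only build there.
if [ -f Makefile.rule ] && [ -x "$TOOLCHAIN_PREFIX/bin/gfortran" ]; then
  mapfile -t make_args < <(openblas_make_args "$TOOLCHAIN_PREFIX")
  make "${make_args[@]}"
else
  echo "Not in an OpenBLAS checkout or no gfortran under $TOOLCHAIN_PREFIX; skipping build"
fi
```

Using absolute paths for CC and FC avoids depending on how PATH or the `/usr/bin/cc` symlink happens to resolve on the build host.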

Projects
Status: 🏗 In progress
Development

No branches or pull requests

3 participants