Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support building with flang on windows #4768

Open
h-vetinari opened this issue Jun 28, 2024 · 16 comments
Open

Support building with flang on windows #4768

h-vetinari opened this issue Jun 28, 2024 · 16 comments
Labels
Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS

Comments

@h-vetinari
Copy link
Contributor

h-vetinari commented Jun 28, 2024

OpenBLAS already added flang support, but I don't think this is being tested on windows? While reviving the old effort to build conda-forge's openblas with flang, I originally ran into some parsing issue with flang 18.

Luckily, with a flang 19 built from main (already built for debugging something else, so I thought I'd try), it seems that particular issue is gone. 🥳

However, I first encountered some CMake detection issues:

-- Found OpenMP_C: -Xclang -fopenmp (found version "5.1")
CMake Error at D:/bld/openblas_1719527807775/_build_env/Library/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
-- Configuring incomplete, errors occurred!
  Could NOT find OpenMP_Fortran (missing: OpenMP_Fortran_FLAGS
  OpenMP_Fortran_LIB_NAMES)

After iteratively figuring out (also re-encountering #3069 again along the way) that I needed to add (something like)

    -DOpenMP_Fortran_FLAGS=-fopenmp ^
    -DOpenMP_Fortran_LIB_NAMES=libomp ^
    -DOpenMP_libomp_LIBRARY=-llibomp ^
    -DOpenMP_C_FLAGS=-fopenmp ^
    -DOpenMP_C_LIB_NAMES=libomp ^

I then ran into what looks like a regular compilation error:

[...]
[3643/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stplqt.f.obj
[3644/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stplqt2.f.obj
[3645/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stpmlqt.f.obj
[3646/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_2stage.f.obj
[3647/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F.obj
FAILED: CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj 
%BUILD_PREFIX%\Library\bin\flang.exe -I%SRC_DIR%\lapack-netlib\SRC -I%SRC_DIR%\lapack-netlib\LAPACKE\include -fopenmp -fopenmp -ffixed-line-length-72 -o CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F.obj -c CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F-pp.f
error: Semantic errors in CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F-pp.f
D:\\bld\\openblas_1719536014536\\work\\lapack-netlib\\SRC\\ssytrd_sb2st.F:237:11: error: Cannot read module file for module 'omp_lib': Source file 'omp_lib.mod' was not found
        use omp_lib
            ^^^^^^^
@martin-frbg
Copy link
Collaborator

right, flang on Windows appears to be lagging behind the Linux/Unix version. If there is no omp_lib module provided by LLVM, I suggest you open an issue with them (unless already known/documented). This may also be the reason why you needed to set a bunch of cmake variables manually

@martin-frbg martin-frbg added the Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS label Jun 28, 2024
@martin-frbg
Copy link
Collaborator

but ISTR broken OpenMP support in LLVM on Windows is a known problem, and your "flang 19" is an unstable snapshot

@h-vetinari
Copy link
Contributor Author

but ISTR broken OpenMP support in LLVM on Windows is a known problem

Do you have a link?

and your "flang 19" is an unstable snapshot

Yes, 19.1.0rc1 is only expected in about a month. The problem was that all flang 18 builds were broken for SciPy, and I wanted to test/ensure that things work with flang 19 (early enough for potentially necessary fixes to land), hence why I built from main, as noted in the OP.

@martin-frbg
Copy link
Collaborator

somewhere in the discussion in #3973 I think, but it could be that it was broken once, worked for a while and is now broken again.

@h-vetinari
Copy link
Contributor Author

It seems upstream intends to support it. I'm trying to rebuild as necessary to test that hypothesis.

Meanwhile, I've had one passing run without OpenMP, however, the logs got spammed so badly with warnings (~500MB), that I cannot really check them. After running again with warnings ignored, I get:

97% tests passed, 4 tests failed out of 120
Errors while running CTest
Output from these tests are in: D:/bld/openblas_1719572917500/work/build/Testing/Temporary/LastTest.log

Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Total Test time (real) = 9104.76 sec

The following tests FAILED:
	  5 - sblas3 (Failed)
	  8 - dblas3 (Failed)
	 11 - cblas3 (Failed)
	 15 - zblas3 (Failed)

The runtime especially of complex-valued procedures is off the charts. That's also the part that depends on LLVM's compiler-rt (instead of MSVC's runtime); not sure if that plays a role somehow. An excerpt:

 14/120 Test  #15: zblas3 ......................................................................................***Failed    5.13 sec
        Start  16: zblas3_3m
 15/120 Test  #16: zblas3_3m ...................................................................................   Passed    0.75 sec
        Start  17: REAL_LAPACK_linear_equation_routines
 16/120 Test  #14: zblas2 ......................................................................................   Passed  202.79 sec
        Start  18: COMPLEX_LAPACK_linear_equation_routines
 17/120 Test  #17: REAL_LAPACK_linear_equation_routines ........................................................   Passed  2937.14 sec
        Start  19: DOUBLE_PRECISION_LAPACK_linear_equation_routines
 18/120 Test  #18: COMPLEX_LAPACK_linear_equation_routines .....................................................   Passed  4390.90 sec
        Start  20: COMPLEX16_LAPACK_linear_equation_routines
 19/120 Test  #19: DOUBLE_PRECISION_LAPACK_linear_equation_routines ............................................   Passed  3293.06 sec
        Start  21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines
 20/120 Test  #21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines ...........................   Passed  178.72 sec
        Start  22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines
 21/120 Test  #22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines .........................   Passed  793.80 sec
        Start  23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines
 22/120 Test  #23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines ..................................   Passed   57.25 sec
        Start  24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines
 23/120 Test  #24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines ......................   Passed  115.00 sec
        Start  25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines
 24/120 Test  #25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines ...............................   Passed  338.11 sec
        Start  26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines
 25/120 Test  #26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines .............................   Passed  283.97 sec
        Start  27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines
 26/120 Test  #27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines ......................................   Passed    1.00 sec
        Start  28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines
 27/120 Test  #28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines .........................................   Passed    0.70 sec
        Start  29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines
 28/120 Test  #29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines .........................................   Passed    1.89 sec
        Start  30: SSVD:_Testing_Singular_Value_Decomposition_routines
 29/120 Test  #30: SSVD:_Testing_Singular_Value_Decomposition_routines .........................................   Passed    0.71 sec
        Start  31: SSEC:_Testing_REAL_Eigen_Condition_Routines
 30/120 Test  #31: SSEC:_Testing_REAL_Eigen_Condition_Routines .................................................   Passed    0.60 sec
        Start  32: SSEV:_Testing_REAL_Nonsymmetric_Eigenvalue_Driver

@h-vetinari
Copy link
Contributor Author

FWIW, the time to run ctest -j2 on our current builds is:

100% tests passed, 0 tests failed out of 120

Total Test time (real) = 6305.40 sec

So flang takes about 50% more time.

@h-vetinari
Copy link
Contributor Author

on a rerun of the flang-built openblas (without openmp), I now get:

100% tests passed, 0 tests failed out of 120

Total Test time (real) = 8278.32 sec

My suspicion is that the CI agents switching between different CPU types randomly is the cause for the pass or fail. If necessary, I can try to validate that hypothesis.

@martin-frbg
Copy link
Collaborator

maybe alternating between avx512 and non-avx512 , common problem with azure-ci and ISTR mmuetzel had noted problems with win-llvm and avx512 previously

@martin-frbg
Copy link
Collaborator

omp mod problem could also be missing include path to modules in llvm install path. will try to look at your logs later if able

@h-vetinari
Copy link
Contributor Author

It fails with the following instructions found for the CPU of the CI agent (according to numpy):

    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2",
      "AVX512F",
      "AVX512CD",
      "AVX512_SKX"
    ],
    "not found": [
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]

I managed to fix the omp_lib.mod problem, the openmp-enabled build now fails with:

[17746/19184] Linking Fortran shared library lib\openblas.dll
FAILED: lib/openblas.dll lib/Release/openblas.lib 
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C "%BUILD_PREFIX%\Library\bin\cmake.exe -E __create_def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def.objs --nm=%BUILD_PREFIX%\Library\bin\llvm-nm.exe && cd %SRC_DIR%\build" && %BUILD_PREFIX%\Library\bin\cmake.exe -E vs_link_dll --intdir=CMakeFiles\openblas_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp  /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF  /DEF:CMakeFiles\openblas_shared.dir\.\exports.def  -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"  && cd ."
LINK: command "C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: warning: ignoring unknown argument '-lpthreads'
lld-link: error: undefined symbol: omp_get_max_threads

@martin-frbg
Copy link
Collaborator

The lpthreads argument error is probably bogus. Do you see -fopenmp or -lomp in the build log ? Do you see omp_get_max_threads in nm or dumpbin output of the llvm-provided libomp ?

@h-vetinari
Copy link
Contributor Author

Passing configuration (non-openmp):

    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2"
    ],
    "not found": [
      "AVX512F",
      "AVX512CD",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]

So indeed some AVX512 issue seems likely

@h-vetinari
Copy link
Contributor Author

Do you see -fopenmp or -lomp in the build log ?

Yeah, I'm even passing that explicitly in -DOpenMP_Fortran_FLAGS=-fopenmp. To be honest, I have no idea where the -lpthreads is coming from.

@martin-frbg
Copy link
Collaborator

seen some google hits suggesting that it is simply fallout from one of cmake's standard configuration check scripts and can be ignored.
but omp_get_max_threads should definitely be in llvm's libomp (or at least used to be in earlier versions)

@h-vetinari
Copy link
Contributor Author

Looking closer, I'm almost certain this is a question of the wrong library path being picked:

-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"

Both of these are pointing to the build environment, not the host where openmp is actually present. Perhaps the path is constructed relative to clang/flang?

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Jul 8, 2024

Two good news, two bad news

Good news #1

I managed to build openblas in conda-forge with flang 19 and the following patch:

commit 4cab116c4e811f977c816f3027df57d1fba3bd0c
Author: H. Vetinari <[email protected]>
Date:   Mon Jul 1 09:05:00 2024 +1100

    explicitly link to OpenMP

    bump CMake minimum to adopt CMP0074 for checking OpenMP_ROOT, see
    https://cmake.org/cmake/help/latest/policy/CMP0074.html

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 096ca88b3..ef505f70d 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,7 +2,7 @@
 ## Author: Hank Anderson <[email protected]>
 ##

-cmake_minimum_required(VERSION 2.8.5)
+cmake_minimum_required(VERSION 3.12)

 project(OpenBLAS C ASM)

@@ -100,6 +100,10 @@ endif()

 message(WARNING "CMake support is experimental. It does not yet support all build options and may not produce the same Makefiles that OpenBLAS ships with.")

+if (USE_OPENMP)
+  find_package(OpenMP REQUIRED)
+endif ()
+
 include("${PROJECT_SOURCE_DIR}/cmake/utils.cmake")
 include("${PROJECT_SOURCE_DIR}/cmake/system.cmake")

@@ -256,6 +260,15 @@ if (${CMAKE_SYSTEM_NAME} MATCHES "AIX|Android|Linux|FreeBSD|OpenBSD|NetBSD|Drago
   endif()
 endif()

+if (USE_OPENMP)
+  if(BUILD_STATIC_LIBS)
+    target_link_libraries(${OpenBLAS_LIBNAME}_static OpenMP::OpenMP_C)
+  endif()
+  if(BUILD_SHARED_LIBS)
+    target_link_libraries(${OpenBLAS_LIBNAME}_shared OpenMP::OpenMP_C)
+  endif()
+endif()
+
 # Seems that this hack doesn't required since macOS 11 Big Sur
 if (APPLE AND BUILD_SHARED_LIBS AND CMAKE_HOST_SYSTEM_VERSION VERSION_LESS 20)
   set (CMAKE_C_USE_RESPONSE_FILE_FOR_OBJECTS 1)

This is to fix the linkage errors for symbols like omp_get_max_threads mentioned above. I'm not claiming that the patch is directly upstreamable, but if you want I can open a PR.

Bad news #1

Finding LLVM's OpenMP correctly using CMake's find_package(OpenMP) is apparently fraught. Even using the OpenMP_ROOT work-around noted here, the Fortran-bits do not get picked up correctly, requiring a manual override:

    REM not picked up by `find_package(OpenMP)` for some reason
    set "CMAKE_EXTRA=-DOpenMP_Fortran_FLAGS=-fopenmp -DOpenMP_Fortran_LIB_NAMES=libomp"
    set "FFLAGS=%FFLAGS% -I%LIBRARY_PREFIX%\include"

Good news #2

The test suite is extraordinarily faster than without openmp:

Total Test time (real) =  98.31 sec

That's roughly 85x faster than the result we get for llvm-flang-built OpenBLAS without OpenMP (which is probably too good to be true, in the sense that the pthreads variant is probably being pessimized somewhere/somehow), and still ~65x faster than the current pthreads-based production builds; c.f. also conda-forge/openblas-feedstock#160

Bad news #2

The same kind of failures as with pthreads occur on agents with AVX512 instructions:

97% tests passed, 4 tests failed out of 120

The following tests FAILED:
	  5 - sblas3 (Failed)
	  8 - dblas3 (Failed)
	 11 - cblas3 (Failed)
	 15 - zblas3 (Failed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS
Projects
None yet
Development

No branches or pull requests

2 participants