Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Investigate replacing uses of bash and other Unix tools in CMake build system with platform independent code #2311

Open
edsavage opened this issue Jun 16, 2022 · 6 comments

Comments

@edsavage
Copy link
Contributor

edsavage commented Jun 16, 2022

The initial approach to migrating to a CMake build system has continued to assume the presence of a Unix like build environment. On Windows this requires the presence of git bash and the minimal GNU toolset - MinGW. It is desirable to lessen the dependency on such tools on Windows as it lowers the bar for developers on that platform and simplifies the setting up of the build environment.

What follows is a list of places in the CMake build system that make calls to bash or other Unix commands.

lib/ver/CMakeLists.txt

lib/ver/CMakeLists.txt:execute_process(COMMAND date "+%Y" OUTPUT_VARIABLE BUILD_YEAR OUTPUT_STRIP_TRAILING_WHITESPACE)
lib/ver/CMakeLists.txt:execute_process(COMMAND id COMMAND awk -F ")" "{ print $1 }" COMMAND awk -F "(" "{ print $2 }" OUTPUT_VARIABLE USER_NAME
lib/ver/CMakeLists.txt:execute_process(COMMAND awk -F= "/elasticsearchVersion/ {gsub(/-.*/,\"\"); print $2}" $ENV{CPP_SRC_HOME}/gradle.properties OUTPUT_VARIABLE PRODUCT_VERSION OUTPUT_STRIP_TRAILING_WHITESPACE)

Suggestions:

  • id: Use native “whoami” command that gives response "vm-win2016-64-1\eds” and parse using cmake string commands
  • date: use native “date /T” (requires extensions to be installed and parse out the year
  • awk: use CMake file READ command to read in gradle.properties and then search and parse out elasticsearchVersion value

3rd_party/CMakeLists.txt:

add_custom_target(
    eigen
    DEPENDS 3rd_party.sh pull-eigen.sh
    COMMAND bash -c "./3rd_party.sh --add ${INSTALL_DIR}"
    COMMAND bash -c "./pull-eigen.sh"
    WORKING_DIRECTORY cmake${CMAKE_CURRENT_SOURCE_DIR}
)

cmake/compiler/vs2019.cmake

execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)/Microsoft Visual Studio/2019/Professional\"" OUTPUT_VARIABLE VCBASE OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)/Windows Kits\"" OUTPUT_VARIABLE WINSDKBASE OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)\"" OUTPUT_VARIABLE PFX86_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cd ${PFX86_DIR} && cygpath -m -s \"Microsoft Visual Studio\"" OUTPUT_VARIABLE MSVC_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cd ${PFX86_DIR} && cygpath -m -s \"Windows Kits\"" OUTPUT_VARIABLE WIN_KITS_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "/bin/ls -1 ${PFX86_DIR}/${MSVC_DIR}/2019/Professional/VC/Tools/MSVC" COMMAND bash -c "tail -1" OUTPUT_VARIABLE VCVER OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "/bin/ls -1 ${WINSDKBASE}/10/Include" COMMAND bash -c "tail -1" OUTPUT_VARIABLE UCRTVER OUTPUT_STRIP_TRAILING_WHITESPACE)

cmake/variables.cmake

execute_process(COMMAND awk -F= "/elasticsearchVersion/ {gsub(/-.*/,\"\"); print $2}"
  $ENV{CPP_SRC_HOME}/gradle.properties OUTPUT_VARIABLE ML_VERSION_NUM OUTPUT_STRIP_TRAILING_WHITESPACE)

cmake/functions.cmake

function(ml_generate_resources _target)
  if(NOT WIN32)
    return()
  endif()
  set( ${_target}_LINKFLAGS ${CMAKE_CURRENT_BINARY_DIR}/${_target}.res )
  set_target_properties( ${_target} PROPERTIES LINK_FLAGS ${${_target}_LINKFLAGS} )
  execute_process(COMMAND bash -c "${CMAKE_SOURCE_DIR}/mk/make_rc_defines.sh ${_target}.exe" OUTPUT_VARIABLE
    RC_DEFINES OUTPUT_STRIP_TRAILING_WHITESPACE)
  file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh "rc -nologo ${CPPFLAGS} ${RC_DEFINES} -Fo${_target}.res ${CMAKE_SOURCE_DIR}/mk/ml.rc")
  add_custom_target(
    ${_target}.res
    DEPENDS ${CMAKE_SOURCE_DIR}/mk/ml.rc ${CMAKE_SOURCE_DIR}/gradle.properties ${CMAKE_SOURCE_DIR}/mk/ml.ico ${CMAKE_SOURCE_DIR}/mk/make_rc_defines.sh ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh
    COMMAND bash -c ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh
    )
  add_dependencies(${_target} ${_target}.res)

  set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh)
  set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES ${CMAKE_CURRENT_BINARY_DIR}/${_target}.res)
endfunction()

build.gradle

grep -n bash build.gradle
83:// Always do the C++ build using bash (Git bash on Windows)
84:project.ext.bash = isWindows ? "C:\\Program Files\\Git\\bin\\bash" : "/bin/bash"
102:      commandLine bash
111:  commandLine bash
118:  commandLine bash
126:  commandLine bash
133:  commandLine bash
192:  commandLine bash
400:  commandLine bash
@edsavage
Copy link
Contributor Author

Suggestions:

  • id: Use native “whoami” command that gives response "vm-win2016-64-1\eds” and parse using cmake string commands
  • date: use native “date /T” (requires extensions to be installed and parse out the year
  • awk: use CMake file READ command to read in gradle.properties and then search and parse out elasticsearchVersion value
  • 3rd_party.sh and pull-eigen.sh: rewrite using CMake scripting and call using cmake -P
  • vs2019.cmake: Use cmake syntax for directories (forward slashes, spaces allowed). Potentially hard code version numbers (allowing to be overridden from environment) or use native dir /od command and parse output.
  • ml_generate_resources: call rc directly (not via shell script), replace make_rc_defines.sh with cmake commands (using native windows commands where necessary - or simply use a windows batch file)
  • build.gradle: Assumes the presence of git bash and sets the environment using set_env.sh. For a native build in a windows command shell we could set up the environment with the combination of the vcvarsall.bat
    file
    and a minimal set_env.bat file to set CPP_SRC_HOME etc.

@droberts195
Copy link
Contributor

droberts195 commented Jun 17, 2022

  • date: use native “date /T” (requires extensions to be installed and parse out the year

For the date we could use the TIMESTAMP feature of CMake's string function: https://cmake.org/cmake/help/v3.19/command/string.html#timestamp

If we needed accurate timestamps on each sub-compilation step this wouldn't be acceptable, as it gets the date when the build system is generated, not when it's run. However, we're only getting the year to put in the copyright message, and each CI build will generate the build system from scratch, so CMake's string and TIMESTAMP are adequate for our needs and have the advantage of being completely platform-independent.

@edsavage
Copy link
Contributor Author

For the date we could use the TIMESTAMP feature of CMake's string function:

This looks ideal. Accurate timestamps aren't required, in fact we only require the year.

@droberts195
Copy link
Contributor

file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh "rc -nologo ${CPPFLAGS} ${RC_DEFINES} -Fo${_target}.res ${CMAKE_SOURCE_DIR}/mk/ml.rc")

I don't think we'd need to use a tmp.sh Bash script to run rc. rc is a native Windows program - the resource compiler for Windows programs. So surely we could just run it directly?

In fact, CMake might even provide a more elegant way to run rc. https://discourse.cmake.org/t/how-to-use-resource-files-rc-in-cmake/2628 alludes to it. This might be something to research for a future enhancement.

@droberts195
Copy link
Contributor

grep -n bash build.gradle
83:// Always do the C++ build using bash (Git bash on Windows)

The reason we're doing this is so that we can set up an environment using set_env.sh. It's nice that we can have one script for all platforms - if we had a .sh and a separate .bat then it's likely that people would sometimes forget to keep them in sync.

This might be the hardest of all to move away from.

However, I still think it's worth gradually getting rid of the other places where we're unnecessarily using Git Bash functionality from CMake. Once the lower level locations are dealt with we might find we don't need as many environment variables.

Another thing we could look at is setting up some of the paths that are currently in set_env.sh in CMakeLists.txt. It's probably going to be a long-term iterative process but we can keep taking small steps in the right direction.

@droberts195
Copy link
Contributor

  • 3rd_party.sh and pull-eigen.sh: rewrite using CMake scripting and call using cmake -P

I think this would be good. Certainly 3rd_party.sh seems to be getting run more often than it needs to be for iterative local rebuilds. Presumably we have it as a dependency for everything. This is true, but it still doesn't need to be run after every single minor code edit. It takes quite a long time on Linux where the PyTorch and MKL libraries are big.

edsavage added a commit to edsavage/ml-cpp that referenced this issue Jun 17, 2022
Replace calls to unix like commands with either CMake commands or the
appropriate command for the native platform.

Relates elastic#2311
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants