Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3x faster MIOpen build by using host compiler instead of ROCm/clang++ #2091

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

dmikushin
Copy link
Contributor

@dmikushin dmikushin commented Apr 13, 2023

This PR experimentally switches MIOpen C++ compiler from /opt/rocm/llvm/bin/clang++ to just system's default C++ compiler (e.g. GCC) and greatly improves the MIOpen compilation speed (about 3.5x on my machine):

Original:

CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="/opt/rocm/;/opt/rocm/hip;/root/MIOpen/install_dir" ..

time make MIOpen -j8
________________________________________________________
Executed in   34.38 mins   fish           external 
   usr time  128.20 mins  554.00 micros  128.20 mins 
   sys time    4.50 mins  294.00 micros    4.50 mins

New:

cmake -DMIOPEN_BACKEND=HIP -DCMAKE_HIP_ARCHITECTURES="gfx900;gfx906;gfx908;gfx90a;gfx1030" ..

time make MIOpen -j8
________________________________________________________
Executed in  599.50 secs   fish           external 
   usr time   36.34 mins    0.00 micros   36.34 mins 
   sys time    2.08 mins  1303.00 micros    2.08 mins

…l subset of source files with HIP-enabled Clang compiler. The rest of .cpp files are compiled with the host C++ compiler (e.g. gcc or regular clang), which is faster
…he following calls to hip_add_interface_compile_flags are disabled in /opt/rocm/lib/cmake/hip/hip-config.cmake: hip_add_interface_compile_flags(hip::device -x hip) hip_add_interface_compile_flags(hip::device --offload-arch=${GPU_TARGET})

* Adding an option to specify the HIP architectures explicitly, since HIP language uses native target (your GPU) by default:
  cmake -DMIOPEN_BACKEND=HIP -DCMAKE_HIP_ARCHITECTURES="gfx900;gfx906;gfx908;gfx90a;gfx1030" ..
@dmikushin dmikushin requested a review from JehandadKhan April 13, 2023 16:25
@dmikushin
Copy link
Contributor Author

dmikushin commented Apr 13, 2023

Not all MIOpen source files could be treated as plain C++ files. Apparently, this happens due to some CK headers spilling GPU device code definitions via include files. In order to workaround this issue, we temporary migrate to CMake 3.24 and rename these files from .cpp to .hip.

Moreover, two lines of hip-config.cmake are incompatible, because they enforce rocm/clang-specific flags in a too hard way:

hip_add_interface_compile_flags(hip::device -x hip)
hip_add_interface_compile_flags(hip::device --offload-arch=${GPU_TARGET})

@junliume
Copy link
Contributor

@dmikushin this is very interesting indeed! Let's merge the latest develop to your branch and resolve these conflicts first.

@dmikushin
Copy link
Contributor Author

dmikushin commented Apr 13, 2023

@JehandadKhan One reason I'm compiling some files as .hip sources is the following:

/usr/local/include/ck/utility/data_type.hpp:11:17: error: ‘_Float16’ does not name a type; did you mean ‘_Float64’?                                           
   11 | using half_t  = _Float16;
      |                 ^~~~~~~~
      |                 _Float64
/usr/local/include/ck/utility/data_type.hpp:25:8: error: template parameters not deducible in partial specialization:
   25 | struct vector_type<T __attribute__((ext_vector_type(V))), N>;
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/include/ck/utility/data_type.hpp:25:8: note:         ‘V’
/usr/local/include/ck/utility/data_type.hpp:43:8: error: template parameters not deducible in partial specialization:
   43 | struct vector_type_maker<T __attribute__((ext_vector_type(N1))), N0>
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/include/ck/utility/data_type.hpp:43:8: note:         ‘N1’
/usr/local/include/ck/utility/data_type.hpp:80:8: error: template parameters not deducible in partial specialization:
   80 | struct scalar_type<T __attribute__((ext_vector_type(N)))>
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/include/ck/utility/data_type.hpp:80:8: note:         ‘N’
/usr/local/include/ck/utility/data_type.hpp:109:20: error: ‘half_t’ was not declared in this scope; did you mean ‘bhalf_t’?
  109 | struct scalar_type<half_t>
      |                    ^~~~~~
      |                    bhalf_t
/usr/local/include/ck/utility/data_type.hpp:109:26: error: template argument 1 is invalid
  109 | struct scalar_type<half_t>
      |                          ^
/usr/local/include/ck/utility/data_type.hpp:914:39: error: ‘half_t’ was not declared in this scope; did you mean ‘bhalf_t’?
  914 | using half2_t  = typename vector_type<half_t, 2>::type;
      |                                       ^~~~~~
      |                                       bhalf_t

CK wants to use half types, that are not fully standardized.

Another reason: included CK headers contains device function code:

/usr/local/include/ck/utility/math_v2.hpp:129:31: error: redefinition of ‘bool ck::math::isnan(int32_t)’
  129 | static inline __device__ bool isnan(int32_t x)

I have no idea why CK needs to include all of this into the host source files. Perhaps, these functions could be macro-guarded?

@pfultz2
Copy link
Contributor

pfultz2 commented Apr 13, 2023

MIOpen doesnt compile device kernels during cmake build time. Everything compiled at runtime. So it doesnt make sense to use the HIP language.

The only reason why we use hip::device is because of calling hipExtModuleLaunchKernel as hip only makes this host function available in the headers with device compilation because its not supported on cuda platforms(I dont really understand hip's logic here). However, there is a workaround by just adding an extern declaration for the function(thats what we do in migraphx here).

Also, I dont think it is good to require a version of cmake that is not available in standard packages in ubuntu 20.04.

@dmikushin
Copy link
Contributor Author

@pfultz2 , please see above:

/usr/local/include/ck/utility/math_v2.hpp:129:31: error: redefinition of ‘bool ck::math::isnan(int32_t)’
  129 | static inline __device__ bool isnan(int32_t x)

If we can ask CK to guard this code, then there will be no need of HIP mode indeed. And likely no need for CMake 3.24 as well, so your concern will be resolved.

@@ -191,7 +190,6 @@ endif()
# HIP is always required
find_package(hip REQUIRED PATHS /opt/rocm)
message(STATUS "Build with HIP ${hip_VERSION}")
target_flags(HIP_COMPILER_FLAGS hip::device)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break when we compile(at runtime) with clang directly instead of using hiprtc. We still probably need that as a fallback for when we run into issues with hiprtc.

@pfultz2
Copy link
Contributor

pfultz2 commented Apr 13, 2023

If we can ask CK to guard this code, then there will be no need of HIP mode indeed. And likely no need for CMake 3.24 as well, so your concern will be resolved.

Yes this is a bug that should be fixed in CK. They should not be exposing device code in the header files for a host API.

@dmikushin
Copy link
Contributor Author

dmikushin commented Apr 13, 2023

My ultimate goal is to drop rocm/clang++ from compilation, because it's slow. We should let the clang++ developers to figure out why it is slow and use the host gcc in the meantime, which is much faster. As you said, this is possible, given that CK stops spoiling host compilation with half types and unused declaration of device functions. Perhaps this could be resolved by a small code cleanup...

set(HIP_LIBRARIES ${hip_LIBRARIES})
set(HIP_LIBRARY ${hip_LIBRARY})
set(HIP_HIPCC_EXECUTABLE ${hip_HIPCC_EXECUTABLE})
set(HIP_HIPCONFIG_EXECUTABLE ${hip_HIPCONFIG_EXECUTABLE})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related to this PR, but this file should be removed. This was added just to provide workarounds for hip-clang for rocm 3.5. There has been fixes since this, so we should no longer need this(if we still need it then we need to report it to the hip team).

@dmikushin dmikushin changed the title 3x faster MIOpen build by using ROCm/clang only for .hip files 3x faster MIOpen build by using host compiler instead of ROCm/clang++ compiler Apr 17, 2023
@dmikushin dmikushin changed the title 3x faster MIOpen build by using host compiler instead of ROCm/clang++ compiler 3x faster MIOpen build by using host compiler instead of ROCm/clang++ Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants