Commit

updated AMDGPU_TARGETS to GPU_TARGETS and checked for --offload-compress
dlangbe committed Nov 27, 2024
1 parent 2b4e394 commit 32c83a0
Showing 3 changed files with 30 additions and 18 deletions.
35 changes: 23 additions & 12 deletions CMakeLists.txt
@@ -53,7 +53,6 @@ if( CMAKE_PROJECT_NAME STREQUAL "rocwmma" )
option( ROCWMMA_BUILD_ASSEMBLY "Output assembly files" OFF )
endif()

# set( AMDGPU_TARGETS "gfx908:xnack-" ) # User variable
if( CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT )
set( CMAKE_INSTALL_PREFIX "/opt/rocm" CACHE PATH "Install path prefix, prepended onto install directories" FORCE )
endif()
@@ -96,28 +95,40 @@ endif()

if (ADDRESS_SANITIZER_ENABLED)
#TODO: Remove next line when rocm-cmake fix is available
set(CMAKE_NO_BUILTIN_CHRPATH ON)
rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS
rocm_check_target_ids(DEFAULT_GPU_TARGETS
TARGETS "gfx90a:xnack+;gfx942:xnack+" )
else()
#TODO: Remove next line when rocm-cmake fix is available
set(CMAKE_NO_BUILTIN_CHRPATH ON)
rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS
rocm_check_target_ids(DEFAULT_GPU_TARGETS
TARGETS "gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201" )
endif()

# Check if offload compression is supported
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("--offload-compress" CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
if (NOT CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
if(ADDRESS_SANITIZER_ENABLED)
set(CMAKE_NO_BUILTIN_CHRPATH ON)
endif()
elseif(CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
set(CMAKE_NO_BUILTIN_CHRPATH OFF)
endif()
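
Note on the block above: it records whether the compiler accepts `--offload-compress`, but this hunk does not show the flag being passed to any target. A minimal sketch of one common follow-up, assuming the intent is to enable the flag only when the probe succeeds. The project name, the `EXAMPLE_HAS_OFFLOAD_COMPRESS` variable, and the use of `add_compile_options` are illustrative assumptions and are not taken from this commit:

```cmake
cmake_minimum_required(VERSION 3.16)
project(offload_compress_probe LANGUAGES CXX)

include(CheckCXXCompilerFlag)

# Probe the active C++ compiler. The result is cached in
# EXAMPLE_HAS_OFFLOAD_COMPRESS (hypothetical name chosen for this sketch).
check_cxx_compiler_flag("--offload-compress" EXAMPLE_HAS_OFFLOAD_COMPRESS)

if(EXAMPLE_HAS_OFFLOAD_COMPRESS)
  # Assumption: the flag is wanted for every C++ compile in this directory.
  add_compile_options($<$<COMPILE_LANGUAGE:CXX>:--offload-compress>)
  message(STATUS "--offload-compress accepted by the compiler; enabled")
else()
  message(STATUS "--offload-compress not accepted by the compiler; skipped")
endif()
```

Because the probe result is cached, a stale value can survive a compiler change; clearing the build directory or removing the cache entry forces the check to run again.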


# Variable AMDGPU_TARGET must be a cached variable and must be specified before calling find_package(hip)
# This is because hip-config.cmake sets --offload-arch via AMDGPU_TARGET cached variable __after__ setting
# default cached variable AMDGPU_TARGET to DEFAULT_AMDGPU_TARGETS, where not all archs are compatible with MFMA instructions
# Variable GPU_TARGET must be a cached variable and must be specified before calling find_package(hip)
# This is because hip-config.cmake sets --offload-arch via GPU_TARGET cached variable __after__ setting
# default cached variable GPU_TARGET to DEFAULT_GPU_TARGETS, where not all archs are compatible with MFMA instructions
#
# By rule, once cached variable is set, it cannot be overridden unless we use the FORCE option
if(AMDGPU_TARGETS)
set(AMDGPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "List of specific machine types for library to target")
if(GPU_TARGETS)
set(GPU_TARGETS "${GPU_TARGETS}" CACHE STRING "List of specific machine types for library to target")
elseif(AMDGPU_TARGETS)
set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "List of specific machine types for library to target")
message(STATUS "WARNING: AMDGPU_TARGETS use is deprecated. Use GPU_TARGETS.")
else()
set(AMDGPU_TARGETS "${DEFAULT_AMDGPU_TARGETS}" CACHE STRING "List of specific machine types for library to target")
set(GPU_TARGETS "${DEFAULT_GPU_TARGETS}" CACHE STRING "List of specific machine types for library to target")
endif()
message( VERBOSE "AMDGPU_TARGETS=${AMDGPU_TARGETS}")
message( VERBOSE "GPU_TARGETS=${GPU_TARGETS}")
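
Note on the selection order in this hunk: an explicit `GPU_TARGETS` wins, then the legacy `AMDGPU_TARGETS`, then the defaults. The sketch below restates that order in standalone form. It differs from the commit in two labeled ways: it uses CMake's built-in `message(DEPRECATION ...)` mode instead of `message(STATUS "WARNING: ...")`, and `EXAMPLE_DEFAULT_GPU_TARGETS` is a placeholder list rather than the value produced by `rocm_check_target_ids`:

```cmake
cmake_minimum_required(VERSION 3.15)
project(gpu_targets_example LANGUAGES NONE)

# Placeholder defaults for this sketch only; the commit derives its
# defaults through rocm_check_target_ids().
set(EXAMPLE_DEFAULT_GPU_TARGETS "gfx908;gfx90a")

if(GPU_TARGETS)
  # An explicit -DGPU_TARGETS=... takes priority.
  set(GPU_TARGETS "${GPU_TARGETS}" CACHE STRING
      "List of specific machine types for library to target")
elseif(AMDGPU_TARGETS)
  # Legacy spelling: honor it, but tell the user to migrate.
  set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING
      "List of specific machine types for library to target")
  message(DEPRECATION "AMDGPU_TARGETS is deprecated. Use GPU_TARGETS instead.")
else()
  # Nothing supplied: fall back to the defaults.
  set(GPU_TARGETS "${EXAMPLE_DEFAULT_GPU_TARGETS}" CACHE STRING
      "List of specific machine types for library to target")
endif()

message(VERBOSE "GPU_TARGETS=${GPU_TARGETS}")
```

With `message(DEPRECATION ...)`, the notice is shown as a warning by default, can be silenced with `-DCMAKE_WARN_DEPRECATED=OFF`, and becomes an error with `-DCMAKE_ERROR_DEPRECATED=ON`, which a plain `STATUS` message cannot do.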

find_package( hip REQUIRED )
find_package( hiprtc REQUIRED )
7 changes: 4 additions & 3 deletions README.md
@@ -19,7 +19,7 @@ The test suite includes validation and benchmarking projects that focus on unit

## Requirements

rocWMMA currently supports the following AMDGPU architectures:
rocWMMA currently supports the following AMD GPU architectures:

* CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx940, gfx940, gfx942 as 'gfx9'
* RDNA3 class GPU featuring AI acceleration support: gfx1100, gfx1101, gfx1102 as 'gfx11'
@@ -47,7 +47,8 @@ For more detailed information, please refer to the [rocWMMA installation guide](

|Option|Description|Default value|
|---|---|---|
|AMDGPU_TARGETS|Build code for specific GPU target(s)|gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1100;gfx1101;gfx1102|
|GPU_TARGETS|Build code for specific GPU target(s)|gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1100;gfx1101;gfx1102|
|AMDGPU_TARGETS|(Deprecated) Build code for specific GPU target(s)|gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1100;gfx1101;gfx1102|
|ROCWMMA_BUILD_TESTS|Build Tests|ON|
|ROCWMMA_BUILD_SAMPLES|Build Samples|ON|
|ROCWMMA_BUILD_DOCS|Build doxygen documentation from code|OFF|
@@ -67,7 +68,7 @@ results. Here are some configuration examples:
|Configuration|Command|
|---|---|
|Basic|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> .`|
|Targeting gfx908|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
|Targeting gfx908|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DGPU_TARGETS=gfx908:xnack-` |
|Debug build|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug` |
|Build without rocBLAS (default on)|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DROCWMMA_VALIDATE_WITH_ROCBLAS=OFF -DROCWMMA_BENCHMARK_WITH_ROCBLAS=OFF` |

6 changes: 3 additions & 3 deletions docs/install/installation.rst
@@ -183,7 +183,7 @@ Below are the project options available to build rocWMMA library with or without
* - **Option**
- **Description**
- **Default Value**
* - AMDGPU_TARGETS
* - GPU_TARGETS
- Build code for specific GPU target(s)
- ``gfx908:xnack-``; ``gfx90a:xnack-``; ``gfx90a:xnack+``; ``gfx940``; ``gfx941``; ``gfx942``; ``gfx1100``; ``gfx1101``; ``gfx1102``
* - ROCWMMA_BUILD_TESTS
@@ -235,7 +235,7 @@ Here are some other example project configurations:
+===================================+================================================================================================================================================================+
| Basic | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir>` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Targeting gfx908 | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
| Targeting gfx908 | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DGPU_TARGETS=gfx908:xnack-` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Debug build | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DCMAKE_BUILD_TYPE=Debug` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -481,7 +481,7 @@ Build performance

Depending on the resources available to the build machine and the build configuration selected, rocWMMA build times can be on the order of an hour or more. Here are some things you can do to reduce build times:

* Target a specific GPU (e.g., ``-D AMDGPU_TARGETS=gfx908:xnack-``)
* Target a specific GPU (e.g., ``-D GPU_TARGETS=gfx908:xnack-``)
* Use lots of threads (e.g., ``-j32``)
* Select ``ROCWMMA_BUILD_ASSEMBLY=OFF``
* Select ``ROCWMMA_BUILD_DOCS=OFF``.