[Issue]: How to build rocWMMA on Windows? #464

Jay19751103 · 2024-11-25T09:36:52Z

Problem Description

After adding the new gfx model gfx1151, rocWMMA builds on Linux, and I can also build llama.cpp with the rocWMMA patch
(https://github.com/ggerganov/llama.cpp/pull/7011/commits) to test flash attention (FA).

Testing llama.cpp built with the rocWMMA library:
amd@halo:~/llama.cpp_wmma/build/bin$ ./llama-bench -m ~/ROCm/6.3/rocBLAS/Meta-Llama-3-8B.Q4_K_M.gguf -p 512 -fa 1
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 11.5, VMM: no

model	size	params	backend	ngl	fa	test	t/s
llama 8B Q4_K - Medium	4.58 GiB	8.03 B	ROCm	99	1	pp 512	662.65 ± 18.68
llama 8B Q4_K - Medium	4.58 GiB	8.03 B	ROCm	99	1	tg 128	36.41 ± 0.28

Without rocWMMA
amd@halo:~/llama.cpp/build/bin$ ./llama-bench -m ~/ROCm/6.3/rocBLAS/Meta-Llama-3-8B.Q4_K_M.gguf -p 512
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 11.5, VMM: no

model	size	params	backend	ngl	test	t/s
llama 8B Q4_K - Medium	4.58 GiB	8.03 B	ROCm	99	pp512	625.42 ± 5.46
llama 8B Q4_K - Medium	4.58 GiB	8.03 B	ROCm	99	tg128	36.03 ± 0.24
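
As a quick sanity check on the two runs above, the relative change in throughput can be worked out from the reported t/s means. This is only a sketch: the helper name `pct_change` is made up for illustration, and the comparison is not like-for-like, since the rocWMMA run also passes `-fa 1`. The ± spread on the rocWMMA pp512 run (18.68 t/s) is about half of the 37.23 t/s difference, so the prompt-processing figure is indicative at best.

```shell
# Relative change in percent between a baseline and a new throughput value.
# pct_change is a hypothetical helper, not part of llama.cpp or rocWMMA.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

pct_change 625.42 662.65   # pp512: prints 5.95
pct_change 36.03 36.41     # tg128: prints 1.05
```

So prompt processing is roughly 6% faster with rocWMMA and FA enabled, while token generation is essentially unchanged.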

I would like to port this to Windows, but there I get the following CMake error:
G:\rocWMMA>cmake -Bbuild2 -DAMDGPU_TARGETS="gfx1151" -G Ninja
-- The CXX compiler identification is Clang 19.0.0 with GNU-like command-line
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/opt/rocm/6.1/bin/clang++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1151
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1151 - Failed
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed
CMake Deprecation Warning at C:/opt/rocm/6.1/lib/cmake/hiprtc/hiprtc-config.cmake:21 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
CMakeLists.txt:103 (find_package)

CMake Error at C:/Program Files/CMake/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
Call Stack (most recent call first):
C:/Program Files/CMake/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
C:/Program Files/CMake/share/cmake-3.30/Modules/FindOpenMP.cmake:600 (find_package_handle_standard_args)
CMakeLists.txt:104 (find_package)

-- Configuring incomplete, errors occurred!

Operating System

10.0.22631

CPU

AMD 7700X

GPU

GFX1151

ROCm Version

ROCm 6.2.3

ROCm Component

rocWMMA

Steps to Reproduce

0001-Port-gfx1151-to-rocWMMA.patch
Apply the patch to add gfx1151, then run
cmake -Bbuild2 -DAMDGPU_TARGETS="gfx1151" -G Ninja
which fails with the error shown above.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

PS C:\Users\users> (Get-WmiObject Win32_OperatingSystem).Version
10.0.22631
PS C:\Users\users> (Get-WmiObject win32_Processor).Name
AMD Ryzen 7 7700X 8-Core Processor
PS C:\Users\users> (Get-WmiObject win32_VideoController).Name
AMD Radeon RX 7900 XTX
AMD Radeon(TM) Graphics

ppanchad-amd · 2024-11-25T19:36:05Z

Hi @Jay19751103. Internal ticket has been created to assist with your issue. Thanks!

Jay19751103 · 2024-11-26T08:10:23Z

Hi,
It builds if I use an older version and disable OpenMP, but it would be better if the current source supported this natively.

sorasoras · 2024-12-03T17:45:24Z

I would also like to compile rocWMMA for gfx1100 on Windows.
It is a prerequisite for implementing flash attention in llama.cpp.

sorasoras · 2024-12-14T18:07:19Z

> Use old version and Disable OpenMP can build. but better is supported with native source.

Do you mean ROCm 5.7?

CMake Error at CMakeLists.txt:102 (find_package):
find_package for module OpenMP called with REQUIRED, but
CMAKE_DISABLE_FIND_PACKAGE_OpenMP is enabled. A REQUIRED package cannot be
disabled.

Disabling OpenMP this way doesn't seem to work with the current build.
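
The error message above says that `CMAKE_DISABLE_FIND_PACKAGE_OpenMP` cannot override a `find_package` call marked `REQUIRED`, so the call itself has to be edited in the source. A minimal sketch for locating such calls, assuming a POSIX shell (for example Git Bash on Windows); `find_openmp_calls` is a hypothetical helper name:

```shell
# List every find_package(OpenMP ...) call in a CMake file and flag the
# ones marked REQUIRED, which cannot be disabled from the command line.
# Usage: find_openmp_calls CMakeLists.txt
find_openmp_calls() {
  grep -nE 'find_package\( *OpenMP' "$1" | while IFS= read -r line; do
    case "$line" in
      *REQUIRED*) echo "$line  <- REQUIRED, must be edited in the source" ;;
      *)          echo "$line  <- optional, can be disabled" ;;
    esac
  done
}
```

Each reported line starts with its line number in the file, which should match the location named in the CMake error.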

Jay19751103 · 2024-12-16T05:33:25Z

I disabled OpenMP on Windows; a proper fix may need to wait for a new rocWMMA release.
OpenMP is normally used to verify the GPU results against the CPU implementation.
For now, I run the test bench on Linux and use the following on Windows.

My base version is now commit
677b441

Apply the following change:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 242b6afa..8e2d28e1 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -48,7 +48,7 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --driver-mode=g++ -Xclang -fallow-half-a

 # Top level configs
 if( CMAKE_PROJECT_NAME STREQUAL "rocwmma" )
-  option( ROCWMMA_BUILD_TESTS "Build rocWMMA tests" ON )
+  option( ROCWMMA_BUILD_TESTS "Build rocWMMA tests" OFF )
   option( ROCWMMA_BUILD_SAMPLES "Build rocWMMA samples" ON )
   option( ROCWMMA_BUILD_ASSEMBLY "Output assembly files" OFF )
 endif()
@@ -101,18 +101,10 @@ message( VERBOSE "AMDGPU_TARGETS=${AMDGPU_TARGETS}")

 find_package( hip REQUIRED )
 find_package( hiprtc REQUIRED )
-find_package( OpenMP REQUIRED )

-find_path(ROCM_SMI_ROOT "include/rocm_smi/rocm_smi.h"
-    PATHS "${ROCM_ROOT}"
-    PATH_SUFFIXES "rocm_smi"
-    )
-
-find_library(ROCM_SMI_LIBRARY rocm_smi64
-    PATHS "${ROCM_SMI_ROOT}/lib")

 add_library(rocwmma INTERFACE)
-target_link_libraries(rocwmma INTERFACE hip::device hip::host OpenMP::OpenMP_CXX ${ROCM_SMI_LIBRARY})
+target_link_libraries(rocwmma INTERFACE hip::device hip::host ${ROCM_SMI_LIBRARY})

 rocm_install_targets(
   TARGETS rocwmma
diff --git a/samples/CMakeLists.txt b/samples/CMakeLists.txt
index 9a89a851..709b0666 100644
--- a/samples/CMakeLists.txt
+++ b/samples/CMakeLists.txt
@@ -35,7 +35,7 @@ function(add_rocwmma_sample TEST_TARGET TEST_SOURCE)

   list(APPEND TEST_SOURCE ${ARGN})
   add_executable(${TEST_TARGET} ${TEST_SOURCE})
-  target_link_libraries(${TEST_TARGET} OpenMP::OpenMP_CXX "-L${HIP_CLANG_ROOT}/lib" "-Wl,-rpath=${HIP_CLANG_ROOT}/lib")
+  target_link_libraries(${TEST_TARGET} "-L${HIP_CLANG_ROOT}/lib" "-Wl,-rpath=${HIP_CLANG_ROOT}/lib")
   target_link_libraries(${TEST_TARGET} rocwmma hiprtc::hiprtc)
   target_include_directories(${TEST_TARGET} PRIVATE
     ${CMAKE_CURRENT_SOURCE_DIR}

Build command:
cmake -G "Ninja" -DROCWMMA_BUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100 -Bbuild

Then install it into the HIP SDK directory and use the following llama.cpp branch to build the GGML FA feature with rocWMMA:
https://github.com/hjc4869/llama.cpp
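
The configure, build, and install steps described above could be chained roughly as follows. This is a sketch under several assumptions rather than a tested recipe: it expects a POSIX shell (for example Git Bash) started in the patched rocWMMA source root, the function name `build_rocwmma` and the variables `GFX_TARGET`, `HIP_SDK_PREFIX`, and `DRY_RUN` are invented for illustration, and the default prefix is simply the `C:/opt/rocm/6.1` path that appears in the CMake log in the original post.

```shell
# Configure, build, and install rocWMMA with tests disabled.
# All names below are illustrative; adjust them to your setup.
build_rocwmma() {
  gfx_target="${GFX_TARGET:-gfx1100}"             # GPU architecture to build for
  prefix="${HIP_SDK_PREFIX:-C:/opt/rocm/6.1}"     # assumed HIP SDK install location
  run="${DRY_RUN:+echo}"                          # DRY_RUN=1 prints commands instead of running them

  $run cmake -G Ninja -B build -DROCWMMA_BUILD_TESTS=OFF \
       -DCMAKE_BUILD_TYPE=Release "-DAMDGPU_TARGETS=${gfx_target}" &&
  $run cmake --build build &&
  $run cmake --install build --prefix "$prefix"
}
```

Running `DRY_RUN=1 build_rocwmma` first prints the three commands so the target and prefix can be checked before anything is written into the SDK folder.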

sorasoras · 2024-12-16T17:54:15Z

It would be nice if you could post your full CMakeLists.txt; the diff above is hard to read.
Thanks.

johnnynunez · 2024-12-17T16:34:45Z

The new HIP SDK 6.2.0 is available for Windows: https://www.amd.com/en/developer/resources/rocm-hub/eula/licenses.html?filename=AMD-Software-PRO-Edition-24.Q4-Win10-Win11-For-HIP.exe

ROCm/MIOpen#3436

johnnynunez · 2024-12-23T22:47:26Z

@huanrwan-amd windows...

jamesxu2 · 2025-01-07T18:13:19Z

Hi @Jay19751103 and @sorasoras, rocWMMA is not supported or tested on Windows.

@johnnynunez I'm not sure what the meaning of your comments is, can you clarify?

Jay19751103 · 2025-01-08T05:30:17Z

Hi @jamesxu2

I downloaded the source, built it with testing and OpenMP disabled, and installed it into the HIP SDK folder.
For llama.cpp:
Sync the change ggml-org/llama.cpp@a0c09b1
Rebuild and test with llama-server using the -fa argument and llama-bench using the -fa 1 argument.
Do you mean that we cannot use it on Windows even though it builds there?

jamesxu2 · 2025-01-08T14:42:41Z

@Jay19751103, even if you can build it, rocWMMA is not supported or tested on Windows. This means there can be bugs or unexpected behaviour related to the different OS. Because Windows is not in our testing matrix, it may partially "work" in some use cases but suddenly break after an update, or fail in unexpected ways.

ppanchad-amd added the Under Investigation label Nov 25, 2024

jamesxu2 closed this as completed Jan 7, 2025
