MKL FP16 GEMM crash on MTL iGPU #524

rnwang04 · 2024-07-03T14:35:20Z

Summary

I found that on MTL iGPU, calling the FP16 GEMM of oneMKL crashes the program (with either oneAPI 2024.0 or 2024.2), and calling it many times freezes my machine outright.
On ARC, however, everything is fine.

Version

oneAPI 2024.0 or oneAPI 2024.2.

Environment

Minimal C++ program
Windows 11
icx 2024.0.2 or 2024.2
Hardware: Intel Core Ultra iGPU

Steps to reproduce

#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <cstdio>    // printf
#include <cstdlib>   // rand, RAND_MAX
#include <iostream>
using namespace sycl;
int main() {

   queue q{gpu_selector_v};
   std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

   const int M = 1024;
   const int N = 11008;
   const int K = 4096;

   float* A_h = new float[M * K];
   float* B_h = new float[K * N];
   float* C_h = new float[M * N];
   // random
   for (int i = 0; i < M * K; ++i) {
       A_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   // convert input to half
   sycl::half* A_h_half = new sycl::half[M * K];
   sycl::half* B_h_half = new sycl::half[K * N];
   for (int i = 0; i < M * K; ++i) {
       A_h_half[i] = sycl::half(A_h[i]);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h_half[i] = sycl::half(B_h[i]);
   }

   buffer<sycl::half> A(A_h_half, M * K);
   buffer<sycl::half> B(B_h_half, K * N);
   buffer<float> C(C_h, M * N);
   // Use OneMKL to do GEMM
   {
       q.submit([&](handler &h) {
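           // A and B are inputs: request read access so their host data reaches the device.
           sycl::accessor A_acc(A, h, sycl::read_only);
           sycl::accessor B_acc(B, h, sycl::read_only);
           // C is fully overwritten (beta = 0), so its initial contents can be discarded.
           sycl::accessor C_acc(C, h, sycl::write_only, sycl::no_init);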
           oneapi::mkl::blas::row_major::gemm(
                q,
                oneapi::mkl::transpose::nontrans,
                oneapi::mkl::transpose::trans,
                M, N, K,
                1.0f, A_acc.get_pointer(), K,
                B_acc.get_pointer(), K,
                0.0f, C_acc.get_pointer(), N);
       }).wait();
   }

   delete[] A_h;
   delete[] B_h;
   delete[] C_h;
   delete[] A_h_half;
   delete[] B_h_half;

   printf("run success!\n");
   return 0;
}
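
For checking results on a device where the call succeeds, the operation requested above (row-major, A not transposed, B transposed, alpha = 1, beta = 0) can be written as a plain CPU loop. This is only a minimal sketch: `reference_gemm_nt` is a hypothetical helper name, not part of oneMKL, and it works in float throughout, so a comparison against the FP16-input GEMM would need a tolerance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for C = A * B^T with row-major storage.
// A is M x K (leading dimension K), B is stored as N x K (leading dimension K),
// and the result C is M x N (leading dimension N), matching the arguments
// passed to gemm in the reproducer.
std::vector<float> reference_gemm_nt(const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                // Row i of A times row j of B (B is used transposed).
                acc += A[i * K + k] * B[j * K + k];
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}
```

At the full problem size (M = 1024, N = 11008, K = 4096) this loop is slow, so it may be more practical to compare only a handful of rows.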

Build the program above with the following commands:

# for linux
source /opt/intel/oneapi/setvars.sh
icpx -std=c++17 -fsycl -fopenmp -lpthread -l mkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -o gemm_fp16 gemm_fp16.cpp

# for windows
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --force
icx -std=c++17 -fsycl -fopenmp mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib -o gemm_fp16 gemm_fp16.cpp

Observed behavior

If I run the program on Linux with an Arc A770, it works fine.

If I run the program on Windows with the MTL iGPU, it fails and even causes a black screen.

Expected behavior

I hope the FP16 GEMM above can work on MTL iGPU. Thanks!

andrewtbarker · 2024-07-03T17:32:53Z

Thanks for reporting this. It is a known issue in the oneMKL backend and should be fixed in the next release of the product.

rnwang04 · 2024-07-04T00:55:56Z

Hi @andrewtbarker, thanks for the quick reply!
How soon can I get this fix? Does "next release" mean oneAPI 2024.3, or a smaller product iteration?

sknepper · 2024-08-09T17:21:48Z

Hello @rnwang04 - this has been fixed in the Intel oneMKL 2024.2.1 release, which is publicly available. Thanks!
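
Since the fix is tied to a specific release, an application or build script could guard the FP16 GEMM path on the installed version. The following is a minimal sketch that assumes the oneMKL version is already available as a dotted string such as "2024.2.1"; how that string is obtained is not shown, and `parse_version` and `has_fp16_gemm_fix` are hypothetical helper names.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Parse "major.minor.patch" into three integers; missing fields default to 0.
// std::stoi throws if a field is not a number.
std::array<int, 3> parse_version(const std::string& text) {
    std::array<int, 3> parts{0, 0, 0};
    std::istringstream in(text);
    std::string field;
    for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
        parts[i] = std::stoi(field);
    }
    return parts;
}

// True when the given version is at least 2024.2.1, the release named in
// this thread as containing the fix. std::array compares lexicographically.
bool has_fp16_gemm_fix(const std::string& installed) {
    const std::array<int, 3> fixed{2024, 2, 1};
    return parse_version(installed) >= fixed;
}
```

With this helper, "2024.0.2" and "2024.2" are reported as lacking the fix, while "2024.2.1" and later are reported as having it.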

rnwang04 · 2024-08-13T01:35:13Z

@sknepper Thanks for your great support! I will give it a try later 😊

andrewtbarker self-assigned this Jul 3, 2024
