MKL FP16 GEMM crash on MTL iGPU #524

Open
rnwang04 opened this issue Jul 3, 2024 · 4 comments

rnwang04 commented Jul 3, 2024

Summary

On an MTL iGPU, calling the oneMKL FP16 GEMM crashes the program (with either oneAPI 2024.0 or 2024.2), and calling it repeatedly freezes the whole machine.
On Arc, however, everything works fine.

Version

oneAPI 2024.0 or oneAPI 2024.2.

Environment

  • minimal c++ program
  • Windows 11
  • icx 2024.0.2 or 2024.2
  • Hardware: Intel Core Ultra iGPU

Steps to reproduce

#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <cstdio>    // printf
#include <cstdlib>   // rand, RAND_MAX
#include <iostream>
using namespace sycl;
int main() {

   queue q{gpu_selector_v};
   std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

   const int M = 1024;
   const int N = 11008;
   const int K = 4096;

   float* A_h = new float[M * K];
   float* B_h = new float[K * N];
   float* C_h = new float[M * N];
   // random
   for (int i = 0; i < M * K; ++i) {
       A_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   // convert input to half
   sycl::half* A_h_half = new sycl::half[M * K];
   sycl::half* B_h_half = new sycl::half[K * N];
   for (int i = 0; i < M * K; ++i) {
       A_h_half[i] = sycl::half(A_h[i]);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h_half[i] = sycl::half(B_h[i]);
   }

   {
       // Buffers live in their own scope so that C is written back to C_h
       // before the host arrays are freed below.
       buffer<sycl::half> A(A_h_half, M * K);
       buffer<sycl::half> B(B_h_half, K * N);
       buffer<float> C(C_h, M * N);
       // Use oneMKL to do GEMM: C (M x N) = A (M x K) * B^T, with B stored as N x K.
       // The buffer API takes the buffers directly, so no accessors are needed.
       oneapi::mkl::blas::row_major::gemm(
            q,
            oneapi::mkl::transpose::nontrans,
            oneapi::mkl::transpose::trans,
            M, N, K,
            1.0f, A, K,
            B, K,
            0.0f, C, N);
       q.wait();
   }

   delete[] A_h;
   delete[] B_h;
   delete[] C_h;
   delete[] A_h_half;
   delete[] B_h_half;

   printf("run success!\n");
   return 0;
}

Build the program above with the following commands:

# for linux
source /opt/intel/oneapi/setvars.sh
icpx -std=c++17 -fsycl -fopenmp -lpthread -l mkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -o gemm_fp16 gemm_fp16.cpp

# for windows
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --force
icx -std=c++17 -fsycl -fopenmp mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib -o gemm_fp16 gemm_fp16.cpp

Observed behavior

Running the program on Linux with an Arc A770 works fine:
[image]

Running it on Windows with an MTL iGPU fails and can even cause a black screen:
[image]

Expected behavior

The FP16 GEMM above should also work on the MTL iGPU. Thanks!
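
Once the GEMM runs to completion, one way to confirm that the output is also correct is to compare it against a plain CPU reference. The following is only a minimal sketch, not part of oneMKL: `reference_gemm_nt` and `max_abs_diff` are hypothetical helper names, and the reference assumes the same row-major layout as the reproducer (A is M x K, B is stored as N x K and used transposed). Because the device inputs are rounded to half precision, the comparison should use a tolerance rather than exact equality, and the reference should be fed the values after rounding to half.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not a oneMKL function): row-major reference for
// C = alpha * A * B^T + beta * C, where A is M x K with leading dimension lda
// and B is stored as N x K with leading dimension ldb.
void reference_gemm_nt(int M, int N, int K, float alpha,
                       const float* A, int lda,
                       const float* B, int ldb,
                       float beta, float* C, int ldc) {
    assert(lda >= K && ldb >= K && ldc >= N);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[static_cast<std::size_t>(i) * lda + k] *
                       B[static_cast<std::size_t>(j) * ldb + k];
            }
            float& c = C[static_cast<std::size_t>(i) * ldc + j];
            // With beta == 0 the old contents of C are ignored (they may be uninitialized).
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}

// Hypothetical helper: largest absolute element-wise difference between two arrays.
float max_abs_diff(const float* x, const float* y, std::size_t n) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::fmax(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

For the sizes in the reproducer the triple loop is slow (about 4.6e10 multiply-adds), so checking a smaller problem, or only a few rows of C, is probably more practical.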

@andrewtbarker andrewtbarker self-assigned this Jul 3, 2024
@andrewtbarker
Contributor

Thanks for reporting this. This is a known issue in the oneMKL backend and should be fixed in the next release of the product.

@rnwang04
Author

rnwang04 commented Jul 4, 2024

Hi @andrewtbarker, thanks for the quick reply!
When can I expect to get this fix? Does "next release" mean oneAPI 2024.3 or a smaller point release?

@sknepper
Contributor

sknepper commented Aug 9, 2024

Hello @rnwang04 - this has been fixed in the Intel oneMKL 2024.2.1 release, which is publicly available. Thanks!

@rnwang04
Author

@sknepper Thanks for your great support! I will give it a try later 😊
