Missing gemm_batch data types #446
Comments
@AidanBeltonS Thanks for reporting this. At this point, this gap is known and expected. The documentation you linked points to the oneMKL Product implementation (not the oneMKL open source interfaces). Typically, new APIs/features are implemented in oneMKL Product first and then they are ported to the oneMKL open source interfaces. If this use case is critical for your application, please let us know. We also encourage everyone to contribute :)
Thanks for the response @mmeterel, and thank you for clarifying the documentation. Yes, this is critical for our application. See our use case: ggerganov/llama.cpp#5591
@AidanBeltonS Thanks for your contributions!
There seem to be more missing cases in the cuBLAS backend that weren't covered by what llama.cpp required. I'm in the process of finding and implementing them.
From what I remember, Aidan had issues with the quantized types in particular. He was seeing incorrect results, which could be due to the reference implementation needing adjustment, or some other issue.
Summary
I believe there are some missing gemm_batch implementations. Looking at the oneMKL docs, it seems gemm_batch should support two half matrices as input, a float matrix as output, and float scaling. My reference: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2023-0/gemm-batch.html

I run into issues with this overload not being found. Is my documentation correct, or have I misunderstood something?
Version
oneMKL hash: 7d2044e
Environment
oneMKL works with multiple hardware backends and backend libraries, and also depends on the compiler and build environment.
Steps to reproduce
Compile for NVIDIA GPUs with:
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda reproducer_onemkl_batch.cpp -lonemkl
or for Intel GPUs:
icpx -fsycl reproducer_onemkl_batch.cpp -lonemkl
Error:
Given the documentation linked above, I would expect this to compile, since the docs state that this combination of data types is supported.