
[SYCL] Enabled more data types for oneMKL's gemm_batch API #8236

Merged — 4 commits — Jul 5, 2024

Conversation

OuadiElfarouki (Collaborator)

Additional gemm_batch types have been enabled in oneMKL (oneapi-src/oneMKL#466). This patch enables their corresponding APIs for the SYCL backend, which eliminates the extra steps needed when targeting non-Intel devices to cast/copy inputs and outputs to the supported types.

Enabling gemm_batch_impl<sycl::half, sycl::half, float, float>, for instance, removes the overhead of calling gemm_batch_impl<sycl::half, sycl::half, sycl::half, sycl::half> followed by a to_fp32_sycl pass converting the dst back from fp16 to fp32. This directly affects the KQ + KQV multi-batch path during Prompt Processing of quantized models.

Performance on Intel GPUs remains the same, and a slight improvement in Prompt Processing performance was observed on some Nvidia GPUs (0 to 3% on average).

@github-actions bot added labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) — Jul 1, 2024
@airMeng (Collaborator) commented on Jul 1, 2024

oneapi-src/oneMKL#466 was merged last week; should you wait for the next release of oneMKL? Sorry, I am not familiar with oneMKL.

@OuadiElfarouki (Collaborator, Author)

@airMeng Thanks for the suggestion. At the moment there is no clear/official release process on the oneMKL Interface side. We also don't mention anything related to oneMKL Interface releases in README-sycl.md, so from a user perspective it shouldn't be confusing for now.
We can adopt a different approach whenever we hear from the oneMKL side regarding their release process, so we'll keep this in mind!

@mofosyne added the Review Complexity : Medium label (generally requires more time to grok but manageable by beginner to medium expertise level) and removed the medium severity label (used to report medium severity bugs in llama.cpp, e.g. malfunctioning but still usable features) — Jul 3, 2024
@OuadiElfarouki (Collaborator, Author)

@airMeng @joeatodd Is there anything else we want to address for this?

@joeatodd (Collaborator) left a comment


LGTM 🚢

@AidanBeltonS AidanBeltonS merged commit 1f3e1b6 into ggerganov:master Jul 5, 2024
50 of 53 checks passed
Labels: ggml (changes relating to the ggml tensor library for machine learning), Review Complexity : Medium (generally requires more time to grok but manageable by beginner to medium expertise level), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
5 participants