[ARM] Support FP16 post-ops fusion into ACL kernels #2067

Open
dmitry-gorokhov opened this issue Aug 30, 2024 · 2 comments
Labels
enhancement A feature or an optimization request help wanted platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64

Comments

@dmitry-gorokhov
Contributor

Summary

The current ACL integration prohibits fusing the first post-op into the ACL kernel when the dst data type is FP16. The request is to enable this fusion conditionally.

Problem statement

The oneDNN post-ops fusion mechanism provides a significant performance boost by avoiding the overhead of intermediate memory movement. However, within the ACL integration this fusion is disabled for FP16 execution because of oneDNN requirements on the precision of post-op computations (should be equal to FP16). Fusing a single post-op for FP16 primitives leads to multiple FP16<->FP32 data type conversions and expensive memory access overhead. As a result, executing the corresponding operations separately (via separate oneDNN primitive calls) performs better than the fused version.

Preferred solution

Inside OpenVINO we simply relaxed the condition to allow FP16 post-op fusion (with FP16 internal compute) inside the ACL integration. However, that solution might not be suitable for all oneDNN users because of accuracy restrictions.
Based on that, the proposal is to adopt the dnnl::accumulation_mode attribute as a trigger for selecting the post-op computational precision. As a result, the desired balance between accuracy and performance can be chosen at the oneDNN user level.
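
To make the proposal concrete, here is a minimal sketch of how such a gate could look. It is not taken from oneDNN or ACL: `DataType`, `AccumulationMode`, `post_op_compute_type` and `can_fuse_first_post_op` are hypothetical names, the enum only mirrors the values of `dnnl::accumulation_mode`, and the choice of which modes opt in to FP16 post-op compute (`relaxed`, `any`, `f16`) is an assumption made for illustration.

```cpp
// Hypothetical sketch: none of these names are oneDNN or ACL identifiers.
enum class DataType { f16, f32 };

// Local mirror of the values of dnnl::accumulation_mode.
enum class AccumulationMode { strict, relaxed, any, s32, f32, f16 };

// Precision in which a fused post-op would be computed.
// Assumption: only relaxed, any and f16 opt in to FP16 post-op compute;
// every other mode keeps the FP32 post-op precision.
constexpr DataType post_op_compute_type(DataType dst, AccumulationMode mode) {
    if (dst != DataType::f16) return DataType::f32;
    switch (mode) {
        case AccumulationMode::relaxed:
        case AccumulationMode::any:
        case AccumulationMode::f16:
            return DataType::f16;
        default:
            return DataType::f32;
    }
}

// Fuse the first post-op into the ACL kernel only when its compute
// precision matches dst, so no FP16<->FP32 conversion is needed.
constexpr bool can_fuse_first_post_op(DataType dst, AccumulationMode mode) {
    return post_op_compute_type(dst, mode) == dst;
}

// With the default strict mode, FP16 fusion stays disabled;
// a relaxed mode enables it.
static_assert(!can_fuse_first_post_op(DataType::f16, AccumulationMode::strict),
              "strict mode keeps FP16 fusion off");
static_assert(can_fuse_first_post_op(DataType::f16, AccumulationMode::relaxed),
              "relaxed mode allows FP16 fusion");
```

On the user side, the mode would be selected through the existing `dnnl::primitive_attr::set_accumulation_mode` call, so users who do not touch the attribute would keep the current behavior.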

@dmitry-gorokhov dmitry-gorokhov added the enhancement A feature or an optimization request label Aug 30, 2024
@vpirogov vpirogov added help wanted platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 labels Aug 30, 2024
@theComputeKid
Contributor

It makes sense to me. Do you have any patches demonstrating the scale of changes needed to adopt the attribute?

@vpirogov
Member

vpirogov commented Sep 3, 2024

Related discussion in #1689.
