cpu: aarch64: Expand brgemm aarch64 unsupported cases handling mechanism #2099

Open · wants to merge 1 commit into main

Conversation

@Radu2k Radu2k (Contributor) commented Sep 18, 2024

Remove the asserts that cause segfaults in the test and handle unsupported cases through oneDNN status codes instead. Also remove the leftover asserts in brgemm_matmul.cpp and modify acl_deconvolution.hpp so that the op/f32/deconv_bwd_d test passes.

Description

This replaces the asserts in brgemm that make test_benchdnn_modeC_graph_ci_cpu segfault on AArch64 with status_t returns, so unsupported cases are reported through oneDNN status codes. It also fixes the segfault of the --graph --skip-impl=ref --case=op/f32/deconv.json test (part of the graph test batch) when built with gcc-10 and g++-10.
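
As a rough illustration of the pattern, here is a minimal sketch assuming oneDNN's internal status_t type and the status::unimplemented / status::success constants (from src/common/c_types_map.hpp); the function name and flag below are hypothetical, not the actual brgemm code:

    // Hypothetical sketch: report an unsupported case through status_t instead of asserting.
    #include "common/c_types_map.hpp" // dnnl::impl::status_t and the status:: constants

    namespace dnnl {
    namespace impl {

    // Before: assert(!"unsupported brgemm case");  // aborts the process, so benchdnn segfaults
    // After: return a status the caller can check and act on.
    status_t init_brgemm_kernel(bool case_is_supported) {
        if (!case_is_supported) return status::unimplemented;
        // ... proceed with kernel creation ...
        return status::success;
    }

    } // namespace impl
    } // namespace dnnl

Callers can then propagate status::unimplemented and skip the case instead of hitting an abort.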

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a GitHub issue or in this PR)?
  • Have you added relevant regression tests?

@Radu2k Radu2k requested review from a team as code owners September 18, 2024 16:12
@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Sep 18, 2024
Comment on lines +248 to +249
const dnn_mem_t &mem_dt = args.find(arg);
const dnn_mem_t &mem_fp = args_.find(arg);
Contributor

Why is this change needed? What's the issue with auto?

Contributor Author

@dzarukin Hi Dmitry, thanks for taking a look at this. I replaced auto with explicit data types for improved readability, consistency, and type safety. When debugging, it was particularly useful to have the data type spelled out, and I went the extra mile to check that the function returns the same type. It should not be a problem, but I am happy to revisit it if it is a deal breaker.
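
For context, here is a self-contained sketch of the auto-versus-explicit-type question; dnn_mem_t and args_t below are simplified stand-ins for the benchdnn classes, not their real definitions:

    #include <map>
    #include <string>

    // Simplified stand-ins for the benchdnn types involved in the diff above.
    struct dnn_mem_t {
        std::string tag;
    };

    struct args_t {
        std::map<int, dnn_mem_t> storage;
        const dnn_mem_t &find(int arg) const { return storage.at(arg); }
    };

    int main() {
        args_t args;
        args.storage[0] = {"abx"};

        // Before: the deduced type is hidden behind auto.
        const auto &mem_auto = args.find(0);
        // After (as in the PR): the explicit type documents what find() returns.
        const dnn_mem_t &mem_dt = args.find(0);

        // Both declarations bind to the same object; only readability differs.
        return (&mem_auto == &mem_dt) ? 0 : 1;
    }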

Contributor Author

I’ve been working on this the past few days, and all tests for the Graph API should run fine. There are a few edge cases that fail, but I’ll detail them in an issue soon and they should be fixed in the foreseeable future.

Thanks for the approval!

@mgouicem mgouicem left a comment

Could you split the change to acl_deconvolution into a separate commit, as it seems to fix a different issue (unrelated to brgemm)?

@Radu2k Radu2k (Contributor Author) commented Sep 19, 2024

@mgouicem Hi Mourad, thanks for taking a look at this. I understand your point, but I added it to this PR because --graph --skip-impl=ref --case=op/f32/deconv.json is part of the test_benchdnn_modeC_graph_ci_cpu batch, so it is a direct fix for that test. That said, it is the only place where we have spotted the problem.

Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64
4 participants