Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu: aarch64: optimising memory/thread utilization in BRGEMM Matmul #2103

Merged
merged 1 commit into from
Oct 4, 2024

Conversation

Shreyas-fuj
Copy link
Contributor

Description

This PR brings some optimizations to brgemm matmul operator by improving memory utilization and multithreading capabilities.

This PR contains the following changes:

  • Modification of blocking parameters for M,K,N based on some heuristics obtained by testing matmul on shapes of majority of language models.
  • Assembly level optimization which removes the necessity of the fadd() instruction before storing the accumulator results in destination matrix.

General

  • [y] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
make test

99% tests passed, 2 tests failed out of 200

Total Test time (real) = 2142.22 sec

The following tests FAILED:
	159 - test_graph_unit_dnnl_large_partition_usm_cpu (Failed)
	181 - test_benchdnn_modeC_graph_ci_cpu (Failed)
Errors while running CTest
Output from these tests are in: /home/shreyas/G/shr-fuj/oneDNN_open_source/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

  • [y] Have you formatted the code using clang-format?

@Shreyas-fuj Shreyas-fuj requested review from a team as code owners September 20, 2024 06:31
@Shreyas-fuj Shreyas-fuj changed the title Changed blocking in heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Sep 20, 2024
@Shreyas-fuj Shreyas-fuj changed the title Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64 Sep 20, 2024
.gitignore Outdated Show resolved Hide resolved
@theComputeKid
Copy link
Contributor

@Sqvid can you please review this for the aarch64? Thanks.

@Sqvid
Copy link
Contributor

Sqvid commented Sep 23, 2024

@Shreyas-fuj Thanks for your contribution. In addition to some of the comments could you also squash and re-push with a commit message that follows the conventions outlined here: https://github.com/oneapi-src/oneDNN/blob/main/CONTRIBUTING.md#code-contribution-guidelines. Thanks!

@vpirogov
Copy link
Member

We need a commit message linter...

@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Sep 23, 2024
@Shreyas-fuj
Copy link
Contributor Author

Hi @Sqvid ,
I have squashed the commits in to a single one by following the guidelines. Thanks.

Copy link
Contributor

@mgouicem mgouicem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Sqvid , I have squashed the commits in to a single one by following the guidelines. Thanks.

Thanks. (nit) commit message: hueristics -> heuristics.

@theComputeKid
Copy link
Contributor

theComputeKid commented Sep 24, 2024

@vpirogov we can have a python script called by a dedicated github action, that fails a PR if the start of the PR does not start with a word followed by a semi colon. I can help with this, we can create an issue and I can work on it if you think it is worth it. Thoughts?

@vpirogov
Copy link
Member

@vpirogov we can have a python script called by a dedicated github action, that fails a PR if the start of the PR does not start with a word followed by a semi colon. I can help with this, we can create an issue and I can work on it if you think it is worth it. Thoughts?

Sounds like a plan. In ideal world I would use an off the shelf action for this, but I don't see any. The only tool I was able to find is CommitLint.

@Sqvid
Copy link
Contributor

Sqvid commented Sep 25, 2024

The only tool I was able to find is CommitLint

CommitLint looks very capable for the task. The only issue I could see (which is not necessarily a bad thing) is that it allows the sub-category to be in parenthesis cpu(aarch64): rather than consecutive colons cpu: aarch64:. However, it's not a terrible thing if we were to follow a standard like Conventional Commits.

Either way a custom parser should be fairly easy to throw together with some regex.

@Shreyas-fuj
Copy link
Contributor Author

@mgouicem, @dzarukin, please let me know if any more changes are required for approval, thanks.

@Shreyas-fuj
Copy link
Contributor Author

@mgouicem @dzarukin, wanted to know if anything from my side is remaining for approval of the PR, thanks.

@mgouicem
Copy link
Contributor

Hi @Shreyas-fuj. This looks fine on my side.
Given the aarch64 target, you will need approval from @oneapi-src/onednn-cpu-aarch64 code owners.

@Shreyas-fuj
Copy link
Contributor Author

Shreyas-fuj commented Sep 28, 2024

@theComputeKid, @Sqvid requesting for review and approval of the PR, thanks.

@Sqvid
Copy link
Contributor

Sqvid commented Sep 30, 2024

@Shreyas-fuj Yes looking into now. Will revert in ~48 hours. Thanks.

@Sqvid
Copy link
Contributor

Sqvid commented Oct 2, 2024

@Shreyas-fuj performance looks good to me.

Just a couple of nitpicks I'd appreciate if you could address:
1. Two files are no longer needed in the patchset (as the changes have been removed).
2. In some cases you have deleted or moved code and left a comment in the original location to the effect of "deleted this.". This makes sense in a side-by-side diff but will be confusing to a future reader.

Other than this I'm happy to approve the changes. Thanks.

@Shreyas-fuj
Copy link
Contributor Author

@Shreyas-fuj performance looks good to me.

Just a couple of nitpicks I'd appreciate if you could address: 1. Two files are no longer needed in the patchset (as the changes have been removed). 2. In some cases you have deleted or moved code and left a comment in the original location to the effect of "deleted this.". This makes sense in a side-by-side diff but will be confusing to a future reader.

Other than this I'm happy to approve the changes. Thanks.

@Sqvid , I have applied the suggested changes, please have a look, thanks.

Copy link
Contributor

@Radu2k Radu2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@spalicki
Copy link
Contributor

spalicki commented Oct 3, 2024

@Shreyas-fuj Could you please rebase the changes on top of main and squash the commits? Or change the PR name to match the desired commit title. If you could do both it would be even better - merging PR with multiple commits squashes the PR (we want to avoid merge commits on release branches) and takes the commit message from PR title, which currently is too long and not in sync with our guidelines.

@Shreyas-fuj
Copy link
Contributor Author

Shreyas-fuj commented Oct 4, 2024

@Shreyas-fuj Could you please rebase the changes on top of main and squash the commits? Or change the PR name to match the desired commit title. If you could do both it would be even better - merging PR with multiple commits squashes the PR (we want to avoid merge commits on release branches) and takes the commit message from PR title, which currently is too long and not in sync with our guidelines.

@spalicki , I have rebased and squashed the commits, and renamed the PR as suggested, thanks.

@Shreyas-fuj Shreyas-fuj changed the title Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64 Optimising BRGEMM Matmul to enable better memory/thread utilization in aarch64 Oct 4, 2024
@Shreyas-fuj Shreyas-fuj changed the title Optimising BRGEMM Matmul to enable better memory/thread utilization in aarch64 Optimising memory/thread utilization in BRGEMM Matmul for aarch64 Oct 4, 2024
@Shreyas-fuj Shreyas-fuj changed the title Optimising memory/thread utilization in BRGEMM Matmul for aarch64 cpu: aarch64: optimising memory/thread utilization in BRGEMM Matmul Oct 4, 2024
@spalicki spalicki merged commit 45ce1c8 into oneapi-src:main Oct 4, 2024
23 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants