cpu: aarch64: optimising memory/thread utilization in BRGEMM Matmul #2103

Shreyas-fuj · 2024-09-20T06:31:33Z

Description

This PR optimizes the BRGEMM matmul operator by improving memory utilization and multithreading.

This PR contains the following changes:

Modified the blocking parameters for M, K, and N, using heuristics obtained by testing matmul on shapes used by the majority of language models.
An assembly-level optimization that removes the need for the fadd() instruction before storing the accumulator results in the destination matrix.
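
To illustrate what blocking over M, K, and N means, here is a minimal pure-Python sketch of a blocked matrix multiply. It is not the oneDNN kernel and does not reproduce the heuristics from this PR: the function names and the block sizes `m_blk`, `n_blk`, and `k_blk` are hypothetical and chosen only for illustration.

```python
def blocked_matmul(a, b, m_blk=2, n_blk=2, k_blk=2):
    """Multiply a (M x K) by b (K x N), visiting the work one tile at a time.

    m_blk, n_blk and k_blk are illustrative block sizes, not oneDNN defaults.
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, m_blk):          # tiles along M
        for j0 in range(0, n, n_blk):      # tiles along N
            for p0 in range(0, k, k_blk):  # tiles along K (the reduction)
                for i in range(i0, min(i0 + m_blk, m)):
                    for j in range(j0, min(j0 + n_blk, n)):
                        acc = c[i][j]      # continue the partial sum
                        for p in range(p0, min(p0 + k_blk, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc      # one write-back per K tile
    return c


def naive_matmul(a, b):
    """Reference result used to check the blocked version."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
```

The block sizes change only the order in which the work is visited, not the result, which is why they can be tuned per shape and per thread count.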

General

[y] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?

make test

99% tests passed, 2 tests failed out of 200

Total Test time (real) = 2142.22 sec

The following tests FAILED:
	159 - test_graph_unit_dnnl_large_partition_usm_cpu (Failed)
	181 - test_benchdnn_modeC_graph_ci_cpu (Failed)
Errors while running CTest
Output from these tests are in: /home/shreyas/G/shr-fuj/oneDNN_open_source/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

[y] Have you formatted the code using clang-format?

theComputeKid · 2024-09-23T13:54:17Z

@Sqvid can you please review this for aarch64? Thanks.

Sqvid · 2024-09-23T18:45:54Z

@Shreyas-fuj Thanks for your contribution. In addition to addressing the review comments, could you also squash and re-push with a commit message that follows the conventions outlined here: https://github.com/oneapi-src/oneDNN/blob/main/CONTRIBUTING.md#code-contribution-guidelines. Thanks!

vpirogov · 2024-09-23T18:58:02Z

We need a commit message linter...

Shreyas-fuj · 2024-09-24T04:53:30Z

Hi @Sqvid,
I have squashed the commits into a single one, following the guidelines. Thanks.

mgouicem

Thanks. (nit) commit message: hueristics -> heuristics.

theComputeKid · 2024-09-24T15:37:16Z

@vpirogov we could have a Python script, called by a dedicated GitHub action, that fails a PR if its commit message does not start with a word followed by a colon. I can help with this: we can create an issue and I can work on it if you think it is worth it. Thoughts?

vpirogov · 2024-09-24T19:34:10Z

Sounds like a plan. In an ideal world I would use an off-the-shelf action for this, but I don't see any. The only tool I was able to find is CommitLint.

Sqvid · 2024-09-25T10:18:12Z

The only tool I was able to find is CommitLint

CommitLint looks very capable for the task. The only issue I could see (which is not necessarily a bad thing) is that it allows the sub-category to be in parentheses, as in "cpu(aarch64):", rather than as consecutive colon-separated prefixes, as in "cpu: aarch64:". However, it's not a terrible thing if we were to follow a standard like Conventional Commits.

Either way a custom parser should be fairly easy to throw together with some regex.
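
As a rough illustration of the regex approach, here is a minimal sketch of such a check. The pattern, the function name `is_valid_title`, and the `max_len=72` limit are all assumptions made for this example; the thread does not state the exact rules or length limit that the project's guidelines require.

```python
import re

# Assumed format: one or more lowercase scope tokens, each followed by ": ",
# then a non-empty summary, e.g. "cpu: aarch64: <summary>".
SCOPED_TITLE = re.compile(r"^(?:[a-z0-9_]+: )+\S.*$")


def is_valid_title(title, max_len=72):
    """Return True if the title has scope prefixes and fits within max_len.

    max_len is a hypothetical limit, not taken from the oneDNN guidelines.
    """
    return len(title) <= max_len and SCOPED_TITLE.match(title) is not None


if __name__ == "__main__":
    for title in (
        "cpu: aarch64: optimising memory/thread utilization in BRGEMM Matmul",
        "cpu(aarch64): fix blocking",
    ):
        print(title, "->", is_valid_title(title))
```

A script like this could be run from a CI job against each commit title; the parenthesised form that CommitLint accepts is rejected here on purpose, to match the colon-separated style discussed above.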

Shreyas-fuj · 2024-09-26T07:04:36Z

@mgouicem, @dzarukin, please let me know if any more changes are required for approval, thanks.

Shreyas-fuj · 2024-09-27T16:09:56Z

@mgouicem @dzarukin, is there anything remaining on my side for approval of the PR? Thanks.

mgouicem · 2024-09-27T16:14:41Z

Hi @Shreyas-fuj. This looks fine on my side.
Given the aarch64 target, you will need approval from @oneapi-src/onednn-cpu-aarch64 code owners.

Shreyas-fuj · 2024-09-28T14:06:02Z

@theComputeKid, @Sqvid, requesting your review and approval of the PR. Thanks.

Sqvid · 2024-09-30T15:07:22Z

@Shreyas-fuj Yes, looking into it now. I will get back to you in ~48 hours. Thanks.

Sqvid · 2024-10-02T14:55:02Z

@Shreyas-fuj performance looks good to me.

Just a couple of nitpicks I'd appreciate if you could address:
1. Two files are no longer needed in the patchset (as the changes have been removed).
2. In some cases you have deleted or moved code and left a comment in the original location to the effect of "deleted this.". This makes sense in a side-by-side diff but will be confusing to a future reader.

Other than this I'm happy to approve the changes. Thanks.

Shreyas-fuj · 2024-10-03T05:01:10Z

@Sqvid, I have applied the suggested changes. Please have a look, thanks.

Radu2k

LGTM.

spalicki · 2024-10-03T19:03:18Z

@Shreyas-fuj Could you please rebase the changes on top of main and squash the commits? Or change the PR name to match the desired commit title. If you could do both, it would be even better: merging a PR with multiple commits squashes the PR (we want to avoid merge commits on release branches) and takes the commit message from the PR title, which is currently too long and not in sync with our guidelines.

Shreyas-fuj · 2024-10-04T04:55:11Z

@spalicki, I have rebased and squashed the commits, and renamed the PR as suggested. Thanks.

Shreyas-fuj requested review from a team as code owners September 20, 2024 06:31

Shreyas-fuj changed the title ~~Changed blocking in heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization~~ Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Sep 20, 2024

Shreyas-fuj changed the title ~~Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization~~ Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64 Sep 20, 2024

dzarukin reviewed Sep 20, 2024

src/cpu/aarch64/brgemm/brgemm_types.hpp (outdated)

mgouicem reviewed Sep 23, 2024

.gitignore (outdated)

github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Sep 23, 2024

Shreyas-fuj force-pushed the brgemm_blocking_optimisation branch from 99af259 to af69a0b Compare September 24, 2024 04:51

mgouicem reviewed Sep 24, 2024

src/cpu/aarch64/brgemm/brgemm_types.hpp (outdated)

mgouicem reviewed Sep 24, 2024

Shreyas-fuj commented Sep 24, 2024

src/cpu/aarch64/brgemm/jit_brgemm_kernel.cpp (outdated)

Shreyas-fuj force-pushed the brgemm_blocking_optimisation branch from af69a0b to 80175f7 Compare September 26, 2024 04:49

mgouicem approved these changes Sep 27, 2024

theComputeKid mentioned this pull request Oct 2, 2024

github: workflows: Enable PR checks #2143

Merged

Radu2k approved these changes Oct 3, 2024

cpu: aarch64: matmul: optimise blocking hueristics for brgemm matmul

a957b2f

Shreyas-fuj force-pushed the brgemm_blocking_optimisation branch from df7e752 to a957b2f Compare October 4, 2024 04:52

Shreyas-fuj changed the title ~~Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64~~ Optimising BRGEMM Matmul to enable better memory/thread utilization in aarch64 Oct 4, 2024

Shreyas-fuj changed the title ~~Optimising BRGEMM Matmul to enable better memory/thread utilization in aarch64~~ Optimising memory/thread utilization in BRGEMM Matmul for aarch64 Oct 4, 2024

Shreyas-fuj changed the title ~~Optimising memory/thread utilization in BRGEMM Matmul for aarch64~~ cpu: aarch64: optimising memory/thread utilization in BRGEMM Matmul Oct 4, 2024

spalicki merged commit 45ce1c8 into oneapi-src:main Oct 4, 2024
23 of 27 checks passed
