Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance #4655

yamazakimitsufumi
Contributor

This pull request proposes a fix for issue #4644.

The 2D thread distribution currently implemented in level3_thread.c works well for small matrices, but it degenerates into a simple one-dimensional distribution along the M direction as the matrices become larger. This pull request improves multi-threaded performance for large matrices by expanding the range of sizes for which the 2D thread distribution is used.

DGEMM performance improved by about 10% on Graviton3E (64 cores) and by more than 20% on Xeon Platinum 8375C (32 cores x 2 sockets).

[Figure 1: performance results on Graviton3E]

[Figure 2: performance results on Xeon Platinum 8375C]

The work is distributed so that each thread handles ranges of about the same size in the M and N directions, even when the input matrices are rectangular. Although it has not been confirmed on all platforms, this relatively simple fix is expected to be generally effective on modern manycore CPUs.
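To illustrate the idea of balancing the per-thread ranges in M and N, here is a minimal sketch. It is not the code from this pull request or from level3_thread.c; the function names `choose_thread_grid` and `split_range` are hypothetical, and the sketch assumes the thread count is factored exactly into a `tm x tn` grid.

```c
/* Illustrative sketch only; NOT the OpenBLAS implementation.
 * choose_thread_grid and split_range are hypothetical helpers. */

/* Pick a grid tm x tn with tm * tn == nthreads so that the per-thread
 * tile (m / tm) x (n / tn) is as close to square as possible. */
static void choose_thread_grid(long m, long n, int nthreads, int *tm, int *tn)
{
    double best = -1.0;

    /* Fallback: plain 1D split along M. */
    *tm = nthreads;
    *tn = 1;

    for (int d = 1; d <= nthreads; d++) {
        if (nthreads % d != 0)
            continue;                     /* only exact factorizations */

        double tile_m = (double)m / d;
        double tile_n = (double)n / (nthreads / d);

        /* Aspect ratio of the tile, always >= 1; 1 means square. */
        double ratio = tile_m > tile_n ? tile_m / tile_n : tile_n / tile_m;

        if (best < 0.0 || ratio < best) {
            best = ratio;
            *tm = d;
            *tn = nthreads / d;
        }
    }
}

/* Split [0, len) into `parts` near-equal chunks and return chunk `idx`
 * as the half-open range [*start, *end). The first len % parts chunks
 * get one extra element. */
static void split_range(long len, int parts, int idx, long *start, long *end)
{
    long base = len / parts;
    long rem  = len % parts;

    *start = idx * base + (idx < rem ? idx : rem);
    *end   = *start + base + (idx < rem ? 1 : 0);
}
```

Under these assumptions, 64 threads on a square 10000 x 10000 problem yield an 8 x 8 grid, while m = 40000 and n = 10000 yield a 16 x 4 grid, so each thread still works on a 2500 x 2500 tile. A thread with grid coordinates (i, j) would then take `split_range(m, tm, i, ...)` in M and `split_range(n, tn, j, ...)` in N.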

@martin-frbg
Collaborator

That's impressively elegant and effective.

@martin-frbg martin-frbg added this to the 0.3.28 milestone Apr 18, 2024
@rageshhajela16

@martin-frbg Thanks for the review. Are there any other comments for us to incorporate? The CI failures do not appear to be related to the code changes and may be environment issues; could you please review and confirm? Thanks.
cc: @yamazakimitsufumi

@martin-frbg martin-frbg merged commit 6ca9ffa into OpenMathLib:develop May 14, 2024
67 of 70 checks passed