
Compact legendre polynomials #164

Open · wants to merge 1 commit into base: develop
Conversation

lukasm91
Collaborator

Legendre polynomials don't need to be stored zero-padded; we can simply concatenate them compactly and track the offsets instead.

E.g. on TCo2559, this saves almost 40 GB per rank (it used to be 64 GB, now it is 26 GB). We initially did this for Leonardo, because the Legendre coefficients used almost the entire device memory.
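To illustrate where the saving comes from, here is a rough sketch of padded versus compact storage for a triangular truncation. All names and numbers below are illustrative assumptions, not the actual ectrans layout (the real code splits latitudes per hemisphere, among other differences): for each zonal wavenumber m, only degrees n = m..T exist, so a fixed-width zero-padded slot per m wastes roughly half the memory compared to concatenating the columns and keeping an offset array.

```python
# Hedged sketch: compact vs. zero-padded storage of Legendre coefficients.
# Assumptions (illustrative, not the actual ectrans layout): triangular
# truncation T, nlat latitudes, one 8-byte real per (m, n, latitude).

def storage_bytes(T, nlat, bytes_per_real=8):
    # Zero-padded: every zonal wavenumber m gets a full (T+1)-long slot.
    padded = (T + 1) * (T + 1) * nlat * bytes_per_real
    # Compact: each m stores only degrees n = m..T; an offset array
    # records where each m's column starts in the concatenated buffer.
    lengths = [T - m + 1 for m in range(T + 1)]
    offsets = [0]
    for length in lengths:
        offsets.append(offsets[-1] + length)
    compact = offsets[-1] * nlat * bytes_per_real
    return padded, compact, offsets

padded, compact, offsets = storage_bytes(T=2559, nlat=2560)
print(f"padded:  {padded / 2**30:.1f} GiB")   # 125.0 GiB
print(f"compact: {compact / 2**30:.1f} GiB")  # 62.5 GiB, roughly half
```

The roughly 2x ratio is generic to any triangular truncation; the exact 64 GB to 26 GB figure quoted above depends on details of the real data layout.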

@lukasm91 lukasm91 changed the base branch from main to develop October 16, 2024 11:34
@samhatfield samhatfield added the enhancement New feature or request label Oct 17, 2024
@lukasm91 lukasm91 marked this pull request as draft October 30, 2024 09:07
@lukasm91
Collaborator Author

(converted to draft for the moment ==> it is finished, but we should first merge the other PR, then rebase and review)

@samhatfield samhatfield added the gpu label Dec 9, 2024
@lukasm91 lukasm91 force-pushed the compact-legendre-polynomials branch from 04194bc to d0d2349 Compare December 16, 2024 08:51
@lukasm91 lukasm91 force-pushed the compact-legendre-polynomials branch from d0d2349 to 2a87188 Compare December 16, 2024 09:00
@lukasm91
Collaborator Author

lukasm91 commented Dec 16, 2024

Besides replacing ext_acc with the proper "copy module", this is the only PR left from the old GPU branch, and I have now rebased it on top of develop. This PR compacts the Legendre polynomials, thereby removing the zero padding.

This PR is not expected to interfere with any other PR, as it only touches the GEMMs. Since I am touching the CUDA interfaces anyway, I also added const to the pointer arguments.

  • Applied clang-format
  • Tested that everything is properly deallocated
  • Tested large sizes to check for overflows

@samhatfield @wdeconinck Feel free to review when you find time.

@lukasm91 lukasm91 marked this pull request as ready for review December 16, 2024 09:06
Collaborator

@wdeconinck wdeconinck left a comment


I think this all looks fine, and what a nice saving!
Perhaps @samhatfield can run it in a few places to double-check all is in order?

@samhatfield
Collaborator

If I can find the time this week, I will take a look. It might have to wait until next year.
