Memory profiling #266

Closed
wants to merge 24 commits into from

@wiederm wiederm commented Oct 7, 2024

Pull Request Summary

This PR adds infrastructure for GPU memory profiling, as described in this series of blog posts by the PyTorch folks. I have added functions to initialize, record, and save GPU memory traces, and a notebook that uses these functions to profile the forward/backward pass on a 25 Å water box.
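
As a rough illustration of the initialize/record/save workflow, here is a minimal sketch built on PyTorch's (private, underscore-prefixed) memory-history API. The function names and the default file name are hypothetical and not necessarily those used in this PR's profile module; every helper is a no-op that returns `False` when PyTorch or CUDA is unavailable.

```python
try:
    import torch
except ImportError:  # lets the helpers be imported on machines without PyTorch
    torch = None


def _cuda_available() -> bool:
    """True only if PyTorch is installed and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


def start_memory_recording(max_entries: int = 100_000) -> bool:
    """Start recording allocator events (hypothetical helper name)."""
    if not _cuda_available():
        return False
    # Private PyTorch API; keeps at most `max_entries` alloc/free events.
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    return True


def save_memory_snapshot(path: str = "memory_snapshot.pickle") -> bool:
    """Write the recorded history to `path` (hypothetical default name)."""
    if not _cuda_available():
        return False
    torch.cuda.memory._dump_snapshot(path)
    return True


def stop_memory_recording() -> bool:
    """Stop recording allocator events."""
    if not _cuda_available():
        return False
    torch.cuda.memory._record_memory_history(enabled=None)
    return True
```

A typical use would be to call `start_memory_recording()`, run a forward/backward pass, then call `save_memory_snapshot()` and `stop_memory_recording()`. The resulting pickle file can be loaded into the viewer at https://pytorch.org/memory_viz.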

I also added a notebook that compares the timing and GPU memory consumption of each of the implemented potentials.
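
A comparison like this can be sketched with a small helper that times a callable and reads the peak allocated GPU memory. This is an assumption about how such a measurement could be done, not the notebook's actual code; `measure` is a hypothetical name, and the peak value is `None` when CUDA is unavailable.

```python
import time

try:
    import torch
except ImportError:  # fall back to timing only
    torch = None


def measure(fn, *args, **kwargs):
    """Return (result, elapsed_seconds, peak_bytes) for one call of `fn`."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()              # finish any pending work first
        torch.cuda.reset_peak_memory_stats()  # start peak tracking from zero
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    elapsed = time.perf_counter() - start
    peak = torch.cuda.max_memory_allocated() if use_cuda else None
    return result, elapsed, peak
```

Calling `measure(model, inputs)` once per potential (after a warm-up call) would give numbers that can be compared side by side.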

This PR also adds optimizations for each of the implemented networks.

ANI architecture

Investigating the memory trace shows that the computation of the angular AEV allocates the largest chunk of memory:
image

This is caused by large intermediate tensors that are created when broadcasting over multiple dimensions; refactoring this saves around 100 MB of GPU memory.
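
To show why broadcasting gets expensive, here is a back-of-the-envelope sketch of the size of one broadcast intermediate. The numbers are illustrative assumptions (one million atom triples, 4 angular shifts, 8 angle-section shifts, float32), not values measured in this PR.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def intermediate_bytes(shape, itemsize=BYTES_PER_FLOAT32):
    """Memory needed to materialize one dense tensor of the given shape."""
    return prod(shape) * itemsize


# Hypothetical sizes: every triple is broadcast against every
# (radial shift, angle shift) combination of the angular term.
n_triples = 1_000_000
n_shf_a, n_shf_z = 4, 8
broadcast_shape = (n_triples, n_shf_a, n_shf_z)

print(intermediate_bytes(broadcast_shape) / 1e6, "MB")  # 128.0 MB
```

Each elementwise operation on the broadcast shape materializes another tensor of this size, so fusing operations or reducing before broadcasting lowers the peak.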

  • refactor large memory allocation in angular AEV

Key changes

  • add a profile module with helper functions for recording GPU memory traces

Associated Issue(s)

  • Issue 1

Pull Request Checklist

  • Issue(s) raised/addressed and linked
  • Includes appropriate unit test(s)
  • Appropriate docstring(s) added/updated
  • Appropriate .rst doc file(s) added/updated
  • PR is ready for review

@wiederm wiederm marked this pull request as draft October 7, 2024 10:22
@wiederm wiederm self-assigned this Oct 7, 2024

codecov-commenter commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 19.31034% with 117 lines in your changes missing coverage. Please review.

Project coverage is 83.49%. Comparing base (7b92c63) to head (df04e2b).

@wiederm wiederm added the enhancement New feature or request label Oct 10, 2024
@wiederm wiederm marked this pull request as ready for review October 11, 2024 09:20
@wiederm wiederm linked an issue Oct 15, 2024 that may be closed by this pull request
@wiederm wiederm marked this pull request as draft October 15, 2024 16:50
@wiederm wiederm mentioned this pull request Oct 20, 2024
2 tasks

wiederm commented Nov 10, 2024

Continued in PR #318.

@wiederm wiederm closed this Nov 10, 2024

Successfully merging this pull request may close these issues.

Add functions to profile model performance
2 participants