Memory profiling #266

wiederm · 2024-10-07T10:21:56Z

Pull Request Summary

This PR adds infrastructure for GPU memory profiling, as described in this series of blog posts by the PyTorch folks. I have added functions to initialize, record, and save GPU memory traces, and a notebook that uses these functions to profile the forward/backward pass on a 25 Å water box.
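For readers unfamiliar with the workflow, here is a minimal sketch of what initialize/record/save helpers could look like. It assumes the private `torch.cuda.memory._record_memory_history` and `torch.cuda.memory._dump_snapshot` calls used in the PyTorch memory-profiling blog posts. The helper name `record_gpu_memory` and the snapshot file name are hypothetical and are not the names used in this PR.

```python
from contextlib import contextmanager

try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def cuda_available() -> bool:
    """True only if PyTorch is installed and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


@contextmanager
def record_gpu_memory(snapshot_path: str, max_entries: int = 100_000):
    """Record CUDA allocations inside the `with` block and save a snapshot.

    Hypothetical helper: yields True if recording is active, False if it
    was skipped because no CUDA device is available.
    """
    if not cuda_available():
        yield False
        return
    # Initialize: start recording allocation/free events.
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        yield True
    finally:
        # Save: write the trace to disk, then stop recording.
        torch.cuda.memory._dump_snapshot(snapshot_path)
        torch.cuda.memory._record_memory_history(enabled=None)


if __name__ == "__main__":
    with record_gpu_memory("memory_snapshot.pickle") as active:
        if active:
            # Stand-in workload; a real profile would run a forward/backward pass.
            x = torch.randn(1024, 1024, device="cuda", requires_grad=True)
            (x @ x).sum().backward()
```

The resulting pickle file can be inspected with the memory visualizer that the PyTorch posts describe.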

I also added a notebook that compares the timing and GPU memory consumption of each of the implemented potentials.

This PR also adds optimizations for each of the implemented networks.

ANI architecture

Investigating the memory trace shows that the computation of the angular AEV allocates the largest chunk of memory.

The cause is the creation of large intermediate tensors by broadcasting over multiple dimensions; refactoring this saves around 100 MB of GPU memory.
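To illustrate why broadcasting can dominate a memory trace, the sketch below estimates the size of the intermediate tensor that a broadcast operation materializes. It is plain arithmetic, not the refactoring done in this PR; the shapes (100,000 angular triples, 8 and 4 shift parameters) are hypothetical values chosen only for illustration.

```python
from itertools import zip_longest
from math import prod


def broadcast_shape(*shapes):
    """Return the shape produced by broadcasting the given shapes together."""
    result = []
    # Align shapes from the trailing dimension; missing dimensions count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


def intermediate_bytes(*shapes, itemsize=4):
    """Bytes needed to hold the broadcast result (itemsize=4 for float32)."""
    return prod(broadcast_shape(*shapes)) * itemsize


# Hypothetical sizes: one value per triple, broadcast against two shift axes.
shapes = [(100_000, 1, 1), (1, 8, 1), (1, 1, 4)]
print(broadcast_shape(*shapes))           # (100000, 8, 4)
print(intermediate_bytes(*shapes) / 1e6)  # 12.8 (MB per intermediate)
```

Each elementwise step in a chain of broadcast operations can allocate another tensor of this size, so several such intermediates add up quickly.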

refactor large memory allocation in angular aev

Key changes

adding profile module with helper functions to profile GPU memory traces

Associated Issue(s)

Issue 1

Pull Request Checklist

Issue(s) raised/addressed and linked
Includes appropriate unit test(s)
Appropriate docstring(s) added/updated
Appropriate .rst doc file(s) added/updated
PR is ready for review

codecov-commenter · 2024-10-10T15:32:40Z

Codecov Report

Attention: Patch coverage is 19.31034% with 117 lines in your changes missing coverage. Please review.

Project coverage is 83.49%. Comparing base (7b92c63) to head (df04e2b).

Additional details and impacted files

wiederm · 2024-11-10T15:16:51Z

Continue in PR #318

wiederm and others added 2 commits October 4, 2024 22:23

adding notebook for memory profiling

abb9569

notebook and module to profile GPU memory utilization using PyTorch profiler

01164dd

wiederm marked this pull request as draft October 7, 2024 10:22

wiederm self-assigned this Oct 7, 2024

wiederm and others added 9 commits October 7, 2024 12:44

Merge branch 'main' into memory-profiling

0bba855

optimize memory allocation in the ANI network architecture

4a2e6ef

minor modifications to schnet, using in place operation where possible

e543b0f

small modifications for PaiNN

511654c

remove gradient calculation for parameters in inference mode

269f4b3

Merge branch 'main' into memory-profiling

afe81bf

Merge branch 'main' into memory-profiling

98aa70e

fix bug

ef8ff17

upload profiling notebooks

ea5cf91

please the linter

64d81ed

wiederm added the enhancement New feature or request label Oct 10, 2024

Merge branch 'main' into memory-profiling

44f32f2

wiederm marked this pull request as ready for review October 11, 2024 09:20

wiederm requested a review from chrisiacovella October 11, 2024 09:20

wiederm and others added 10 commits October 11, 2024 19:38

add test for profiling functions

3731723

add tests

4ca1f84

Merge branch 'memory-profiling' of https://github.com/choderalab/modelforge into memory-profiling

9ef1fbb

import openmmtools

d47835c

skip the profiling test if cuda is not available

e1af735

Merge branch 'main' into memory-profiling

08ca1bf

Merge branch 'main' into memory-profiling

279e4bf

update schnet

b8fdf38

Merge branch 'memory-profiling' of https://github.com/choderalab/modelforge into memory-profiling

3e5b492

bugfix (we need to retain graph if we want to calculate high order derivatives)

68687c4

wiederm linked an issue Oct 15, 2024 that may be closed by this pull request

Add functions to profile model performance #282

Open

wiederm marked this pull request as draft October 15, 2024 16:50

move functions from notebook to package

df04e2b

wiederm mentioned this pull request Oct 20, 2024

cherry picked changes from other PRs #291

Merged

2 tasks

wiederm closed this Nov 10, 2024
