
Memory not freed after backward #13

Open
WeichaoCode opened this issue Dec 3, 2024 · 6 comments

WeichaoCode commented Dec 3, 2024

Hi, I am trying to use the discrete adjoint method, and I print the GPU memory usage during training:
```
memory before scheduling: 16.37 MB
Memory after scheduling: 16.38 MB
Memory after backward pass: 16.38 MB
Iter 0020 | Total Loss 0.609421
memory before scheduling: 16.48 MB
Memory after scheduling: 16.49 MB
Memory after backward pass: 16.49 MB
Iter 0040 | Total Loss 0.599637
memory before scheduling: 16.59 MB
Memory after scheduling: 16.59 MB
Memory after backward pass: 16.59 MB
Iter 0060 | Total Loss 0.530792
memory before scheduling: 16.70 MB
Memory after scheduling: 16.70 MB
Memory after backward pass: 16.70 MB
Iter 0080 | Total Loss 0.893818
memory before scheduling: 16.80 MB
Memory after scheduling: 16.81 MB
```

I added these lines to the code:

```python
# note: the variable names refer to "scheduling" from a variant that
# called pred_y = scheduling(epsilon=0) instead of odeint_adjoint
memory_before_scheduling = show_net_dyn_memory_usage()
pred_y = ode.odeint_adjoint(batch_y0, batch_t)    # forward solve
loss = torch.mean(torch.abs(pred_y - batch_y))
memory_after_scheduling = show_net_dyn_memory_usage()
loss.backward()                                   # discrete adjoint (backward) pass
memory_after_backward = show_net_dyn_memory_usage()
optimizer.step()
```
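
For reference, a minimal stand-in for `show_net_dyn_memory_usage()` — assuming it simply reports `torch.cuda.memory_allocated()`, which may differ from the repo's actual implementation — would look like:

```python
import torch

def show_net_dyn_memory_usage():
    """Hypothetical stand-in for the repo's helper (not its actual code):
    print and return the currently allocated CUDA memory in MB."""
    mb = torch.cuda.memory_allocated() / 2**20
    print(f"allocated CUDA memory: {mb:.2f} MB")
    return mb
```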

The GPU memory increases over iterations, and the memory after backward does not return to the 16.37 MB measured before scheduling. If the memory used during the backward pass were freed, I would expect it to stay around 16.37 MB every iteration. I believe stock PyTorch's backward frees the autograd graph, so could you help me understand this?
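
For comparison, here is the behavior I expect from stock PyTorch (a minimal sketch, assuming a CUDA device; by default `backward()` frees the autograd graph, so allocation should return to roughly the pre-forward level plus the gradient buffers):

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(64, 512, device=device)

before = torch.cuda.memory_allocated()
loss = model(x).pow(2).mean()        # forward: activations + graph allocated
after_forward = torch.cuda.memory_allocated()
loss.backward()                      # backward: graph freed, .grad buffers allocated
after_backward = torch.cuda.memory_allocated()

print(f"before:         {before / 2**20:.2f} MB")
print(f"after forward:  {after_forward / 2**20:.2f} MB")
print(f"after backward: {after_backward / 2**20:.2f} MB")
```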

If I've made a mistake, or if the implementation is simply expected to use more memory, please let me know.

Thanks!

caidao22 (Owner) commented Dec 3, 2024

Which example were you running for the experiment above?

WeichaoCode (Author) commented

> Which example were you running for the experiment above?

I used ode_demo_petsc.py.

caidao22 (Owner) commented Dec 4, 2024

> The GPU memory increases over iterations, and the memory after backward does not return to the 16.37 MB measured before scheduling. If the memory used during the backward pass were freed, I would expect it to stay around 16.37 MB every iteration.

From the results you posted, it is difficult to tell when the GPU memory consumption increases. But the memory after the backward run (16.38 MB) is essentially the same as the memory before scheduling (16.37 MB), which is expected.

In addition, this is not a good example for a memory study because it is far too small.

WeichaoCode (Author) commented

> From the results you posted, it is difficult to tell when the GPU memory consumption increases. But the memory after the backward run (16.38 MB) is essentially the same as the memory before scheduling (16.37 MB), which is expected.
>
> In addition, this is not a good example for a memory study because it is far too small.

Thanks for your help. But the GPU memory at the start of each training epoch, before the forward call `pred_y = ode.odeint_adjoint(batch_y0, batch_t)`, should be the same every epoch, e.g. always around 16-17 MB. However, if I run this code for 100 epochs or more, the GPU memory keeps increasing, by about 0.1-0.2 MB per epoch in this case. I also see this in more complex cases. Is this normal?
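
One check I can run (a sketch, assuming the growth comes from stray Python references keeping tensors alive between iterations) is to count the live CUDA tensors each epoch:

```python
import gc
import torch

def count_live_cuda_tensors():
    """Count live CUDA tensors and their total size in MB."""
    n, nbytes = 0, 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                n += 1
                nbytes += obj.element_size() * obj.nelement()
        except Exception:
            continue
    return n, nbytes / 2**20

# Hypothetical usage: call once per epoch; steadily growing numbers point
# at references (lists of losses, retained graphs, etc.) that keep
# tensors alive between epochs.
# n, mb = count_live_cuda_tensors()
# print(f"live CUDA tensors: {n}, {mb:.2f} MB")
```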

caidao22 (Owner) commented Dec 5, 2024

You can run the code line by line in a debugger and monitor the memory consumption to find out where the 0.1 MB increase happens.
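
Alternatively, a small context manager (a sketch, not part of this repo) can report the allocation delta around each suspect statement:

```python
import contextlib
import torch

@contextlib.contextmanager
def cuda_mem_delta(tag):
    """Print how much the allocated CUDA memory changes across a block."""
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    yield
    torch.cuda.synchronize()
    delta = (torch.cuda.memory_allocated() - before) / 2**20
    print(f"{tag}: {delta:+.3f} MB")

# Usage: wrap each suspect statement in turn, e.g.
# with cuda_mem_delta("odeint_adjoint"):
#     pred_y = ode.odeint_adjoint(batch_y0, batch_t)
```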

WeichaoCode (Author) commented

> You can run the code line by line in a debugger and monitor the memory consumption to find out where the 0.1 MB increase happens.

OK, thanks for your help.
