
Memory not freed after backward #13

Open
WeichaoCode opened this issue Dec 3, 2024 · 6 comments

WeichaoCode commented Dec 3, 2024

Hi, I am trying to use the discrete adjoint method, and I print the GPU memory usage during training:
```
memory before scheduling: 16.37 MB
Memory after scheduling: 16.38 MB
Memory after backward pass: 16.38 MB
Iter 0020 | Total Loss 0.609421
memory before scheduling: 16.48 MB
Memory after scheduling: 16.49 MB
Memory after backward pass: 16.49 MB
Iter 0040 | Total Loss 0.599637
memory before scheduling: 16.59 MB
Memory after scheduling: 16.59 MB
Memory after backward pass: 16.59 MB
Iter 0060 | Total Loss 0.530792
memory before scheduling: 16.70 MB
Memory after scheduling: 16.70 MB
Memory after backward pass: 16.70 MB
Iter 0080 | Total Loss 0.893818
memory before scheduling: 16.80 MB
Memory after scheduling: 16.81 MB
```

I added these lines to the code:

```python
# note: the variable names refer to "scheduling" from a variant that
# called pred_y = scheduling(epsilon=0) instead of odeint_adjoint
memory_before_scheduling = show_net_dyn_memory_usage()
pred_y = ode.odeint_adjoint(batch_y0, batch_t)    # forward solve
loss = torch.mean(torch.abs(pred_y - batch_y))
memory_after_scheduling = show_net_dyn_memory_usage()
loss.backward()                                   # discrete adjoint (backward) pass
memory_after_backward = show_net_dyn_memory_usage()
optimizer.step()
```
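
For reference, a minimal stand-in for `show_net_dyn_memory_usage()` — assuming it simply reports `torch.cuda.memory_allocated()`, which may differ from the repo's actual implementation — would look like:

```python
import torch

def show_net_dyn_memory_usage():
    """Hypothetical stand-in for the repo's helper (not its actual code):
    print and return the currently allocated CUDA memory in MB."""
    mb = torch.cuda.memory_allocated() / 2**20
    print(f"allocated CUDA memory: {mb:.2f} MB")
    return mb
```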

The GPU memory increases over iterations, and the memory after backward does not return to the 16.37 MB measured before scheduling. If the memory used during the backward pass were freed, I would expect it to stay around 16.37 MB every iteration. I believe stock PyTorch's backward frees the autograd graph, so could you help me understand this?
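
For comparison, here is the behavior I expect from stock PyTorch (a minimal sketch, assuming a CUDA device; by default `backward()` frees the autograd graph, so allocation should return to roughly the pre-forward level plus the gradient buffers):

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(64, 512, device=device)

before = torch.cuda.memory_allocated()
loss = model(x).pow(2).mean()        # forward: activations + graph allocated
after_forward = torch.cuda.memory_allocated()
loss.backward()                      # backward: graph freed, .grad buffers allocated
after_backward = torch.cuda.memory_allocated()

print(f"before:         {before / 2**20:.2f} MB")
print(f"after forward:  {after_forward / 2**20:.2f} MB")
print(f"after backward: {after_backward / 2**20:.2f} MB")
```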

If I've made a mistake, or if the implementation is simply expected to use more memory, please let me know.

Thanks!

caidao22 (Owner) commented Dec 3, 2024

Which example were you running for the experiment above?

WeichaoCode (Author) commented

> Which example were you running for the experiment above?

I used ode_demo_petsc.py.

caidao22 (Owner) commented Dec 4, 2024

> The GPU memory increases over iterations, and the memory after backward does not return to the 16.37 MB measured before scheduling. If the memory used during the backward pass were freed, I would expect it to stay around 16.37 MB every iteration.

From the results you posted, it is difficult to tell when the GPU memory consumption increases. But the memory after the backward run (16.38 MB) is essentially the same as the memory before scheduling (16.37 MB), which is expected.

In addition, this is not a good example for a memory study because it is far too small.

WeichaoCode (Author) commented

> From the results you posted, it is difficult to tell when the GPU memory consumption increases. But the memory after the backward run (16.38 MB) is essentially the same as the memory before scheduling (16.37 MB), which is expected.
>
> In addition, this is not a good example for a memory study because it is far too small.

Thanks for your help. But the GPU memory at the start of each training epoch, before the forward call `pred_y = ode.odeint_adjoint(batch_y0, batch_t)`, should be the same every epoch, e.g. always around 16-17 MB. However, if I run this code for 100 epochs or more, the GPU memory keeps increasing, by about 0.1-0.2 MB per epoch in this case. I also see this in more complex cases. Is this normal?
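
One check I can run (a sketch, assuming the growth comes from stray Python references keeping tensors alive between iterations) is to count the live CUDA tensors each epoch:

```python
import gc
import torch

def count_live_cuda_tensors():
    """Count live CUDA tensors and their total size in MB."""
    n, nbytes = 0, 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                n += 1
                nbytes += obj.element_size() * obj.nelement()
        except Exception:
            continue
    return n, nbytes / 2**20

# Hypothetical usage: call once per epoch; steadily growing numbers point
# at references (lists of losses, retained graphs, etc.) that keep
# tensors alive between epochs.
# n, mb = count_live_cuda_tensors()
# print(f"live CUDA tensors: {n}, {mb:.2f} MB")
```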

caidao22 (Owner) commented Dec 5, 2024

You can run the code line by line in a debugger and monitor the memory consumption to find out where the 0.1 MB increase happens.
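
Alternatively, a small context manager (a sketch, not part of this repo) can report the allocation delta around each suspect statement:

```python
import contextlib
import torch

@contextlib.contextmanager
def cuda_mem_delta(tag):
    """Print how much the allocated CUDA memory changes across a block."""
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    yield
    torch.cuda.synchronize()
    delta = (torch.cuda.memory_allocated() - before) / 2**20
    print(f"{tag}: {delta:+.3f} MB")

# Usage: wrap each suspect statement in turn, e.g.
# with cuda_mem_delta("odeint_adjoint"):
#     pred_y = ode.odeint_adjoint(batch_y0, batch_t)
```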

WeichaoCode (Author) commented

> You can run the code line by line in a debugger and monitor the memory consumption to find out where the 0.1 MB increase happens.

OK, thanks for your help.
