# Minimize GPU RAM fragmentation


When training a model on the GPU, you may hit an error like the one below:

```
RuntimeError: CUDA out of memory. Tried to allocate 142.00 MiB (GPU 0; 11.17 GiB total capacity; 10.25 GiB already allocated; 104.44 MiB free; 10.66 GiB reserved in total by PyTorch)
```

Often the only way to recover and proceed is to restart the IPython kernel.
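To see where you stand before things blow up, PyTorch exposes the counters behind that message. A quick check, assuming a CUDA device is available:

```python
import torch

# "already allocated" in the error ~ memory held by live tensors;
# "reserved" ~ memory PyTorch's caching allocator has claimed from the
# driver, including cached blocks it has not returned yet.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.2f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.2f} MiB")
```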

From the fastai docs on GPU RAM fragmentation:

> Given that GPU RAM is a scarce resource, it helps to always try to free up anything that's on CUDA as soon as you're done using it, and only then move new objects to CUDA. Normally a simple `del obj` does the trick. However, if your object has circular references in it, it will not be freed despite the `del()` call until `gc.collect()` is called by Python. And until the latter happens, it'll still hold the allocated GPU RAM! And that also means that in some situations you may want to call `gc.collect()` yourself.
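A minimal sketch of that advice; the `torch.cuda.empty_cache()` call at the end is an extra step beyond the quote that returns cached blocks to the driver:

```python
import gc
import torch

x = torch.randn(1024, 1024, device="cuda")  # holds ~4 MiB of GPU RAM
print(torch.cuda.memory_allocated())        # nonzero while x is alive

del x         # drop the reference; with no cycles, the memory is freed now
gc.collect()  # needed when reference cycles keep the object alive

print(torch.cuda.memory_allocated())  # back down after the delete
torch.cuda.empty_cache()  # also return cached (reserved) blocks to the driver
```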

## Tricks to minimize GPU memory usage

- Call `del` on objects as soon as you are done with them (see the sketch after this list)
- Explicitly call `gc.collect()` so that objects kept alive by reference cycles are freed
- Use a smaller batch size
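Putting the tricks together, a sketch of a training loop; the model, dataset, and optimizer here are toy stand-ins, not part of the original note:

```python
import gc
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; your real model/data go here.
model = nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=16)  # smaller batch -> smaller peak memory

for xb, yb in loader:
    xb, yb = xb.cuda(), yb.cuda()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    del xb, yb, loss          # drop CUDA references as soon as possible

gc.collect()                  # collect anything kept alive by reference cycles
torch.cuda.empty_cache()      # return cached blocks to the driver
```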