CUDA out of memory error during model.fit() #104

TimurNurlygayanov · 2022-01-22T14:50:26Z

I'm trying the basic example from
https://towardsdatascience.com/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb

My video card is an "NVIDIA GeForce MX150" (laptop) with 2 GB of video RAM.
OS: Ubuntu 20.04 + NVIDIA driver 470

I have 61 custom images with the object labeled in each of them.

When I execute this simple code:

from detecto import core, utils, visualize
dataset = core.Dataset('images_to_learn/')
model = core.Model(['my_object'])
model.fit(dataset)

It fails on model.fit(dataset) with the following error:

Epoch 1 of 10
Begin iterating over training dataset
  0%|                                                           | 0/61 [00:00<?, ?it/s]/home/xwizard/.local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  2%|▊                                                  | 1/61 [00:02<02:07,  2.12s/it]
Traceback (most recent call last):
  File "/home/xwizard/test/main.py", line 24, in <module>
    model.fit(dataset)
  File "/home/xwizard/.local/lib/python3.9/site-packages/detecto/core.py", line 505, in fit
    loss_dict = self._model(images, targets)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/generalized_rcnn.py", line 96, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/rpn.py", line 354, in forward
    proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 180, in decode
    pred_boxes = self.decode_single(
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 223, in decode_single
    pred_boxes1 = pred_ctr_x - c_to_c_w
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 1.96 GiB total capacity; 1.12 GiB already allocated; 2.88 MiB free; 1.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

PyTorch just takes all available memory and crashes.
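To check how the GPU memory was actually split at the moment of failure, the numbers can be pulled out of the error message. This is a minimal sketch, assuming the message format printed by the PyTorch version in this traceback (the wording differs between releases); `oom_breakdown` is a hypothetical helper, not a PyTorch API.

```python
import re

# Binary units as printed in PyTorch's CUDA OOM message, converted to GiB.
_UNIT_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}

# Assumption: matches the message format shown in this issue only.
_OOM_RE = re.compile(
    r"Tried to allocate (?P<req>[\d.]+ \w+) \(GPU \d+; "
    r"(?P<total>[\d.]+ \w+) total capacity; "
    r"(?P<alloc>[\d.]+ \w+) already allocated; "
    r"(?P<free>[\d.]+ \w+) free; "
    r"(?P<reserved>[\d.]+ \w+) reserved in total by PyTorch\)"
)


def _to_gib(text: str) -> float:
    """Convert a string such as '2.88 MiB' to GiB."""
    value, unit = text.split()
    return float(value) * _UNIT_TO_GIB[unit]


def oom_breakdown(message: str) -> dict:
    """Split the memory reported in a CUDA OOM message into its parts.

    Values are in GiB, except the final ratio.
    """
    m = _OOM_RE.search(message)
    if m is None:
        raise ValueError("message does not match the expected OOM format")
    total, alloc, free, reserved, req = (
        _to_gib(m[k]) for k in ("total", "alloc", "free", "reserved", "req")
    )
    return {
        "requested": req,
        "total": total,
        "allocated": alloc,
        # Held by PyTorch's caching allocator but not backing live tensors.
        "cached_unused": reserved - alloc,
        # Neither reserved by PyTorch nor free: CUDA context, other processes, etc.
        "outside_pytorch": total - reserved - free,
        "free": free,
        # The message's fragmentation hint applies when this is much larger than 1.
        "reserved_to_allocated": reserved / alloc,
    }


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB "
    "(GPU 0; 1.96 GiB total capacity; 1.12 GiB already allocated; "
    "2.88 MiB free; 1.26 GiB reserved in total by PyTorch)"
)
for name, value in oom_breakdown(msg).items():
    print(f"{name:>22}: {value:.3f}")
```

With the numbers from this issue, about 0.70 GiB of the 1.96 GiB card is neither reserved by PyTorch nor free (for example the CUDA context and any other processes using the GPU), and reserved memory is only about 1.13× the allocated memory. If that reading is right, fragmentation (the `max_split_size_mb` hint in the message) is unlikely to be the main problem; the training step simply needs more memory than PyTorch could obtain on this card.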


alankbi · 2022-02-01T22:24:00Z

Could you try some of the solutions listed in this post to see if any of those help?

makya-stell · 2022-03-18T04:10:00Z

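Adding the following right before creating the dataset and training on it fixed the issue for me:

import gc
del dataset   # drop the dataset left over from a previous run (raises NameError if it doesn't exist yet)
gc.collect()  # force a garbage-collection pass

Hope this helps @TimurNurlygayanov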
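The same idea can be wrapped in a small helper. This is a sketch, not part of Detecto: the name `release_memory` is hypothetical, and the call to `torch.cuda.empty_cache()` (a real PyTorch function that returns cached, unused blocks to the driver) is an addition the comment above does not mention. Neither step frees memory that live objects still reference, so this mainly helps in an interactive session where a dataset or model from an earlier run is still around.

```python
import gc


def release_memory() -> int:
    """Run the garbage collector and, if PyTorch with CUDA is available,
    return cached-but-unused GPU blocks to the driver.

    Drop references first (e.g. ``del dataset`` or ``del model``); memory
    still used by live tensors is not released.
    Returns the number of unreachable objects found by gc.collect().
    """
    collected = gc.collect()
    try:
        import torch
    except ImportError:  # PyTorch not installed: nothing more to do
        return collected
    if torch.cuda.is_available():
        # Frees only unoccupied cached blocks held by PyTorch's allocator.
        torch.cuda.empty_cache()
    return collected


# Example: drop the last reference first, then release.
dataset = [bytearray(1024 * 1024) for _ in range(8)]  # stand-in for a large object
del dataset
freed = release_memory()
```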

TimurNurlygayanov added the "bug" label (Something isn't working) on Jan 22, 2022.
