CUDA out of memory error during model.fit() #104

Open
TimurNurlygayanov opened this issue Jan 22, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@TimurNurlygayanov

I'm trying the basic example from
https://towardsdatascience.com/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb

My video card is an NVIDIA GeForce MX150 (laptop) with 2 GB of video RAM.
OS: Ubuntu 20.04 with NVIDIA driver 470.

I have 61 custom images, each with the object labeled.

When I run this simple code:

from detecto import core, utils, visualize
dataset = core.Dataset('images_to_learn/')
model = core.Model(['my_object'])
model.fit(dataset)

it fails at model.fit(dataset) with the error:

Epoch 1 of 10
Begin iterating over training dataset
  0%|                                                           | 0/61 [00:00<?, ?it/s]
/home/xwizard/.local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  2%|▊                                                  | 1/61 [00:02<02:07,  2.12s/it]
Traceback (most recent call last):
  File "/home/xwizard/test/main.py", line 24, in <module>
    model.fit(dataset)
  File "/home/xwizard/.local/lib/python3.9/site-packages/detecto/core.py", line 505, in fit
    loss_dict = self._model(images, targets)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/generalized_rcnn.py", line 96, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/rpn.py", line 354, in forward
    proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 180, in decode
    pred_boxes = self.decode_single(
  File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 223, in decode_single
    pred_boxes1 = pred_ctr_x - c_to_c_w
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 1.96 GiB total capacity; 1.12 GiB already allocated; 2.88 MiB free; 1.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

PyTorch just takes all the available memory and crashes.
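
For anyone reading the error message above, here is a rough sketch of how its numbers break down. It only parses the text of this one message; the field names and the `parse_oom` and `summarize` helpers are made up for illustration and are not part of PyTorch or Detecto.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 1.96 GiB total capacity; "
    "1.12 GiB already allocated; 2.88 MiB free; 1.26 GiB reserved in total by PyTorch)"
)

# Conversion factors to MiB.
MIB_PER_UNIT = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Hypothetical field names; the patterns follow the wording of this specific message.
FIELDS = {
    "requested": r"Tried to allocate ([\d.]+) ([KMG]iB)",
    "total": r"([\d.]+) ([KMG]iB) total capacity",
    "allocated": r"([\d.]+) ([KMG]iB) already allocated",
    "free": r"([\d.]+) ([KMG]iB) free",
    "reserved": r"([\d.]+) ([KMG]iB) reserved",
}


def parse_oom(message):
    """Extract each quantity from the message and convert it to MiB."""
    stats = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        value, unit = match.groups()
        stats[name] = float(value) * MIB_PER_UNIT[unit]
    return stats


def summarize(stats):
    """Derive two quantities that help interpret the message."""
    return {
        # Memory held by PyTorch's caching allocator but not backing live tensors.
        "cached_but_unused": stats["reserved"] - stats["allocated"],
        # Memory that is neither free nor reserved by PyTorch's allocator.
        "outside_allocator": stats["total"] - stats["reserved"] - stats["free"],
    }


stats = parse_oom(OOM_MESSAGE)
summary = summarize(stats)
for name, mib in summary.items():
    print(f"{name}: {mib:.0f} MiB")
```

With these numbers, only about 143 MiB is cached but unused, so fragmentation is probably not the main problem here. About 714 MiB of the card is in use outside PyTorch's allocator, which likely covers the CUDA context plus the desktop and any other processes using the GPU.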

TimurNurlygayanov added the bug label Jan 22, 2022
@alankbi
Owner

alankbi commented Feb 1, 2022

Could you try some of the solutions listed in this post to see if any of those help?

@makya-stell

Adding:

import gc
del dataset
gc.collect()

right before I created and ran my dataset fixed the issue for me. Hope this helps, @TimurNurlygayanov.
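
A slightly more defensive version of the same idea might look like the sketch below. `free_cached_memory` is a hypothetical helper name, not part of Detecto. It runs Python's garbage collector and, only if PyTorch is installed and a CUDA device is available, also asks PyTorch to release its unused cached GPU blocks.

```python
import gc


def free_cached_memory():
    """Run Python's garbage collector and release PyTorch's unused CUDA cache.

    Returns the number of unreachable objects found by gc.collect().
    """
    collected = gc.collect()
    try:
        import torch  # optional: skipped when PyTorch is not installed
    except ImportError:
        return collected
    if torch.cuda.is_available():
        # Returns cached blocks that no live tensor is using to the driver.
        # Tensors that are still referenced keep their memory.
        torch.cuda.empty_cache()
    return collected


if __name__ == "__main__":
    print("unreachable objects collected:", free_cached_memory())
```

Note that `del dataset` only works when a variable named `dataset` already exists, for example when re-running a notebook cell; in a fresh script it raises a NameError. Clearing the cache also cannot help if the model itself needs more memory than the card has.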
