training error on colab #43

Open
hb0313 opened this issue Sep 14, 2022 · 3 comments

@hb0313

hb0313 commented Sep 14, 2022

All of my setup steps for training on Colab completed successfully. However, when I run

!python tools/train.py --cfg configs/CONFIG_FILE.yaml

I get error:

Found 20210 training images.
Found 2000 validation images.
Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 128, in <module>
    main(cfg, gpu, save_dir)
  File "tools/train.py", line 69, in main
    for iter, (img, lbl) in pbar:
  File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in __getitem__
    image, label = self.transform(image, label)
  File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in __call__
    img, mask = transform(img, mask)
  File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in __call__
    mask = TF.pad(mask, padding, fill=self.seg_fill)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad
    return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad
    img = torch_pad(img, p, mode=padding_mode, value=float(fill))
RuntimeError: value cannot be converted to type uint8_t without overflow

@sithu31296
Owner

I think this is caused by a PyTorch version mismatch. Please try a different PyTorch version.

@scl666

scl666 commented Oct 11, 2022

> I think this is caused by a PyTorch version mismatch. Please try a different PyTorch version.

Hello, I am training on CamVid and get the following error:
    min_value = pred[min(self.min_kept, pred.numel() - 1)]
IndexError: index -1 is out of bounds for dimension 0 with size 0
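The failing index can be worked through by hand: when pred is empty, pred.numel() - 1 is -1, so min(self.min_kept, -1) is -1 for any non-negative min_kept, and indexing an empty tensor at -1 raises exactly this IndexError. The sketch below mirrors that line in plain Python with a guard for the empty case. The function names and the choice to return None are hypothetical, and the reason pred ends up empty (for example, a batch in which every pixel carries the ignore label) is an assumption about surrounding code that the thread does not show.

```python
def ohem_min_value(values, min_kept):
    """Mirror of `pred[min(self.min_kept, pred.numel() - 1)]` with an empty-input guard.

    `values` stands in for `pred` (a 1-D, already-sorted sequence). The name and
    the None return value are hypothetical choices for this sketch.
    """
    if len(values) == 0:
        # Unguarded, the index would be min(min_kept, -1) == -1, and an empty
        # sequence has no element -1, so indexing raises IndexError.
        return None
    return values[min(min_kept, len(values) - 1)]


def unguarded_min_value(values, min_kept):
    """The original indexing expression, translated literally to a Python list."""
    return values[min(min_kept, len(values) - 1)]


if __name__ == "__main__":
    print(ohem_min_value([0.1, 0.2, 0.3], 1))  # index min(1, 2) == 1
    print(ohem_min_value([], 1))               # guarded: returns None
    try:
        unguarded_min_value([], 1)
    except IndexError as err:
        print("IndexError:", err)
```

A guard like this only hides the symptom; if pred is empty because no pixel survives the ignore-label mask, the label mapping for the dataset is worth checking as well.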

@ilanaKarimov

Hello,

I encountered the same error, and updating torch and torchvision did not resolve it. The issue appears to arise when seg_fill receives a value of -1, as defined in the ade20k config file (IGNORE_LABEL: -1); the label mask being padded is uint8, and -1 cannot be stored in that type. Changing the value of IGNORE_LABEL resolved the problem. Could you please advise what value IGNORE_LABEL should be set to?
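The overflow can be illustrated without PyTorch: constant padding of a uint8 mask needs a fill value in 0..255, and -1 is outside that range. The sketch below mimics that check on a plain nested-list mask and shows one possible workaround: pad with a placeholder that fits in uint8, then swap it for the ignore label once the values are held in a signed type. The helper names (`pad_mask`, `remap_fill`), the placeholder 255, and the assumption that the loss keeps treating -1 as the ignore label are all hypothetical; this is not the repository's code and does not answer which IGNORE_LABEL value the maintainer intends.

```python
UINT8_MAX = 255


def pad_mask(mask, pad, fill):
    """Constant-pad a 2-D label mask (list of lists of uint8 values) by `pad` on every side.

    Like the padding call in the traceback, it refuses a fill value that a
    uint8 tensor cannot hold.
    """
    if not 0 <= fill <= UINT8_MAX:
        raise ValueError(f"fill={fill} cannot be stored in a uint8 mask")
    width = len(mask[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]  # top border rows
    for row in mask:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([fill] * width for _ in range(pad))  # bottom border rows
    return padded


def remap_fill(mask, placeholder, ignore_label):
    """Replace every placeholder value with the ignore label (Python ints are signed)."""
    return [[ignore_label if v == placeholder else v for v in row] for row in mask]


if __name__ == "__main__":
    mask = [[1, 2], [3, 4]]
    try:
        pad_mask(mask, 1, -1)  # same situation as seg_fill = IGNORE_LABEL = -1
    except ValueError as err:
        print(err)
    padded = pad_mask(mask, 1, UINT8_MAX)  # hypothetical placeholder 255
    for row in remap_fill(padded, UINT8_MAX, -1):
        print(row)
```

Remapping this way also turns any genuine 255 in the mask into -1, so it only works if 255 is never a valid class id for the dataset.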


4 participants