Issue in the middle of training #2

danperazzo · 2021-07-31T17:37:18Z

Hello! Again, thanks for releasing this code!
I got this error in the middle of training:

l.data.data_pipeline[246001] DEBUG Loading from file data/MineRLObtainIronPickaxeVectorObf-v0/v3_red_guava_merman-2_181-16308
store: 23941it [25:12, 15.83it/s]
post-process: 100%|████████████████| 1981888/1981888 [20:11<00:00, 1636.03it/s]
1981888 experiences moved into hdf5 file
100%|██████████████████████████████| 1981888/1981888 [19:07<00:00, 1726.80it/s]
Original num of samples: 1981888
New num of samples: 917409
dataset: 60%|███████████████████▊ | 9/15 [24:11<17:55, 179.18s/it]Skipping episodes/episode_starts. Has different amount of samples than expected.
dataset: 93%|█████████████████████████████▊ | 14/15 [41:07<03:01, 181.33s/it]Skipping vector_centroids. Has different amount of samples than expected.
dataset: 100%|████████████████████████████████| 15/15 [41:07<00:00, 164.52s/it]
Steps 0 Time 1 Avrg loss 7.75259
train: 0%| | 8246/3669636 [08:20<61:43:01, 16.48it/s]
Traceback (most recent call last):
File "train.py", line 110, in <module>
main()
File "train.py", line 107, in main
main_train_bc(parsed_args, remaining_args)
File "/home/daniel/Desktop/minecraft-bc-2020/train_bc_lstm.py", line 203, in main
network_output, new_states = network(pov, obs_vector, hidden_states=hidden_states, return_sequence=True)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/daniel/Desktop/minecraft-bc-2020/torch_codes/modules.py", line 347, in forward
x = self.cnn_head(x)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/daniel/Desktop/minecraft-bc-2020/torch_codes/modules.py", line 107, in forward
x = self.blocks(x)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 155, in forward
self.return_indices)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/_jit_internal.py", line 267, in fn
return if_false(*args, **kwargs)
File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/functional.py", line 586, in _max_pool2d
input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: non-empty 3D or 4D input tensor expected but got ndim: 4

I do not know what could be causing this error :/

Miffyli · 2021-07-31T20:38:34Z

Hmm, this suggests the pov tensor is empty when network(pov, ...) is called: the error reports ndim: 4, so the number of dimensions is fine but one of them has size zero. One way this could happen is if all episodes happened to end at the point where they were sampled from the dataset. Did you change the batch size to something smaller?

You could debug this by checking what sample_mask contains. If all entries are False, that is probably what crashes the code. In that case you can skip to the next iteration.
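
A minimal sketch of that check, assuming sample_mask is a boolean mask with one entry per sequence in the batch (a torch tensor, NumPy array, or plain list). The helper name has_valid_samples and the toy batches list are made up for illustration and are not part of the repository; in train_bc_lstm.py the equivalent check would presumably go just before the network(pov, ...) call.

```python
def has_valid_samples(sample_mask):
    """Return True if at least one entry of the mask is truthy.

    Tensors and arrays expose .any(); plain sequences fall back to any().
    """
    if hasattr(sample_mask, "any"):
        return bool(sample_mask.any())
    return any(bool(m) for m in sample_mask)


# Hypothetical stand-in for the training loop: one mask per batch.
batches = [[True, False], [False, False], [False, True]]
processed = []
for step, sample_mask in enumerate(batches):
    if not has_valid_samples(sample_mask):
        # Every sequence is masked out, so the masked pov batch would be
        # empty and the forward pass would fail. Skip this iteration.
        continue
    processed.append(step)  # the forward/backward pass would run here
```

Printing the mask for a few steps before the crash should also show whether an all-False batch is really the cause.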
