Issue in the middle of training #2

Open
danperazzo opened this issue Jul 31, 2021 · 1 comment
@danperazzo

Hello! Again, thanks for releasing this code!
I got this error in the middle of training:

l.data.data_pipeline[246001] DEBUG Loading from file data/MineRLObtainIronPickaxeVectorObf-v0/v3_red_guava_merman-2_181-16308
store: 23941it [25:12, 15.83it/s]
post-process: 100%|████████████████| 1981888/1981888 [20:11<00:00, 1636.03it/s]
1981888 experiences moved into hdf5 file
100%|██████████████████████████████| 1981888/1981888 [19:07<00:00, 1726.80it/s]
Original num of samples: 1981888
New num of samples: 917409
dataset: 60%|███████████████████▊ | 9/15 [24:11<17:55, 179.18s/it]Skipping episodes/episode_starts. Has different amount of samples than expected.
dataset: 93%|█████████████████████████████▊ | 14/15 [41:07<03:01, 181.33s/it]Skipping vector_centroids. Has different amount of samples than expected.
dataset: 100%|████████████████████████████████| 15/15 [41:07<00:00, 164.52s/it]
Steps 0 Time 1 Avrg loss 7.75259
train: 0%| | 8246/3669636 [08:20<61:43:01, 16.48it/s]
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main()
  File "train.py", line 107, in main
    main_train_bc(parsed_args, remaining_args)
  File "/home/daniel/Desktop/minecraft-bc-2020/train_bc_lstm.py", line 203, in main
    network_output, new_states = network(pov, obs_vector, hidden_states=hidden_states, return_sequence=True)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/daniel/Desktop/minecraft-bc-2020/torch_codes/modules.py", line 347, in forward
    x = self.cnn_head(x)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/daniel/Desktop/minecraft-bc-2020/torch_codes/modules.py", line 107, in forward
    x = self.blocks(x)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 155, in forward
    self.return_indices)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/_jit_internal.py", line 267, in fn
    return if_false(*args, **kwargs)
  File "/home/daniel/anaconda3/envs/minerl/lib/python3.6/site-packages/torch/nn/functional.py", line 586, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: non-empty 3D or 4D input tensor expected but got ndim: 4

I do not know what could be causing this error :/

@Miffyli
Owner

Miffyli commented Jul 31, 2021

Hmm, this suggests the pov tensor is empty when network(pov, ...) is called. One way this could happen is if all episodes happened to end at the point where they were sampled from the dataset. Did you change the batch size to something smaller?

You could debug this by checking what sample_mask contains. If all entries are False, nothing is left in the batch, and that is probably what crashes the code. In that case you can skip to the next iteration.
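
A rough sketch of that check, in case it helps. It assumes sample_mask is a boolean tensor with one entry per batch element and that it is used to index pov and obs_vector; I have not verified how train_bc_lstm.py actually applies it. `select_active` is a made-up helper and the shapes are only illustrative.

```python
import torch


def select_active(pov, obs_vector, sample_mask):
    """Return the masked batch, or None when no sample is active.

    Hypothetical helper: the names mirror this thread, but the way the
    repository applies sample_mask is an assumption.
    """
    if not sample_mask.any():
        return None
    return pov[sample_mask], obs_vector[sample_mask]


# Toy batch of 4 samples (illustrative shapes, not taken from the repo).
pov = torch.zeros(4, 3, 64, 64)
obs_vector = torch.zeros(4, 64)

# An all-False mask leaves a tensor that still has 4 dims but no elements.
all_done = torch.zeros(4, dtype=torch.bool)
print(pov[all_done].shape)  # torch.Size([0, 3, 64, 64])

masks = [all_done, torch.tensor([True, False, True, False])]
processed = 0
for sample_mask in masks:
    selected = select_active(pov, obs_vector, sample_mask)
    if selected is None:
        continue  # skip this iteration instead of calling network(...)
    active_pov, active_obs = selected
    processed += 1
```

The first mask is skipped and only the second one reaches the point where the network would be called, so an empty pov never gets to the pooling layer.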
