Getting stuck at validation step #4

sindhura234 opened this issue Nov 18, 2021 · 9 comments

@sindhura234

Epoch 0: 0%| | 0/10 [00:00<?, ?it/s]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.

Epoch 0: 70%|█████████ | 7/10 [00:14<00:06, 2.12s/it, loss=0.724, v_num=11]
Validating: 0it [00:00, ?it/s] Validating: 0%| | 0/3 [00:00<?, ?it/s]

I have been training on a larger dataset, unlike the error here, but I always get stuck at the validation stage.
I tried the hippocampus dataset and the results are fine, but with my custom data I am facing this problem. What could be the reason?
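
For reference, the second warning in that log names its own fix: pass recompute_scale_factor=True wherever the model upsamples with a float scale_factor. A minimal sketch (the tensor shape and mode here are placeholders, not this project's actual call):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32, 32)  # dummy (N, C, D, H, W) volume

# Since PyTorch 1.6, a float scale_factor is used directly. Passing
# recompute_scale_factor=True restores the pre-1.6 behavior and
# silences the warning, as the message itself suggests.
y = F.interpolate(x, scale_factor=2.0, mode="trilinear",
                  align_corners=False, recompute_scale_factor=True)
print(y.shape)  # torch.Size([1, 1, 64, 64, 64])
```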

@hoangtan96dl
Contributor

Can you set the num_sanity_val_steps argument of the Trainer class to -1 and run it again? It will run the whole validation set before starting the training.
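
A minimal sketch of that suggestion (model and datamodule stand in for your own objects):

```python
import pytorch_lightning as pl

# num_sanity_val_steps=-1 runs the entire validation set before
# training starts, so a hang in validation reproduces immediately
# instead of partway through the first epoch.
trainer = pl.Trainer(max_epochs=1, num_sanity_val_steps=-1)
# trainer.fit(model, datamodule=datamodule)  # your existing objects
```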

I have never seen this warning: Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size). So maybe there is a problem with the logging step and your data. I call self.log in validation_step and validation_epoch_end of the LightningModule; you can try commenting those calls out first.
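
The warning also names its own fix: pass batch_size explicitly when logging. A hedged sketch of what that could look like in validation_step (the network and loss here are placeholders, not this project's code):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 1)  # placeholder network

    def validation_step(self, batch, batch_idx):
        x, y = batch  # assumes an (inputs, targets) batch; adjust to your data
        loss = F.mse_loss(self.net(x), y)
        # An explicit batch_size stops Lightning from trying to infer it
        # from an "ambiguous collection", which is what the warning is about.
        self.log("val_loss", loss, batch_size=x.shape[0])
        return loss
```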

@noushinha

noushinha commented Nov 19, 2021

@cndu234
I had this error in the beginning and I solved it.
I am trying to recall what I set in the trainer to get things to work.
I will check and write back to you.

@kingjames1155

Do you remember how you solved this problem? I have the same problem as you. Thank you very much.

@noushinha

@kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.
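
For completeness, the two options look like this (a sketch; other Trainer arguments omitted):

```python
import pytorch_lightning as pl

# Turn the pre-training sanity check off entirely ...
trainer = pl.Trainer(num_sanity_val_steps=0)
# ... or run the full validation set up front to reproduce a hang early:
trainer = pl.Trainer(num_sanity_val_steps=-1)
```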

@kingjames1155

num_sanity_val_steps

Thank you very sincerely for your help; it helped a great deal. Following your method, I am no longer stuck at the sanity-checking DataLoader, but I always get stuck at the validation DataLoader instead, as shown below. How did you solve this? Thank you again for your help.

Epoch 0: 0%| | 0/31 [00:00<?, ?it/s]
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 8. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 26%|██████████████████████████████████▎ | 8/31 [01:43<04:57, 12.93s/it, loss=0.645, v_num=6]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 3. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 29%|██████████████████████████████████████▌ | 9/31 [01:49<04:28, 12.20s/it, loss=0.647, v_num=6]
Validation DataLoader 0: 0%| | 0/22 [00:00<?, ?it/s]

@kingjames1155

@kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.

I found that the problem was sliding window inference, which caused me to get stuck in validation_step and predict_step. I haven't changed any of the sliding window inference parameters. Have you ever encountered such a situation?
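
If the project runs validation through MONAI's sliding_window_inference (an assumption; the thread does not name the library), the call looks roughly like the sketch below, and roi_size and sw_batch_size are the knobs that control how much work each window does. When debugging a hang, lowering sw_batch_size is the cheapest thing to try:

```python
import torch
from monai.inferers import sliding_window_inference

def predict_volume(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    # image: (N, C, D, H, W); roi_size usually matches the training patch size.
    return sliding_window_inference(
        inputs=image,
        roi_size=(64, 64, 64),
        sw_batch_size=1,   # windows evaluated per forward pass
        predictor=model,
        overlap=0.25,      # fraction of overlap between adjacent windows
    )
```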

@noushinha

No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?
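
To illustrate that mismatch (a hedged sketch using MONAI's UNet as a stand-in network; this project may build its model differently): with 4 labels the network must emit 4 output channels, not 2.

```python
from monai.networks.nets import UNet

# out_channels must equal the number of segmentation labels
# (background included); 4 labels with out_channels=2 silently
# mismatches the targets.
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=4,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)
```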

@kingjames1155

No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?

I used LUNA16 on a V100; the batch size is 8 and the training patch size is [64, 64, 64]. I tried lowering these parameters, but it still got stuck.

@ZHANGJUN-OK

I had the same problem. How can I avoid it?
