Getting stuck at validation step #4

sindhura234 opened this issue Nov 18, 2021 · 9 comments

@sindhura234

Epoch 0: 0%| | 0/10 [00:00<?, ?it/s]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.

Epoch 0: 70%|█████████ | 7/10 [00:14<00:06, 2.12s/it, loss=0.724, v_num=11]
Validating: 0it [00:00, ?it/s] Validating: 0%| | 0/3 [00:00<?, ?it/s]

I have been training on a larger dataset, unlike the error here, but I always get stuck at the validation stage.
I tried the hippocampus dataset and the results are fine, but with my custom data I am facing this problem. What could be the reason?
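
For reference, the second warning in that log names its own fix: pass recompute_scale_factor=True wherever the model upsamples with a float scale_factor. A minimal sketch (the tensor shape and mode here are placeholders, not this project's actual call):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32, 32)  # dummy (N, C, D, H, W) volume

# Since PyTorch 1.6, a float scale_factor is used directly. Passing
# recompute_scale_factor=True restores the pre-1.6 behavior and
# silences the warning, as the message itself suggests.
y = F.interpolate(x, scale_factor=2.0, mode="trilinear",
                  align_corners=False, recompute_scale_factor=True)
print(y.shape)  # torch.Size([1, 1, 64, 64, 64])
```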

@hoangtan96dl
Contributor

Can you set the num_sanity_val_steps argument of the Trainer class to -1 and run it again? It will run the whole validation set before starting the training.
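
A minimal sketch of that suggestion (model and datamodule stand in for your own objects):

```python
import pytorch_lightning as pl

# num_sanity_val_steps=-1 runs the entire validation set before
# training starts, so a hang in validation reproduces immediately
# instead of partway through the first epoch.
trainer = pl.Trainer(max_epochs=1, num_sanity_val_steps=-1)
# trainer.fit(model, datamodule=datamodule)  # your existing objects
```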

I have never seen this warning: Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size). So maybe there is a problem with the logging step and your data. I call self.log in validation_step and validation_epoch_end of the LightningModule; you can try commenting those calls out first.
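
The warning also names its own fix: pass batch_size explicitly when logging. A hedged sketch of what that could look like in validation_step (the network and loss here are placeholders, not this project's code):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 1)  # placeholder network

    def validation_step(self, batch, batch_idx):
        x, y = batch  # assumes an (inputs, targets) batch; adjust to your data
        loss = F.mse_loss(self.net(x), y)
        # An explicit batch_size stops Lightning from trying to infer it
        # from an "ambiguous collection", which is what the warning is about.
        self.log("val_loss", loss, batch_size=x.shape[0])
        return loss
```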

@noushinha

noushinha commented Nov 19, 2021

@cndu234
I had this error in the beginning and I solved it.
I am trying to recall what I set in the trainer to get things to work.
I will check and write back to you.

@kingjames1155

Do you remember how you solved this problem? I have the same problem as you. Thank you very much.

@noushinha

@kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.
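
For completeness, the two options look like this (a sketch; other Trainer arguments omitted):

```python
import pytorch_lightning as pl

# Turn the pre-training sanity check off entirely ...
trainer = pl.Trainer(num_sanity_val_steps=0)
# ... or run the full validation set up front to reproduce a hang early:
trainer = pl.Trainer(num_sanity_val_steps=-1)
```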

@kingjames1155

num_sanity_val_steps

Thank you very sincerely for your help; it helped a great deal. Following your method, I am no longer stuck at the sanity-checking DataLoader, but I always get stuck at the validation DataLoader instead, as shown below. How did you solve this? Thank you again for your help.

Epoch 0: 0%| | 0/31 [00:00<?, ?it/s]
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 8. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 26%|██████████████████████████████████▎ | 8/31 [01:43<04:57, 12.93s/it, loss=0.645, v_num=6]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 3. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 29%|██████████████████████████████████████▌ | 9/31 [01:49<04:28, 12.20s/it, loss=0.647, v_num=6]
Validation DataLoader 0: 0%| | 0/22 [00:00<?, ?it/s]

@kingjames1155

@kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.

I found that the problem was sliding window inference, which caused me to get stuck in validation_step and predict_step. I haven't changed any of the sliding window inference parameters. Have you ever encountered such a situation?
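
If the project runs validation through MONAI's sliding_window_inference (an assumption; the thread does not name the library), the call looks roughly like the sketch below, and roi_size and sw_batch_size are the knobs that control how much work each window does. When debugging a hang, lowering sw_batch_size is the cheapest thing to try:

```python
import torch
from monai.inferers import sliding_window_inference

def predict_volume(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    # image: (N, C, D, H, W); roi_size usually matches the training patch size.
    return sliding_window_inference(
        inputs=image,
        roi_size=(64, 64, 64),
        sw_batch_size=1,   # windows evaluated per forward pass
        predictor=model,
        overlap=0.25,      # fraction of overlap between adjacent windows
    )
```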

@noushinha

No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?
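
To illustrate that mismatch (a hedged sketch using MONAI's UNet as a stand-in network; this project may build its model differently): with 4 labels the network must emit 4 output channels, not 2.

```python
from monai.networks.nets import UNet

# out_channels must equal the number of segmentation labels
# (background included); 4 labels with out_channels=2 silently
# mismatches the targets.
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=4,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)
```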

@kingjames1155

No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?

I used LUNA16 on a V100; the batch size is 8 and the training patch size is [64, 64, 64]. I tried lowering these parameters, but it still got stuck.

@ZHANGJUN-OK

I had the same problem. How can I avoid it?
