I'm trying to apply your method on CIFAR-10 with ResNet18 (without BatchNorm), and I used the parameters listed for ImageNet. But the training loss always goes to NaN after a few iterations. I tried changing the parameters a bit, but the problem remains unsolved... So I'm wondering if you could please share the parameters you used for this setting?
Thanks in advance!
@Heimine
Sorry for the late reply, I noticed this message just now. In my CIFAR-10 experiments, I use ResNet20 (see the ResNet paper's CIFAR-10 experiments), which has significantly fewer parameters than ResNet18 (which was designed for ImageNet). To relieve your NaN problem, I suggest using a smaller scale (which is set to sqrt(2) by default for networks without residual connections) for the ResNet without BN, for example: (1) 0.8 or 0.6 (chosen empirically); (2) a more practical method is to forward a mini-batch of data and estimate the scale such that each layer's activation has unit variance, for the ResNet without BN.
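Not the authors' code, but a minimal sketch of option (2): forward one mini-batch through the network and rescale each conv/linear layer's weights so that its output has roughly unit variance (similar in spirit to LSUV-style data-dependent initialization). The function name, tolerance, and iteration count below are placeholders; plug in your own model and a batch from your CIFAR-10 loader.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def unit_variance_rescale(model, data_batch, tol=0.05, max_iters=10):
    """Rescale each Conv2d/Linear layer so its output std is ~1 on data_batch."""
    model.eval()
    for module in model.modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        for _ in range(max_iters):
            stats = {}

            def hook(_m, _inp, out):
                # Record the std of this layer's output activations.
                stats["std"] = out.std().item()

            handle = module.register_forward_hook(hook)
            model(data_batch)
            handle.remove()

            std = stats["std"]
            if std == 0 or abs(std - 1.0) < tol:
                break
            # Dividing the weights by the observed std pushes the
            # layer's output variance toward 1 for this batch.
            module.weight.div_(std)
    return model
```

Usage would be a single call before training, e.g. `unit_variance_rescale(model, next(iter(train_loader))[0])`, after which you train as usual without BN.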