
Loss on Cifar10 with ResNet18 goes to NaN #1

Open
Heimine opened this issue Jan 30, 2021 · 1 comment
Heimine commented Jan 30, 2021

Hi,

I'm trying to apply your method to Cifar-10 with ResNet18 (without BatchNorm), and I used the parameters listed for ImageNet. However, the training loss always goes to NaN after a few iterations. I tried changing the parameters a bit, but the problem persists, so I'm wondering whether you could please share the parameters you used for this setting?

Thanks in advance!

huangleiBuaa (Owner) commented
@Heimine
Sorry for the late reply; I only noticed this message just now. In my Cifar-10 experiments I use ResNet20 (see the Cifar-10 experiments in the ResNet paper), which has significantly fewer parameters than ResNet18 (the ImageNet variant). To relieve your NaN problem, I suggest using a smaller scale (it is set to sqrt(2) by default for networks without residual connections) for the ResNet without BN, for example: (1) 0.8 or 0.6, chosen empirically; (2) a more practical method is to forward a mini-batch of data and estimate the scale so that each layer's activation has unit variance. A rough sketch of option (2) is given below.
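A minimal sketch of the data-dependent estimate in option (2), written in plain PyTorch; this is not code from this repository, and `model` / `images` are placeholders for your ResNet18-without-BN and one mini-batch from the Cifar-10 loader:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def estimate_unit_variance_scales(model, images):
    """Forward one mini-batch and record the std of each Conv/Linear output;
    the reciprocal of that std is a per-layer scale that would bring the
    layer activation back to (roughly) unit variance."""
    stds, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            stds[name] = output.float().std().item()
        return hook

    # register a forward hook on every conv / linear layer
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    model(images)  # one forward pass on a held-out mini-batch

    for h in hooks:
        h.remove()

    # suggested per-layer scale: rescale so the output std becomes ~1
    return {name: 1.0 / max(s, 1e-8) for name, s in stds.items()}
```

You would then use the returned per-layer values in place of the default sqrt(2) scale for the corresponding layers.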
