I'm trying to apply your method on CIFAR-10 with ResNet18 (without BatchNorm), and I used the parameters listed for ImageNet. But the training loss always goes to NaN after a few iterations. I tried changing the parameters a bit, but the problem remains unsolved... So I'm wondering if you could please share the parameters you used for this setting?
Thanks in advance!
@Heimine
Sorry for the late reply, I noticed this message just now. In my CIFAR-10 experiments, I use ResNet20 (see the ResNet paper's CIFAR-10 experiments), which has significantly fewer parameters than ResNet18 (which was designed for ImageNet). To relieve your NaN problem, I suggest using a smaller scale (which is set to sqrt(2) by default for networks without residual connections) for the ResNet without BN, for example: (1) 0.8 or 0.6 (chosen empirically); (2) a more practical method is to forward a mini-batch of data and estimate the scale such that each layer's activation has unit variance, for the ResNet without BN.
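Not the authors' code, but a minimal sketch of option (2): forward one mini-batch through the network and rescale each conv/linear layer's weights so that its output has roughly unit variance (similar in spirit to LSUV-style data-dependent initialization). The function name, tolerance, and iteration count below are placeholders; plug in your own model and a batch from your CIFAR-10 loader.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def unit_variance_rescale(model, data_batch, tol=0.05, max_iters=10):
    """Rescale each Conv2d/Linear layer so its output std is ~1 on data_batch."""
    model.eval()
    for module in model.modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        for _ in range(max_iters):
            stats = {}

            def hook(_m, _inp, out):
                # Record the std of this layer's output activations.
                stats["std"] = out.std().item()

            handle = module.register_forward_hook(hook)
            model(data_batch)
            handle.remove()

            std = stats["std"]
            if std == 0 or abs(std - 1.0) < tol:
                break
            # Dividing the weights by the observed std pushes the
            # layer's output variance toward 1 for this batch.
            module.weight.div_(std)
    return model
```

Usage would be a single call before training, e.g. `unit_variance_rescale(model, next(iter(train_loader))[0])`, after which you train as usual without BN.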