
Why use learning rate 30~50 #18

Open

IgorSusmelj opened this issue Nov 26, 2019 · 3 comments

Comments

@IgorSusmelj

I saw your note and it seems rather unusual to use such a large learning rate:

Note: When training linear classifiers on top of ResNets, it's important to use large learning rate, e.g., 30~50.

Is there something I'm missing? I can't imagine how you get stable gradient descent with such high learning rates.

@HobbitLong
Owner

Hi, @IgorSusmelj ,

Good catch! I also could not imagine it back then.

I think it's because the scale of the features learned by contrastive methods is very different. If you are not comfortable with a large learning rate, here are two ways to fix it: (1) add a non-parametric BN (e.g., nn.BatchNorm1d(in_channel, affine=False)) right before the linear classification layer to normalize the features; then a normal learning rate will work. (2) use the Adam optimizer.
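A minimal sketch of option (1), assuming a frozen encoder that outputs feat_dim-dimensional features (feat_dim, num_classes, and the batch of features here are hypothetical placeholders, not from this repo):

```python
import torch
import torch.nn as nn

feat_dim = 2048     # hypothetical feature dimension of the frozen encoder
num_classes = 1000  # hypothetical number of target classes

# Linear classifier with a non-parametric BatchNorm in front.
# affine=False means BN only standardizes each feature channel
# (zero mean, unit variance over the batch) with no learnable
# scale/shift, so the classifier sees features on a normal scale
# and an ordinary learning rate (e.g., 0.1 with SGD) should work.
classifier = nn.Sequential(
    nn.BatchNorm1d(feat_dim, affine=False),
    nn.Linear(feat_dim, num_classes),
)

optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)

# Usage: features from the frozen encoder go straight into the classifier.
features = torch.randn(32, feat_dim)  # stand-in for encoder output
logits = classifier(features)
```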

@IgorSusmelj
Author

Ok, thanks for the quick reply. That would indeed be another way to deal with it.

However, I would further investigate the distributions of the weights and features to figure out what exactly is happening. This might even improve the stability of the model. I can look into it, but I won't have much time in the coming weeks.
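For reference, one quick way to inspect those feature statistics might look like this (a sketch, not from this repo; `encoder` and `loader` are hypothetical stand-ins for the frozen contrastive backbone and an evaluation DataLoader):

```python
import torch

@torch.no_grad()
def feature_stats(encoder, loader, device="cpu"):
    """Summarize the scale of the encoder's features over a loader."""
    encoder.eval()
    feats = []
    for images, _ in loader:
        feats.append(encoder(images.to(device)).cpu())
    feats = torch.cat(feats)
    # If these stds are far from 1, the gradient through the linear
    # layer is scaled accordingly, which could explain why an
    # unusually large (or small) learning rate is needed.
    print("overall mean:", feats.mean().item())
    print("overall std: ", feats.std().item())
    print("per-dim std range:",
          feats.std(dim=0).min().item(), "to",
          feats.std(dim=0).max().item())
```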

@HobbitLong
Owner

@IgorSusmelj , it would be great to see the distributions of the features and weights and understand why! Perhaps it is a good research problem; it's something I also wonder about a lot.
