
Estimating normalization factor Z #23

Open
KaimingHe opened this issue Dec 13, 2019 · 7 comments

Comments

@KaimingHe

self.params[0] = out.mean() * self.outputSize

This one-time estimation is problematic, especially if the dictionary is not random noise. Computing Z as a moving average of this would give a more reasonable result.

@HobbitLong
Owner

HobbitLong commented Dec 14, 2019

Hi,

Thanks for your comment! Which specific result are you referring to? Or are you suggesting that an EMA of Z could potentially improve all of InsDis, MoCo, and CMC with the NCE loss?

@KaimingHe
Author

You reported a low number for MoCo with the NCE loss. This is because your implementation of NCE is problematic, and correcting it should give a more reasonable MoCo w/ NCE number.

@HobbitLong
Owner

HobbitLong commented Dec 14, 2019

@KaimingHe , yeah, the current NCE implementation is probably less suitable for MoCo, and I am happy to rectify it. What momentum multiplier for updating Z would you suggest?

@KaimingHe
Author

0.99 for updating Z works well. In ImageNet-1K, MoCo with NCE is ~2% worse than MoCo with InfoNCE, similar to the case of the memory bank counterpart.
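The update rule suggested in this thread can be sketched as follows. This is an illustrative sketch, not the repo's actual code: `ZEstimator` and its method names are hypothetical, and the plain-Python arithmetic stands in for the tensor operation `out.mean() * self.outputSize` quoted above.

```python
# Sketch of the suggestion above: estimate the NCE normalization constant Z
# as an exponential moving average (EMA) rather than a one-time estimate.
# `ZEstimator` is a hypothetical name; the repo stores Z in self.params[0].
class ZEstimator:
    def __init__(self, output_size, momentum=0.99):
        self.output_size = output_size  # dictionary size (self.outputSize in the repo)
        self.momentum = momentum        # 0.99, as suggested in this thread
        self.Z = None                   # None marks "not yet initialized"

    def update(self, out):
        # out: flat list of unnormalized scores for the current batch;
        # mirrors out.mean() * self.outputSize from the quoted line.
        z_batch = sum(out) / len(out) * self.output_size
        if self.Z is None:
            self.Z = z_batch  # first batch: plain one-time estimate
        else:
            # EMA: keep 99% of the running value, blend in 1% of the new batch
            self.Z = self.momentum * self.Z + (1 - self.momentum) * z_batch
        return self.Z
```

With momentum 0.99, Z can track slow drift in the dictionary's score distribution (e.g. as the encoder trains) while staying stable across individual batches, which is the point of replacing the one-time estimate.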

@HobbitLong
Owner

Thanks for your input! I have temporarily removed the NCE numbers from the README to avoid confusion, and will leave them out until I get a chance to look into it.

@kibok90

kibok90 commented Jan 25, 2020

Is it necessary to fix Z or update it with an EMA? Would it be unstable if we recomputed Z = out.mean() * self.outputSize on every batch? Also, I couldn't find any statement about this approximation of Z in the paper; maybe I missed it. Could you point me to a reference for this?

@kibok90

kibok90 commented Feb 11, 2020

Later I found the statement in InsDis: "Empirically, we find the approximation derived from initial batches sufficient to work well in practice."
