
Tips for training CPC #4

Open
YuffieHuang opened this issue May 4, 2022 · 2 comments


YuffieHuang commented May 4, 2022

Thanks @vgaraujov for providing the code. I've been experimenting with it for a month and found a few tricks that make the CPC model train better. Let me share them here for anyone interested.

  1. Add a dropout layer after the GRU layer
    Overfitting occurs when training CPC with the default setup. I added a dropout layer right after the GRU to fix the issue. We also need to enlarge the GRU hidden size to improve the model's generality. I set the dropout rate to 0.3 and raised the GRU hidden size from 2400 to 4000. It might not be the best combination, but it works (see the sketch right after this list).

  2. Increase the max sentence length and concatenate sentences in the BookCorpus dataset
    The BookCorpus dataset contains quite a lot of short sentences. I concatenate adjacent sentences and raise the maximum sentence length so that each epoch contains fewer iterations. This greatly increases training speed (see the packing sketch below the Before/After screenshots).
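
Here is a minimal sketch of tip 1, assuming a single-layer `nn.GRU` autoregressive model (the module name and dimensions are illustrative, not the exact ones in this repo):

```python
import torch.nn as nn

class ContextModel(nn.Module):
    def __init__(self, enc_dim=2400, hidden_dim=4000, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(enc_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # regularizes the context vectors

    def forward(self, z):
        # z: (batch, seq_len, enc_dim) sequence of encoded inputs
        c, _ = self.gru(z)       # c: (batch, seq_len, hidden_dim)
        return self.dropout(c)   # dropout applied right after the GRU
```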

Before: [screenshot: training progress with the original data pipeline]

After: [screenshot: training progress after concatenating sentences]
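
And a rough sketch of the packing logic for tip 2, assuming sentences arrive as lists of tokens (`max_len` and the greedy strategy are illustrative choices, not the repo's actual preprocessing):

```python
def pack_sentences(sentences, max_len=64):
    """Greedily merge adjacent sentences until adding the next
    one would exceed max_len tokens."""
    packed, current = [], []
    for sent in sentences:
        if current and len(current) + len(sent) > max_len:
            packed.append(current)  # flush the current chunk
            current = []
        current = current + sent
    if current:
        packed.append(current)
    return packed

# e.g. pack_sentences([["a", "b"], ["c"], ["d", "e", "f"]], max_len=3)
# -> [["a", "b", "c"], ["d", "e", "f"]]
```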

I now get a training result similar to the one shared here.

[screenshot: training curve]

I tested classification on Movie Review using a checkpoint and got an accuracy of 71%, which is not as good as the 76.9% reported in the paper. I will spend more time on hyperparameter optimization.
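
For context, my evaluation is essentially a linear probe on frozen representations. Here is a runnable stand-in with synthetic data (in the real run, `X` would hold CPC sentence representations from the checkpoint and `y` the Movie Review polarity labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2400))   # stand-in for 2400-d CPC representations
y = rng.integers(0, 2, size=1000)   # stand-in for binary sentiment labels

# SentEval-style probe: logistic regression with 10-fold cross-validation
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10)
print(f"MR accuracy: {scores.mean():.3f}")
```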

@vgaraujov (Owner) commented:

Hey @YuffieHuang, thanks for sharing. I plan to revisit this code and model at the end of this month, so any insight is welcome.

One thing I want to test, and maybe you should try, is normalizing the representations produced by the GRU. For instance: normalized_output = F.normalize(output, dim=1). You could also use a temperature parameter in the InfoNCE loss; see Equation 1 of this paper.
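
Roughly what I have in mind, as a sketch (the `pred`/`z_pos` names and shapes are assumptions on my side, not the actual code in this repo):

```python
import torch
import torch.nn.functional as F

def info_nce(pred, z_pos, temperature=0.1):
    # pred: projected context vectors, z_pos: encodings of the true
    # future steps; both (batch, dim)
    pred = F.normalize(pred, dim=1)          # unit-norm predictions
    z_pos = F.normalize(z_pos, dim=1)        # unit-norm targets
    logits = pred @ z_pos.t() / temperature  # (batch, batch) cosine sims
    labels = torch.arange(pred.size(0), device=pred.device)
    # positives lie on the diagonal; the rest of the batch acts as negatives
    return F.cross_entropy(logits, labels)
```

Dividing by the temperature sharpens the softmax over the in-batch negatives, which often stabilizes contrastive training.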

If you agree, we could keep discussing improvements and then update the model.

@YuffieHuang (Author) commented:

Hi @vgaraujov. Sure, let me add the normalization first and see how it goes.
