Thanks @vgaraujov for providing the code. I've been playing with the code for a month and found some tricks to better train the CPC model. Let me share them here for all who are interested.
Add a dropout layer after the GRU layer
Overfitting occurs when training CPC with the default setup. I added a dropout layer right after the GRU to fix the issue. We also need to enlarge the GRU hidden size to improve the generality of the model. I set the dropout rate to 0.3 and raised the GRU hidden size from 2400 to 4000. It may not be the best combination, but it works.
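For reference, here is a minimal sketch of what I mean (class and parameter names are illustrative, not the actual code in this repo):

```python
import torch
import torch.nn as nn

class CPCAutoregressor(nn.Module):
    """Sketch of the autoregressive part of a sentence-level CPC model,
    with dropout applied to the GRU outputs to reduce overfitting.
    Sizes are the ones I used; input_dim is just a placeholder."""
    def __init__(self, input_dim=620, hidden_dim=4000, dropout_rate=0.3):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        # Dropout right after the GRU, as described above
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, x):
        output, hidden = self.gru(x)
        return self.dropout(output), hidden

model = CPCAutoregressor()
x = torch.randn(2, 5, 620)  # (batch, sequence of sentence embeddings, dim)
out, _ = model(x)
print(out.shape)  # torch.Size([2, 5, 4000])
```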
Increase max sentence length and concatenate some sentences in the BookCorpus dataset
The BookCorpus dataset contains quite a lot of short sentences. I concatenate adjacent sentences and raise the maximum sentence length so that each epoch contains fewer iterations, which greatly speeds up training.
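The idea can be sketched as a greedy merge of adjacent sentences up to a length cap (function and parameter names are mine, not the repo's):

```python
def concat_short_sentences(sentences, max_len=64):
    """Greedily merge adjacent sentences (lists of tokens) so each
    training example approaches max_len tokens. A sentence that would
    push the current example past max_len starts a new example."""
    merged, current = [], []
    for sent in sentences:
        if current and len(current) + len(sent) > max_len:
            merged.append(current)
            current = []
        current = current + sent
    if current:
        merged.append(current)
    return merged

sents = [["a"] * 10, ["b"] * 20, ["c"] * 40, ["d"] * 5]
batches = concat_short_sentences(sents, max_len=64)
print([len(b) for b in batches])  # [30, 45]
```

With a cap of 64 tokens, the four short sentences above collapse into two training examples, so an epoch over BookCorpus takes far fewer iterations.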
Now I get a training result similar to the one shared here.
I tested classification on Movie Review using a checkpoint and got an accuracy of 71%, which is not as good as what is stated in the paper (76.9%). I will spend more time on hyperparameter optimization.
Hey @YuffieHuang, thanks for sharing. I plan to revisit this code and model at the end of this month, so any insight is welcome.
One thing I want to test, and maybe you should try, is normalizing the representations produced by the GRU. For instance: normalized_output = F.normalize(output, dim=1). You can also use a temperature parameter in the InfoNCE loss. See Equation 1 of this paper.
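Putting both suggestions together, a temperature-scaled InfoNCE over in-batch negatives could look like this (a sketch under my own naming; the loss in the repo may be structured differently):

```python
import torch
import torch.nn.functional as F

def info_nce(context, targets, temperature=0.1):
    """Temperature-scaled InfoNCE with in-batch negatives.
    context and targets are (batch, dim); row i of targets is the
    positive for row i of context, all other rows are negatives."""
    context = F.normalize(context, dim=1)   # unit-norm representations
    targets = F.normalize(targets, dim=1)
    # Cosine similarities scaled by temperature -> (batch, batch) logits
    logits = context @ targets.t() / temperature
    labels = torch.arange(context.size(0))  # positives on the diagonal
    return F.cross_entropy(logits, labels)

c = torch.randn(8, 128)
t = torch.randn(8, 128)
loss = info_nce(c, t)
print(loss.item())
```

Lower temperatures sharpen the softmax over negatives, which is the knob Equation 1 of that paper exposes.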
If you agree, we could keep discussing improvements to the model and update it then.