update char_cnn and fasttext #66

Open · gmichalo wants to merge 7 commits into master
Conversation

gmichalo commented Jul 4, 2020

While examining the great models you have created, I noticed that the code in char_cnn (Character-level Convolutional Networks for Text Classification) is not exactly the same as the model the authors describe in their paper.

In particular, in this pull request I added:

a) For the char_cnn model:
1) I changed the last (sixth) convolutional layer and the first linear layer of char_cnn to the structure that the authors used.
2) I added the equation (lines 33-34) for calculating the input size of the first linear layer from the maximum number of characters of a sentence (this number could be added to the config file, but since I did not want to change more files, I added it as a constant in model.py).

b) For the FastText model:

I changed F.avg_pool2d(x, (x.shape[1], 1)).squeeze(1) to F.avg_pool1d(x, x.shape[1]).squeeze(1). The two are equivalent, but I believe using F.avg_pool1d makes the code more understandable, since the code right now uses F.avg_pool2d to basically perform a 1-D average pooling (a small sanity check is sketched below).
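As a quick sanity check of the pooling equivalence, here is a minimal sketch. The shapes are illustrative assumptions (embedded batch laid out as (batch, seq_len, embed_dim)); the transpose is only there to move the pooled axis last, which is what avg_pool1d expects.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: 4 documents, 10 tokens each, 300-dim embeddings.
x = torch.randn(4, 10, 300)

# 2-D formulation: treat (batch, seq_len, embed_dim) as a 2-D map and
# average over the sequence axis with a (seq_len, 1) window.
pooled_2d = F.avg_pool2d(x, (x.shape[1], 1)).squeeze(1)             # (4, 300)

# 1-D formulation: avg_pool1d averages over the last axis, so the sequence
# axis is moved last before pooling.
pooled_1d = F.avg_pool1d(x.transpose(1, 2), x.shape[1]).squeeze(2)  # (4, 300)

print(torch.allclose(pooled_2d, pooled_1d))  # True -- both are the per-document mean embedding
```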

daemon commented Aug 31, 2020

Thanks for the PR. Have you run any benchmarks?

gmichalo commented Sep 1, 2020

For the FastText model, I tested on the Reuters and AAPD datasets, and the new version gets exactly the same results as the previous version, as expected.

For the char_cnn model, the main difference between the new version and the old one is the input dimension of the first linear layer, which was fixed to 256 in the old version but is now calculated as described in the paper; for Reuters it comes out to 8448 (lines 33-36 of the code, and the last paragraph before Section 2.4 in the paper). Because of this change, when I ran it on Reuters I got an F1 of 0 on dev, due to the limited data. However, if we follow the paper, I believe this is the correct way to build char_cnn.
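For reference, a minimal sketch of how that input size falls out of the paper's small architecture (six convolutional layers with 256 filters; kernels of 7, 7, 3, 3, 3, 3; max-pooling of size 3 after the first, second, and sixth layers). The 1000-character cap below is purely an illustrative assumption; with it, the output frame length works out to 33, so the first linear layer gets 256 × 33 = 8448 inputs, consistent with the paper's l6 = (l0 − 96) / 27 rule.

```python
# Sketch only: the 1000-character cap is an illustrative assumption.
num_filters = 256   # frame size of the "small" model in the paper
max_chars = 1000    # hypothetical maximum number of characters per document

length = max_chars
# (conv kernel, pooling size) per layer; pooling size 1 means no pooling.
for kernel, pool in [(7, 3), (7, 3), (3, 1), (3, 1), (3, 1), (3, 3)]:
    length = length - kernel + 1              # valid convolution, stride 1
    if pool > 1:
        length = (length - pool) // pool + 1  # max-pooling with stride == pool size

fc1_input_size = num_filters * length
print(length, fc1_input_size)  # 33 8448
```

With a different character cap the same loop gives the corresponding fc1 size, which is why the value has to be recomputed per dataset rather than hard-coded to 256.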

daemon commented Sep 4, 2020

Okay, it's probably a good idea to note in some readme that the correct implementation of char_cnn produces an F1 of 0 on Reuters, so people don't waste their time running it.

daemon previously approved these changes Sep 15, 2020