Training set sometimes required for parsing #1

Open
tdozat opened this issue Jun 18, 2017 · 0 comments

tdozat commented Jun 18, 2017

The model saves a list of all the tokens in the vocabulary in save_dir/words.txt. If there's a case mismatch between the character model and the token model (that is, if you want the character model to be cased and the word vocabulary to be caseless), the code reads through the training set to build up the character vocabulary. This is a problem when you only want to parse and the training set isn't available.

Solution: modify the code to save both the cased and the caseless vocabulary, in save_dir/words-cased.txt and save_dir/words-caseless.txt respectively, and at parse time load whichever one the cased configuration setting dictates.
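
A rough sketch of what that could look like, not the actual parser code. The two file names come from the proposal above; the function names `save_vocabs`, `load_vocab` and `char_vocab` are hypothetical, and the sketch assumes one token per line and that the character vocabulary can be derived from the saved cased word list instead of the training set.

```python
import os
import tempfile

# File names follow the proposal above; everything else here is hypothetical.
CASED_FILE = "words-cased.txt"
CASELESS_FILE = "words-caseless.txt"


def save_vocabs(tokens, save_dir):
    """Write both the cased and the caseless vocabulary at training time."""
    os.makedirs(save_dir, exist_ok=True)
    vocabs = {
        CASED_FILE: sorted(set(tokens)),
        CASELESS_FILE: sorted({tok.lower() for tok in tokens}),
    }
    for name, vocab in vocabs.items():
        with open(os.path.join(save_dir, name), "w", encoding="utf-8") as f:
            for tok in vocab:
                f.write(tok + "\n")


def load_vocab(save_dir, cased):
    """Load the vocabulary selected by the `cased` setting; no training set needed."""
    name = CASED_FILE if cased else CASELESS_FILE
    with open(os.path.join(save_dir, name), encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def char_vocab(words):
    """Assumption: the character vocabulary is just the set of characters in the words."""
    return sorted({ch for word in words for ch in word})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as save_dir:
        save_vocabs(["The", "the", "Parser"], save_dir)
        # Caseless word vocabulary, cased character vocabulary.
        print(load_vocab(save_dir, cased=False))
        print(char_vocab(load_vocab(save_dir, cased=True)))
```

With both files on disk, a caseless word model and a cased character model can each load what they need at parse time, so the mismatch case no longer falls back to the training set.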

@tdozat tdozat added the bug label Jun 18, 2017
@tdozat tdozat self-assigned this Jun 18, 2017