Projects using this and evaluation results #7

Open
NebelAI opened this issue Nov 9, 2019 · 2 comments

Comments

NebelAI commented Nov 9, 2019

Hi @kwonmha,

Your project is exactly what I had in mind when working on BERT vocab creation. I'm currently doing some vocab optimizations for my own BERT project, too.

Can you say something about improvements/degradations related to your vocab changes? I'm really curious if this approach delivers better results.

Owner

kwonmha commented Nov 11, 2019

Well, I haven't trained BERT many times with different vocab types.
This is the only vocab I've tried that has the same format as the official Google Research BERT vocab.
So I have nothing to compare it against.
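
For readers unfamiliar with that format, here is a minimal sketch of a format check. It assumes the usual conventions of the `vocab.txt` files released with Google Research's BERT: one token per line, no whitespace inside a token, `##` marking word-continuation pieces, and the special tokens `[PAD]`, `[UNK]`, `[CLS]`, `[SEP]`, `[MASK]`. The function names are hypothetical and not part of this repository.

```python
# Hypothetical helpers, not part of this repository. They check vocab lines
# against the conventions assumed above for Google Research BERT vocab files.
SPECIAL_TOKENS = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")


def check_bert_vocab(lines):
    """Return a list of human-readable problems found in the vocab lines."""
    problems = []
    seen = set()
    for line_no, raw in enumerate(lines, start=1):
        token = raw.rstrip("\n")
        if not token:
            problems.append(f"line {line_no}: empty token")
        elif any(ch.isspace() for ch in token):
            problems.append(f"line {line_no}: whitespace inside {token!r}")
        elif token in seen:
            problems.append(f"line {line_no}: duplicate token {token!r}")
        seen.add(token)
    for special in SPECIAL_TOKENS:
        if special not in seen:
            problems.append(f"missing special token {special}")
    return problems


def count_continuation_pieces(lines):
    """Count tokens carrying the '##' word-continuation prefix."""
    return sum(1 for raw in lines if raw.startswith("##"))
```

To check a real file, pass the open file object, for example `check_bert_vocab(open("vocab.txt", encoding="utf-8"))`, where the path is a placeholder. An empty list means none of the assumed conventions were violated.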

I plan to use POS tag information together with subwords, since I'm doing research on Korean.
But I'm not sure it will work for English or other alphabet-based languages.

@YuBeomGon

Could you tell me the input file pattern for Korean text?
