BERT trained on custom corpus #15

anidiatm41 · 2020-10-10T05:37:59Z

Hi M. H. Kwon,
Your tokenization script is really helpful.

I trained a BERT model on a custom corpus using Google's scripts (create_pretraining_data.py, run_pretraining.py, extract_features.py, etc.). As a result I have a vocab file, a .tfrecord file, a .json file, and checkpoint files.

How can I use those files for the tasks below?

1. Predicting a missing word in a given sentence
2. Next sentence prediction
3. A Q and A model

Need your help.

kwonmha · 2020-10-13T01:59:39Z

Hi, anidiatm41,
Thank you.

For 3 (the Q and A model), see the official BERT GitHub repository.
It has instructions for fine-tuning on tasks like QA (SQuAD).

Missing-word prediction and next sentence prediction are normally used as pre-training objectives, not as end tasks.
If you want to predict missing words for a practical purpose, you need to write your own code.
You can refer to the evaluation part of run_pretraining.py; the logic is almost the same.
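
As a rough illustration of what such code has to do around the model, here is a minimal Python sketch. It is not taken from run_pretraining.py; the helper names (load_vocab, build_features, top_k_tokens) are hypothetical. It assumes the sentence is already split into WordPiece tokens, that [PAD] is the first entry in the vocab file, and that the model returns one row of log-probabilities over the vocabulary for each masked position. The model forward pass itself is left out, and the scores below are made up purely to show the decoding step.

```python
import math


def load_vocab(lines):
    """Map token -> id; assumes one token per line, id = line index."""
    return {line.rstrip("\n"): i for i, line in enumerate(lines)}


def build_features(tokens, vocab, max_seq_length=16):
    """Build single-sentence inputs with the same field names the
    pre-training data uses (input_ids, input_mask, segment_ids,
    masked_lm_positions)."""
    tokens = ["[CLS]"] + list(tokens) + ["[SEP]"]
    if len(tokens) > max_seq_length:
        raise ValueError("sentence is longer than max_seq_length")
    unk_id = vocab["[UNK]"]
    input_ids = [vocab.get(tok, unk_id) for tok in tokens]
    # Positions the model should predict: every [MASK] in the input.
    masked_lm_positions = [i for i, tok in enumerate(tokens) if tok == "[MASK]"]
    pad = max_seq_length - len(input_ids)
    return {
        "input_ids": input_ids + [0] * pad,  # assumes [PAD] has id 0
        "input_mask": [1] * len(input_ids) + [0] * pad,
        "segment_ids": [0] * max_seq_length,  # single sentence: all zeros
        "masked_lm_positions": masked_lm_positions,
    }


def top_k_tokens(log_probs, vocab, k=3):
    """Return the k most likely (token, probability) pairs for one
    masked position, given log-probabilities over the vocabulary."""
    id_to_token = {i: tok for tok, i in vocab.items()}
    ranked = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)
    return [(id_to_token[i], math.exp(log_probs[i])) for i in ranked[:k]]


# Toy vocabulary; in practice pass the lines of your own vocab file.
vocab = load_vocab(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
                    "the", "cat", "sat", "mat"])
features = build_features(["the", "[MASK]", "sat"], vocab, max_seq_length=8)

# Made-up scores standing in for the model output at the masked position.
fake_probs = [0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.70, 0.12, 0.08]
fake_log_probs = [math.log(p) for p in fake_probs]
predictions = top_k_tokens(fake_log_probs, vocab, k=2)
print(features["masked_lm_positions"], predictions)
```

In a real setup the log-probabilities would come from the masked LM head of your checkpoint, and the token list from the same tokenizer used to build the training data.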
