
Question regarding oversampling #9

Open
gurunathparasaram opened this issue Jun 7, 2019 · 2 comments

gurunathparasaram commented Jun 7, 2019

  • In the low-resource paper, you mention that NUCLE was oversampled 10 times for domain adaptation to the CoNLL-14 dataset (a rough sketch of what such oversampling typically looks like is included after this list).

  • I tried benchmarking the pre-trained models provided in this repo on the WI+LOCNESS test set.

  • The single model gave an F-score of 34.15, whereas the ensemble of 4 models + reranking gave an F-score of 53.27. The ensemble produces fewer false positives than the single model, leading to higher precision.

  • Metrics of the single model on the WI+LOCNESS test set:
    (screenshot: lrgec_single)

  • Metrics of the ensemble on the WI+LOCNESS test set:
    (screenshot: lrgec_ensemble)

  • Could oversampling the NUCLE data explain the single model's precision dropping from 69-70 on the CoNLL-14 test set to 31.3 on the WI+LOCNESS test set?
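
For context, the oversampling in question usually just means repeating the corpus in the concatenated training data. A minimal sketch, assuming plain parallel text files with hypothetical names (`nucle.src`/`nucle.trg`), not the authors' actual pipeline:

```python
# Sketch of 10x oversampling for domain adaptation: the NUCLE source/target
# pairs are simply repeated ten times in the training data. File names are
# hypothetical placeholders, not the repo's actual layout.
OVERSAMPLE = 10

with open("nucle.src") as f:
    src_lines = f.readlines()
with open("nucle.trg") as f:
    trg_lines = f.readlines()

with open("train.src", "a") as out_src, open("train.trg", "a") as out_trg:
    for _ in range(OVERSAMPLE):  # append NUCLE to the training set 10 times
        out_src.writelines(src_lines)
        out_trg.writelines(trg_lines)
```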

Thanks!

snukky (Contributor) commented Jun 11, 2019

I don't think oversampling would degrade the scores (though I haven't run these models on the BEA datasets yet). Such low precision and high recall may suggest that something is wrong with the pre/post-processing. How did you pre/post-process the data? CoNLL uses NLTK and BEA uses spaCy for tokenization; maybe they differ too much. Did you take a look at the corrections made by the system?
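
For what it's worth, the two tokenizers can be compared directly. A minimal sketch (assuming NLTK's `punkt` data and spaCy's `en_core_web_sm` model are installed), showing the kind of mismatch that can hurt precision:

```python
# Compare NLTK and spaCy tokenization on the same sentence; the two can
# disagree, e.g. around hyphenated words, which matters when a model trained
# on one scheme is evaluated against references tokenized with the other.
import nltk
import spacy

nlp = spacy.load("en_core_web_sm")
sentence = "It is a well-known fact, isn't it?"

nltk_tokens = nltk.word_tokenize(sentence)        # keeps "well-known" whole
spacy_tokens = [t.text for t in nlp(sentence)]    # splits "well - known"

print("NLTK: ", nltk_tokens)
print("spaCy:", spacy_tokens)
```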

Another possibility is that the weight for the LM is too high; it was grid-searched on CoNLL-2013.

We don't use re-ranking in this system. We only ensemble with a language model.

gurunathparasaram (Author) commented Jun 13, 2019

I performed spell correction using Jamspell on the BEA source sentences before feeding them to the models (roughly as in the sketch below). I will take a look at the system outputs soon and also try decreasing the weight for the LM. Sorry for the confusion; it should have been ensemble+LM instead of reranking in my previous comment.
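
A rough sketch of that pre-correction step; `en.bin` and the file names are placeholders for the actual Jamspell model and BEA files, not the exact setup used here:

```python
# Run each BEA source sentence through Jamspell before passing it to the
# GEC models. "en.bin" and the file names are hypothetical placeholders.
import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel("en.bin")  # pretrained Jamspell language model

with open("bea.src") as src, open("bea.spellchecked.src", "w") as out:
    for line in src:
        out.write(corrector.FixFragment(line.rstrip("\n")) + "\n")
```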
