EACL2021-DravidianTask-Bitions and SN Computer Science Journal

This repo contains implementation for the paper "Bitions@DravidianLangTech-EACL2021: Ensemble of Multilingual Language Models with Pseudo Labeling for Offence Detection in Dravidian Languages" presented at EACL-2021, and the paper "Offence Detection in Dravidian Languages Using Code-Mixing Index-Based Focal Loss" accepted to SN Computer Science Journal.

In this paper, we propose use of ensembled multilingual models (transformers) with focal loss and inverse class weights. Further, we make use of pseudo labelling to improve our results. A novel loss based on the focal loss is proposed to cater to code-mixing in the data. A cosine normalization classifier is used to cater to few-shot classes as well.

To replicate the EACL 2021 results, use the output files from DistilmBERT/DistilmBERT(class_weightiing+pseudo_labelling).ipynb and ULMFiT/{lang}_PseudoLabelling_AggressionNLP.ipynb to feed to Soft Ensemble/Soft_Ensembling.ipynb

To replicate the SN Journal results, use the output files from DistilmBERT(cmi_loss+cosnorm+pseudo_labelling).ipynb and IndicBert(cmi_loss+cosnorm+pseudo_labelling).ipynb to feed to Soft Ensemble/Soft_Ensembling.ipynb

If you find this repo useful, please cite our papers :

@inproceedings{tula2021bitions,
   title={Bitions@DravidianLangTech-EACL2021: Ensemble of Multilingual Language Models with Pseudo Labeling for offence Detection in Dravidian Languages},
    author = {Tula, Debapriya  and
      Potluri, Prathyush  and
      MS, Shreyas and
      Doddapaneni, Sumanth and
      Sahu, Pranjal and
      Sukumaran, Rohan   and
      Patwa, Parth},      
    booktitle = {Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages},
    month = {April},
    year = {2021},
    publisher = {Association for Computational Linguistics}
    url = {https://aclanthology.org/2021.dravidianlangtech-1.42/}
}

@Article{Tula2022,
author = {Tula, Debapriya
and Shreyas, M. S.
and Reddy, Viswanatha
and Sahu, Pranjal
and Doddapaneni, Sumanth
and Potluri, Prathyush
and Sukumaran, Rohan
and Patwa, Parth},
title = {Offence Detection in Dravidian Languages Using Code-Mixing Index-Based Focal Loss},
journal = {SN Computer Science},
year = {2022},
month = {Jun},
day = {13},
volume = {3},
number = {5},
pages = {330},
issn = {2661-8907},
doi = {10.1007/s42979-022-01190-1},
url = {https://doi.org/10.1007/s42979-022-01190-1}
}

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
DistilmBERT		DistilmBERT
IndicBERT		IndicBERT
Soft Ensemble		Soft Ensemble
ULMFiT		ULMFiT
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EACL2021-DravidianTask-Bitions and SN Computer Science Journal

About

Releases

Packages

Contributors 3

Languages

License

Debapriya-Tula/EACL2021-DravidianTask-Bitions

Folders and files

Latest commit

History

Repository files navigation

EACL2021-DravidianTask-Bitions and SN Computer Science Journal

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages