You can access the competition via the following link: Kaggle.

We are the team named **meow**.

The report is available in the file NLP_report.pdf, following this link.
- Rayane Bouaita
- Erwan David
- Pierre El Anati
- Guillaume Faynot
- Gabriel Trier
Text classification with sparsely represented training data is not a trivial task. We present our solution, which uses large language models (LLMs) to classify texts in almost 390 different languages. After studying the data provided to us, we experimented with several approaches based on pre-trained transformer models (XLM-RoBERTa and BERT). Our final model achieved an accuracy of 88.0%, placing our team in the top 10 of the leaderboard.
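To give a feel for the language-identification task itself, here is a toy, self-contained character-trigram baseline in plain Python. This is only an illustrative sketch, not the XLM-RoBERTa/BERT approach the project actually uses; the class and parameter names (`NgramLanguageClassifier`, `profile_size`) are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Extract overlapping character n-grams from a text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramLanguageClassifier:
    """Toy nearest-profile language identifier: each language is
    represented by its most frequent character trigrams, and a text
    is assigned to the language whose profile it overlaps most."""

    def __init__(self, n=3, profile_size=300):
        self.n = n
        self.profile_size = profile_size
        self.profiles = {}

    def fit(self, texts, labels):
        # Accumulate trigram counts per language label.
        per_lang = {}
        for text, label in zip(texts, labels):
            per_lang.setdefault(label, Counter()).update(char_ngrams(text, self.n))
        # Keep only the most frequent trigrams as each language's profile.
        for label, counts in per_lang.items():
            self.profiles[label] = {g for g, _ in counts.most_common(self.profile_size)}
        return self

    def predict(self, text):
        # Pick the language whose profile shares the most trigrams with the text.
        grams = set(char_ngrams(text, self.n))
        return max(self.profiles, key=lambda lab: len(grams & self.profiles[lab]))
```

Real systems covering hundreds of languages need far more than trigram overlap, which is precisely why a pre-trained multilingual transformer such as XLM-RoBERTa is the natural choice here.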
To install the required packages, you can run the following command:

```shell
pip install -r requirements.txt
```
To train the model, you can run the following command from the root directory:

```shell
python models/roberta.py
```
You can also use the model.ipynb notebook to train the model.