This repository is a curated collection of resources and repositories for Natural Language Processing (NLP) tasks specific to Arabic and Darija, the Moroccan Arabic dialect. These resources are aimed at students and researchers interested in Arabic and Darija processing and analysis. You can find Arabic and Darija resources on various platforms, including Kaggle, Mendeley, and Hugging Face, as well as in the sections listed below:
- Arabic and Darija NLP Models
- Arabic and Darija NLP Datasets
- Arabic and Darija Linguistic Resources
- Arabic and Darija NLP Frameworks
- Arabic and Darija NLP Evaluation Benchmarks
- Arabic and Darija NLP Books and Reference papers
- Arabic and Darija NLP Research Labs
- Arabic and Darija NLP Conferences
- Arabic and Darija Communities and Scientific Societies
- DarijaBERT Arabizi
- T5 darija summarization
- DarijaBERT Mix
- MorRoBERTa
- MorrBERT
- MARBERT
- AraBERT summarization goud
- DarijaBERT
- Aragpt2 base
- Bert base arabertv2
- Bert-base-arabic-camelbert-da-sentiment
- Magbert-ner
- Goud.ma news website
- POS tagged tweets in dialects of Arabic
- Moroccan Darija Wikipedia dataset
- Darija Stories Dataset
- Moroccan news articles in modern Arabic
- Sentiment Analysis dataset for under-represented African languages
- Darija Dataset
- Darija Open Dataset is an open-source collaborative project for Darija ⇆ English translation
- MSDA open Datasets um6p
- Moroccan Arabic Corpus (MAC) is a large Moroccan corpus for sentiment analysis
- ADI17: A Fine-Grained Arabic Dialect Identification Dataset
- DART: includes Maghrebi, Egyptian, Levantine, Gulf, and Iraqi Arabic
- ARABIC NLP DATA CATALOGUE
- MADAR
- SAFAR: SAFAR is a monolingual framework developed in accordance with software engineering requirements and dedicated to the Arabic language, especially Modern Standard Arabic and the Moroccan dialect.
- Farasa: Farasa is a package for Arabic language processing.
- CAMeL Lab at New York University Abu Dhabi: CAMeL Tools is a suite of Arabic natural language processing tools.
- Books and reference papers
- Survey Documents
- International Conference on Arabic Computational Linguistics
- International Conference on Arabic Language Processing (ICALP): the 2023 edition will take place at ENSIAS.
- WANLP: Arabic NLP workshop
- OSACT: Workshop on Open-Source Arabic Corpora and Processing Tools
- Doctoral Symposium on Arabic Language Engineering
- IWABigDAI: International Workshop on Arabic Big Data & AI
- Eval4NLP: Workshop on Evaluation and Comparison of NLP Systems
Other prominent NLP conferences, such as LREC, EMNLP, and ACL, are also relevant.
- ALELM (Arabic Language Engineering and Learning Modeling)
- Arabic Language Technologies Group at Qatar Computing Research Institute (QCRI)
- CAMeL Lab at New York University Abu Dhabi
- SinaLab for Computational Linguistics and Artificial Intelligence
- Oujda NLP Team
- Arabic Natural Language Processing Research Group (ANLP-RG)
- ARBML community: ARBML is a community of more than 500 researchers working on Arabic NLP research and development.
- ACL Special Interest Group on Arabic Natural Language Processing
Feel free to contribute to this collection by adding more resources and repositories related to Darija NLP. You can submit pull requests or create issues to suggest additions or modifications to the existing content.
Note: Please adhere to the guidelines provided by each resource or repository.