Awesome-Moroccan-Arabic-nlp

A list of Natural Language Processing resources for Moroccan Arabic (Darija)

Modeling, Simulation and Data Analysis (MSDA) Datasets: A dataset of 50k tweets labeled for sentiment analysis, topic detection, and dialect detection. The tweets come from 5 countries, including Morocco.
Darija Open Dataset (DODA): An open-source project for building a dataset of Darija-English vocabulary.
DVOICE: A Darija audio dataset containing audio files and their corresponding text.
Darija Wikipedia articles
Moroccan News and Comments from Hespress
Moroccan Sentiment Analysis corpus
ElecMorocco2016: A sentiment analysis dataset of Arabic Facebook comments about the 2016 Moroccan elections.
Goud-sum: A text summarization dataset of 158k examples.
Arabic POS dialect: A dialectal Arabic POS tagging dataset containing 350 manually segmented and POS-tagged tweets for each of 4 dialects: Egyptian, Levantine, Gulf, and Maghrebi.

Models
