Dive into the world of Arabic NLP with this extensive collection of resources, tools, datasets, and best practices tailored for the Arabic language.

Curated-Awesome-Lists/awesome-arabic-nlp

Awesome arabic-nlp

Welcome to this meticulously curated list of resources dedicated to Natural Language Processing (NLP) in the Arabic language! Arabic is known for its complexity and richness, making advancements in NLP for this language a challenging yet rewarding endeavor.

In this repository, you'll find a wide array of resources including academic papers, tools, datasets, libraries, and best practices, all specifically tailored to Arabic NLP. Whether you are a researcher, developer, or someone simply interested in applying NLP techniques to Arabic text, this list is an invaluable resource.

The resources cover a broad spectrum of topics, including syntactic analysis, machine translation, named entity recognition, and text classification, while addressing the unique challenges and characteristics of the Arabic language.

Feel free to contribute to this awesome list by submitting a pull request or suggesting new resources. Together, we can build a comprehensive and up-to-date repository that benefits the entire community working on Arabic NLP. Enjoy your learning journey!

GitHub projects

  • arabert : Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA) 🌟
  • ARBML : Implementations of many Arabic NLP and CV projects, with real-time demos through web, command-line, and notebook interfaces. 💻
  • Shakkala : Deep learning for automatic vocalization (diacritization) of Arabic text. 📈
  • arabic-stop-words : Largest list of Arabic stop words on GitHub. 📕
  • Hadith-Data-Sets : All 62,169 hadith from the Nine Books, with and without tashkeel (diacritics). 📖
  • ar-php : A set of functionalities that enables Arabic website developers to search, present, and process Arabic content professionally in PHP.
  • Maha : Maha is a text processing library specially developed to deal with Arabic text. 📜
  • tajmeeaton : A collection of projects, especially open-source ones, for advancing the Arabic language and the Ummah. 👨‍💻 👨‍🔬👨‍🏫🧕
  • SOQAL : Arabic Open Domain Question Answering System using Neural Reading Comprehension ❓
  • Arabic-BERT : Arabic edition of BERT pretrained language models
  • ARBML/masader 🌟🌟🌟 The largest public catalogue for Arabic NLP and speech datasets. Includes more than 500 datasets annotated with more than 25 attributes.
  • Qutuf/Qutuf 🌟🌟🌟 Qutuf (قُطُوْف): An Arabic morphological analyzer and part-of-speech tagger built as an expert system.
  • motazsaad/process-arabic-text 🌟🌟🌟 Pre-process Arabic text (remove diacritics, punctuation, and repeating characters).
  • UBC-NLP/marbert 🌟🌟🌟 UBC ARBERT and MARBERT Deep Bidirectional Transformers for Arabic.
  • MagedSaeed/farasapy 🌟🌟🌟 A Python implementation of the Farasa toolkit.
  • saidziani/Arabic-News-Article-Classification 🌟🌟🌟 Automatic categorization of documents based on their content using Supervised Machine Learning.
  • iamaziz/ar-embeddings 🌟🌟🌟 Sentiment Analysis for Arabic Text (tweets, reviews, and standard Arabic) using word2vec.
  • mohabmes/Arabycia 🌟🌟 Arabic NLP tool for Text Search, POS tagging, Translation, auto-diacritization, and more.
  • adhaamehab/textblob-ar 🌟🌟 Arabic support for the textblob library.
  • motazsaad/arabic-sentiment-analysis 🌟🌟 Sentiment Analysis in Arabic tweets.
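
Several of the projects above revolve around the same preprocessing steps: removing diacritics, removing punctuation, and collapsing repeated characters. The snippet below is a minimal sketch of those three steps using only the Python standard library. It does not reproduce the code of any listed project; the function names, the punctuation set, and the choice to collapse runs of three or more identical characters are assumptions made for illustration.

```python
import re
import string

# Arabic diacritics (tashkeel): U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (superscript alef). Assumption: this range is enough for a demo.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# ASCII punctuation plus a few common Arabic marks (comma, semicolon,
# question mark, guillemets) and the tatweel/kashida elongation character.
ARABIC_MARKS = "\u060C\u061B\u061F\u00AB\u00BB\u0640"
PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_MARKS)

# A run of three or more identical characters, as in elongated tweet text.
REPEATS = re.compile(r"(.)\1{2,}")


def strip_diacritics(text: str) -> str:
    """Remove tashkeel marks while keeping the base letters."""
    return DIACRITICS.sub("", text)


def remove_punctuation(text: str) -> str:
    """Delete ASCII and common Arabic punctuation, and tatweel."""
    return text.translate(PUNCT_TABLE)


def collapse_repeats(text: str) -> str:
    """Reduce runs of 3+ identical characters to a single character."""
    return REPEATS.sub(r"\1", text)


def preprocess(text: str) -> str:
    """Apply the three steps in order, then normalize whitespace."""
    text = strip_diacritics(text)
    text = remove_punctuation(text)
    text = collapse_repeats(text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("قُطُوْف!!!"))
```

Diacritics are stripped first so that a mark sitting between two repeated letters does not hide the repetition from the last step. Doubled letters are left alone because a run of two can be a legitimate spelling.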

Articles & Blogs

Online Courses

Books

Research Papers

Videos

Tools & Software

  • KALIMAT Multipurpose Arabic Corpus: A corpus that could be of help for researchers working on Arabic NLP. It consists of 20,291 Arabic articles collected from the Omani newspaper Alwatan.
  • EASC (Essex Arabic Summaries Corpus): Arabic natural language resources containing 153 Arabic articles and 765 human-generated extractive summaries of those articles.
  • Khawas: An Arabic Corpora Processing Tool for analyzing Arabic corpora.
  • NLTK: A leading platform for building Python programs to work with human language data, including NLP libraries and an active discussion forum.
  • Stanford CoreNLP: A Java suite of core NLP tools, providing linguistic annotations such as tokenization, parts of speech, named entities, sentiment analysis, and more.
  • Arabic Corpus: A collection of more than 460 Arabic books that can be used for language engineering applications.
  • Osman Arabic Text Readability: An open-source tool for measuring Arabic text readability, allowing users to calculate readability for Arabic text with or without diacritics.
  • Alkhalil Morpho Sys: A morphosyntactic parser for Arabic words that can process both vocalized and non-vocalized texts.
  • Best Natural Language Understanding (NLU) Software in 2023 | G2: Natural language understanding (NLU), a form of natural language processing (NLP), allows users to better understand text through machine learning. This website provides real-time, up-to-date product reviews from verified users to help you choose the right NLU software.
  • spaCy download | SourceForge.net: spaCy is an industrial-strength NLP library built on the latest research for advanced NLP in Python and Cython. It is designed for real-world applications and can be used for building products and gaining insights.
  • Natural Language Toolkit download | SourceForge.net: The Natural Language Toolkit (NLTK) is a library for NLP. It provides tools and resources for tasks such as tokenization, stemming, lemmatization, parsing, semantic reasoning, and more.

Conferences & Events

Slides & Presentations

Podcasts

  • NLP Highlights: In this podcast, researchers from the AllenNLP team at Allen Institute for AI discuss their work in various areas of natural language processing.
  • NLP MasterCLASS: Hosted by Master Trainers Tina Taylor and Steve Crabb, this podcast covers neuro-linguistic programming (a different field that shares the NLP acronym) and how to use it for personal and professional change.
  • NLP Talks with Laura Evans: Hosted by Laura Evans, an international trainer of neuro-linguistic programming, this podcast features interviews with practitioners and offers tips and strategies for success.
  • The Brain Language Podcast: This podcast introduces neuro-linguistic programming concepts for personal and business life.
  • WARA Media & Language Podcast: In this podcast, you will hear the latest research within AI in the field of Media, Language, and Gaming, with insights from industry leaders and tech companies.
  • Microsoft Research Podcast: This podcast brings you conversations with researchers at Microsoft, discussing cutting-edge advancements in technology.
  • NLP Talks: A podcast in Greek about Neuro-Linguistic Programming (NLP) by Athens NLP Studies, aiming to help individuals create positive and lasting changes in their lives.
  • Women in AI: A biweekly podcast featuring leading female minds in AI, Deep Learning, and Machine Learning, discussing cutting-edge work, technological advancements, and the impact of AI for social good and diversity in the workplace.

This initial version of the Awesome List was generated with the help of the Awesome List Generator. It's an open-source Python package that uses the power of GPT models to automatically curate and generate starting points for resource lists related to a specific topic.
