A collaborative catalog of resources for Indian language NLP

aoxolotl/indicnlp_catalog

 
 

A Catalog of resources for Indian language NLP

Please suggest any other resources you may be aware of. Raise an issue to add more resources to the catalog. Put the proposed entry in the following format:

[Wikipedia Dumps](https://dumps.wikimedia.org/)

Add a small, informative description of the dataset and provide links to any paper/article/site documenting the resource.
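The entry format above can be produced mechanically. The following is a minimal sketch, assuming a hypothetical helper named `format_entry` (it is not part of this repository) that builds the Markdown link and appends the short description:

```python
def format_entry(name: str, url: str, description: str = "") -> str:
    """Build a catalog entry as a Markdown link, optionally followed by a description.

    `format_entry` is a hypothetical helper used for illustration only.
    """
    name = name.strip()
    url = url.strip()
    if not name or not url:
        raise ValueError("both a name and a URL are required")
    # No space between ']' and '(' or the link will not render.
    link = f"[{name}]({url})"
    description = description.strip()
    return f"{link}: {description}" if description else link


print(format_entry("Wikipedia Dumps", "https://dumps.wikimedia.org/"))
```

Passing a non-empty `description` appends it after a colon, which matches the style of the entries listed under Libraries.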

Major Indic Language NLP Repositories

Text Corpora

Unicode Standard

Monolingual Corpus

Lexical Resources

NER Corpora

Parallel Translation Corpus

Parallel Transliteration Corpus

Textual Entailment

  • XNLI corpus: Hindi and Urdu test sets and machine translated training sets (from English MultiNLI).

Sentiment Analysis

POS Tagged corpus

Chunk Corpus

Dependency Parse Corpus

Dialog

Speech Corpora

OCR Corpora

Multimodal Corpora

Models

Word Embeddings

Sentence Embeddings

Multilingual Word Embeddings

SMT Models

Libraries

  • Indic NLP Library: Python library for various Indian language NLP tasks such as tokenization, sentence splitting, normalization, script conversion, and transliteration.
  • pyiwn: Python Interface to IndoWordNet
  • [Indic-OCR](https://indic-ocr.github.io/): OCR for Indic Scripts
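
To give a feel for the script conversion task mentioned above, here is a rough sketch of one common approach. It relies on the fact that the major Indic script blocks in Unicode are each 128 code points wide and share a largely parallel layout, so a character can often be mapped to another script by shifting its code point. This is not the API of any library listed here. The names `SCRIPT_BLOCK_START` and `convert_script` are assumptions made for illustration, and the mapping is approximate because some positions are unassigned in some scripts.

```python
# Start code points of some Unicode Indic script blocks (each is 128 code points wide).
SCRIPT_BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def convert_script(text: str, source: str, target: str) -> str:
    """Shift characters from the source script block into the target script block.

    Hypothetical helper: characters outside the source block (spaces, Latin
    letters, digits) are left unchanged. No check is made that the resulting
    code point is actually assigned in the target script.
    """
    src = SCRIPT_BLOCK_START[source]
    tgt = SCRIPT_BLOCK_START[target]
    converted = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            converted.append(chr(cp - src + tgt))
        else:
            converted.append(ch)
    return "".join(converted)


# Devanagari "कमल" shifted into the Bengali block.
print(convert_script("\u0915\u092e\u0932", "devanagari", "bengali"))
```

A production-quality converter also has to handle script-specific gaps and exceptions, which is where a dedicated library is the better choice.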
