Unimorph_Inflect: A Python NLP Library for Generating Morphological Inflection in Many Human Languages

Setup

Unimorph_inflect supports Python 3.7. We strongly recommend that you install Unimorph_inflect from source following the steps below. A PyPI release is forthcoming.

Installing from this git repository gives you more flexibility for developing on top of unimorph_inflect and training your own models. To install from source, run:

git clone https://github.com/antonisa/unimorph_inflect.git
cd unimorph_inflect
python setup.py install
# or, for an editable (development) install: pip install -e .

Getting Started

You can get started by simply following these steps in your Python interactive interpreter:

>>> import unimorph_inflect
>>> unimorph_inflect.download('eng')   # This downloads the English models, if you don't have them already
>>>
>>> from unimorph_inflect import inflect
>>> result = inflect("laugh", "V;PST", language='eng')
>>> print(result[0])
laughed

Note: inflect() returns a list of outputs, hence the "[0]" above.

You don't need to download each model explicitly (as in the second line above): if the model for a language is not already present, inflect() will ask whether you want to download it.
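Because inflect() returns a list of outputs, batch use usually means taking the first element of each result, as in the example above. The sketch below relies only on the call shown in this README, inflect(lemma, tags, language=...); the helper names make_tag and first_forms are hypothetical and not part of unimorph_inflect. The inflect function is passed in as a parameter, so the helper can be tried with a stub before any models are downloaded.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed from the README example: inflect(lemma, tags, language=...) returns a list of forms.
InflectFn = Callable[..., List[str]]


def make_tag(*features: str) -> str:
    """Join UniMorph features into a semicolon-separated bundle, e.g. ("V", "PST") -> "V;PST"."""
    return ";".join(features)


def first_forms(
    inflect_fn: InflectFn,
    requests: Iterable[Tuple[str, str]],
    language: str,
) -> Dict[Tuple[str, str], Optional[str]]:
    """Map each (lemma, tags) pair to the first form inflect_fn returns, or None if it returns nothing."""
    results: Dict[Tuple[str, str], Optional[str]] = {}
    for lemma, tags in requests:
        forms = inflect_fn(lemma, tags, language=language)
        results[(lemma, tags)] = forms[0] if forms else None
    return results
```

With the library and the English models installed, the real function can be passed in directly, e.g. first_forms(inflect, [("laugh", make_tag("V", "PST"))], language="eng"), where inflect is imported from unimorph_inflect as shown above.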

Trained Models for unimorph_inflect

We currently provide monolingual models for a number of high-resource languages, each trained on all available UniMorph data for that language except 1,000 examples held out as a development set.
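The held-out setup described above can be sketched as follows, assuming the standard UniMorph file layout of one tab-separated lemma, inflected form, and feature bundle per line. The README does not say how the 1,000 development examples were chosen, so the seeded random shuffle and the function names read_unimorph and split_dev are illustrative assumptions, not the library's training code.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (lemma, inflected form, feature bundle)


def read_unimorph(lines: List[str]) -> List[Triple]:
    """Parse 'lemma<TAB>form<TAB>features' lines, skipping blank or malformed ones."""
    triples: List[Triple] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def split_dev(
    data: List[Triple], dev_size: int = 1000, seed: int = 0
) -> Tuple[List[Triple], List[Triple]]:
    """Return (train, dev): dev_size shuffled examples are held out, the rest are for training.

    If data has dev_size entries or fewer, all of it ends up in dev.
    """
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    return shuffled[dev_size:], shuffled[:dev_size]
```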

You can list the available languages/models with:

>>> unimorph_inflect.supported_languages
['ady', 'ang', 'ast', 'bel', 'bul', 'cat', 'dan', ...]

The accuracy on the development sets is as follows:

Language	Code	Supported PoS	Dev Accuracy (%)
Adyghe	ady	N, ADJ	90.0
Albanian	sqi	V, N	69.0
Ancient Greek	grc	N, ADJ	89.0
Arabic	ara	V, N, ADJ	23.0
Armenian	hye	V, N, ADJ	98.9
Asturian	ast	V, N, ADJ	99.0
Bashkir	bak	N, ADJ	81.0
Basque	eus	V	48.0
Belarusian	bel	V, N, ADJ	91.0
Bulgarian	bul	V, N, ADJ	99.0
Catalan	cat	V	100
Czech	ces	V, N, ADJ	94.0
Danish	dan	N, ADJ	82.0
Dutch	nld	N, ADJ	98.0
English	eng	V	97.0
Estonian	est	V, N	84.0
Faroese	fao	V, N, ADJ	95.0
Farsi	fas	V	93.0
French	fra	V	97.0
Galician	gal	V	100
Georgian	kat	V, N, ADJ	100
German	deu	V, N	100
Greek	ell	V, N, ADJ	84.0
Hebrew	heb	V, N	90.0
Hindi	hin	V	78.0
Hungarian	hun	V, N	97.2
Icelandic	isl	V, N	93.0
Irish	gle	V	85.6
Italian	ita	V	99.2
Latvian	lav	V, N, ADJ	99.0
Lithuanian	lit	V, N, ADJ	96.0
Lower Sorbian	dsb	V, N, ADJ	94.0
Macedonian	mkd	V, N, ADJ	100
Navajo	nav	N, ADJ	90.0
North Sami	sme	V, N, ADJ	95.0
Norwegian Bokmål	nob	V, N, ADJ	77.0
Old English	ang	V, N, ADJ	84.0
Old Saxon	osx	V, N, ADJ	93.0
Polish	pol	V, N, ADJ	95.0
Portuguese	por	V, N, ADJ	100
Quechua	que	V, N, ADJ	32.0
Romanian	ron	V, N, ADJ	83.0
Russian	rus	V, N, ADJ	94.0
Sanskrit	san	N, ADJ	79.0
Serbo-Croatian	hbs	V, N, ADJ	92.7
Slovenian	slv	V, N, ADJ	97.0
Spanish	spa	V	100
Swahili	swc	V, N, ADJ	66.0
Swedish	swe	V, N, ADJ	96.0
Turkish	tur	V, N, ADJ	84.2
Ukrainian	ukr	V, N, ADJ	97.0
Urdu	urd	V, N	71.0
Venetian	vec	V	98.0
Welsh	cym	V	97.0
Zulu	zul	V, N, ADJ	87.0
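The README does not define "Dev Accuracy"; presumably it is the exact-match metric used in the SIGMORPHON shared tasks, i.e. the percentage of development examples whose predicted form is identical to the gold form. The sketch below makes that assumption explicit; exact_match_accuracy is a hypothetical helper, and each prediction is taken to be the first output of an inflect() call.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage (0-100) of predictions that exactly match the gold inflected forms."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

Under this assumption, a score such as 98.9 on a 1,000-example development set would correspond to 989 exactly correct forms.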

Calling inflect() with your desired language should download the necessary models, but you can also download them manually from here.

References

If you use our models in your research, please cite our EMNLP 2019 paper along with the relevant UniMorph datasets:

@inproceedings{anastasopoulos19emnlp,
    title = {Pushing the Limits of Low-Resource Morphological Inflection},
    author = {Anastasopoulos, Antonios and Neubig, Graham},
    booktitle = {Proc. EMNLP},
    address = {Hong Kong},
    month = {November},
    year = {2019},
}

This release is not the same as CMU's SIGMORPHON 2019 Shared Task system. It is a cleaned-up version of the shared task code, and its models are trained on almost all UniMorph data for each language, whereas in the competition we used the designated shared task datasets.

Issues and Usage Q&A

Please use the GitHub Issue Tracker for bug reports, language/feature requests, and other questions.

LICENSE

Unimorph_inflect is released under the Apache License, Version 2.0. See the LICENSE file for more details.
