- (models/fro) Updated the model
- (deps) Moved to PaPie
- (models/fro) Fixed regression for apostrophes
- (models/fro) Added a `[REF:...]` excluder
- (models/Freem) Updated the model to handle morph
- (models/Freem+Fr) Added reference excluder
- (models/Freem+Fr) Excluders updated to use `CharRegistry`
- (pipeline/excluders) Reworked `ApostropheExcluder` and the like to use `CharRegistry`
- (CLI) Allows specifying `--max-tokens` at the CLI level
- (model/lasla) Updated model to use multi-part pie model
- (model/grc) Added a
[REF:.*]
excluder - (Tagger) Passed the argument for lower to the main Tagger (might not change a thing)
- (requirements) Upgraded pie version because I (@ponteineptique) messed up.
- (requirements) Upgraded pie version
- (models/grc) Added morphology tags and updated to 0.0.2
- (requirements) Upgraded pie version
- (models/lasla) Support ignoring character tokens through `[IGN:char]`
- (pipeline/excluders) Made sure excluders use the same replacement character through a `CharRegistry` dictionary
- (models/lasla) Use model LASLA+ from 0.0.5b trained on PyTorch 1.3.1
- Added a `max_tokens` per sentence limit in DataIterators.
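
  A minimal sketch of the idea only, not pie-extended's actual DataIterator (the helper name is hypothetical): one way such a limit can be applied is to cut overly long sentences into smaller chunks.

  ```python
  from typing import Iterable, List

  def split_sentence(tokens: List[str], max_tokens: int) -> Iterable[List[str]]:
      """Yield chunks of at most ``max_tokens`` tokens from a single sentence.

      Illustration only: the real DataIterator may enforce the limit differently.
      """
      for start in range(0, len(tokens), max_tokens):
          yield tokens[start:start + max_tokens]

  # A 7-token sentence with max_tokens=3 yields chunks of 3, 3 and 1 tokens
  print(list(split_sentence("in nova fert animus mutatas dicere formas".split(), 3)))
  ```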
- (models/fro) Updated model fro to 0.3.0 using multiple tasks
- (models/dum) Added a new model with Middle Dutch thanks to Mike Kestemont
- (tokenizers) Added a `SimpleTokenizer` based on length
- (models/lasla) Applied unidecode
- (models/lasla) Use model LASLA+ from 0.0.5alpha trained on PyTorch 1.3.1
- (models/lasla) Updated the abbreviation list
- (CI) Added Github Actions
- (Documentation) Added a warning about supported python versions
- (Documentation) Fixed the example
- (pipeline) Created `AbbreviationsRemoverExcluder`
- (dependencies) Cleaned the version requirements due to pip update
- (models/LASLA) Fixed a bug where clitics were not split correctly after nouns
- Fixed multiple typos in CHANGES.md in version numbers
- New Latin model which handles capitalized input, entities and better disambiguation.
- (Latin Model) Fixed a long-standing bug where Latin would not tag Gender because I forgot it in the `GlueProcessor`... Big facepalm
- Fixed the way the DataIterator deals with documents ending with a sentence formed of excluded tokens only.
- Fixed a typo in an import pattern
- (Latin Model) Dealt with some weird Unicode numerals which unexpectedly broke our `.isnumeric()` usage (e.g. ↀ)
- Added a way to tag texts where words are already tokenized: new lines are word separators, double new lines are sentence separators
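
  A minimal sketch of reading that format, assuming plain-text input (the helper name is illustrative, not part of the pie-extended API):

  ```python
  from typing import List

  def read_pretokenized(text: str) -> List[List[str]]:
      """Split pre-tokenized input: one token per line, blank line between sentences."""
      sentences = []
      for block in text.strip().split("\n\n"):
          tokens = [line.strip() for line in block.split("\n") if line.strip()]
          if tokens:
              sentences.append(tokens)
      return sentences

  sample = "Gallia\nest\nomnis\ndivisa\n\nin\npartes\ntres"
  print(read_pretokenized(sample))
  # [['Gallia', 'est', 'omnis', 'divisa'], ['in', 'partes', 'tres']]
  ```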
- Reworked the way preprocessing of special characters is done before and after sentence tokenization: created the `Excluder` class (`pie_extended.pipeline.tokenizers.utils.excluder`).
  - Allows for more code sharing across models.
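
  A rough sketch of the pattern described here, not the actual `Excluder` API (class and method names are invented for illustration): a regex match is masked with a placeholder character before sentence tokenization and restored afterwards.

  ```python
  import re

  class SketchExcluder:
      """Illustrative excluder: mask a pattern before tokenization, restore it after."""

      def __init__(self, pattern: str, placeholder: str):
          self.pattern = re.compile(pattern)
          self.placeholder = placeholder  # in pie-extended this would come from a shared registry
          self._saved = []

      def before_tokenization(self, text: str) -> str:
          def mask(match: re.Match) -> str:
              self._saved.append(match.group(0))
              return self.placeholder
          return self.pattern.sub(mask, text)

      def after_tokenization(self, token: str) -> str:
          if token == self.placeholder:
              return self._saved.pop(0)
          return token

  excluder = SketchExcluder(r"\[REF:[^\]]*\]", "\uf8ff")  # private-use placeholder char
  masked = excluder.before_tokenization("uixit annos [REF:CIL 6.123] XX")
  print([excluder.after_tokenization(tok) for tok in masked.split()])
  # ['uixit', 'annos', '[REF:CIL 6.123]', 'XX']
  ```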
- Fixed a typo that would prevent tagging with FREEM (and nobody saw that! ;) )
- Fixed Early Modern French Model (reusing processor and tokenizer of FR model)
- Added Ancient Greek Model (very basic addition, probably needs more work)
- Added Early Modern French Model (reusing processor and tokenizer of FR model)
- Hotfixed column order in the TSV output
- Hotfixed lowercasing for the Latin model
Unfilled TODO
Unfilled TODO
- `PIE_EXTENDED_DOWNLOADS` environment variable can be used to set up a non-default directory for models and linked data.
  - e.g. `PIE_EXTENDED_DOWNLOADS=~/PieData pie-extended download fr`
- (Breaking) Postprocessors now must return a list of dicts instead of a dict with `.get_dict()` methods (#c8be021)
- Added a better tokenizer for Classical French:
  - Keeps `aujourd'hui` intact
  - Keeps the union dash with pronouns, e.g. `-le` in sentences such as `mange-le`
  - Keeps the euphonic `-t` together with the pronoun: `mange-t-il` becomes `mange` and `-t-il`
    - It is removed from lemmatization until a new model is trained (the old model had `-t` on the verb)
  - Elision works as intended for non-euphonic cases such as `Va-t'en` -> `va`, `-t'`, `en`
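
  The splits stated above, summarised as data for quick reference (copied from these entries, not generated by the tokenizer):

  ```python
  # Expected token splits taken from the changelog entries above (illustration only)
  examples = {
      "aujourd'hui": ["aujourd'hui"],     # kept intact
      "mange-t-il": ["mange", "-t-il"],   # euphonic -t stays with the pronoun
      "Va-t'en": ["va", "-t'", "en"],     # elision handled in the non-euphonic case
  }
  for text, tokens in examples.items():
      print(f"{text!r} -> {tokens}")
  ```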
- Updated Classical French models (#15)
- Added a post-processor to split tokens (#17)