
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Licenses: Apache-2.0 (LICENSE-APACHE) and MIT (LICENSE-MIT)

SyntaxDot

Introduction

SyntaxDot is a sequence labeler and dependency parser built on Transformer networks. SyntaxDot models can be trained from scratch or by finetuning pretrained models such as BERT or XLM-RoBERTa.
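Pretrained Transformers such as BERT read word pieces, while a sequence labeler assigns one label per word, so the encoder output has to be mapped back to words. The sketch below shows one common approach, representing each word by its first piece. It is an illustration only: the function names and the example segmentation are hypothetical, and SyntaxDot's own alignment strategy may differ.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_piece_indices(pieces_per_word: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten per-word pieces and record where each word's first piece lands."""
    flat: List[str] = []
    first: List[int] = []
    for pieces in pieces_per_word:
        if not pieces:
            raise ValueError("every word needs at least one piece")
        first.append(len(flat))  # index of this word's first piece
        flat.extend(pieces)
    return flat, first


def gather_word_vectors(piece_vectors: Sequence[T], first: List[int]) -> List[T]:
    """Pick one encoder vector per word: the vector of its first piece."""
    return [piece_vectors[i] for i in first]


# Hypothetical segmentation of "unbelievable results".
pieces, first = first_piece_indices([["un", "##believ", "##able"], ["results"]])
# pieces == ["un", "##believ", "##able", "results"], first == [0, 3]
```

The classifier for each task then only sees one vector per word, so the label sequence and the word sequence line up.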

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition
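Lemmatization fits the sequence labeling setup once the form-to-lemma transformation is turned into a class label; the feature list names edit trees for this. The sketch below follows the general edit-tree idea (split at the longest common substring, recurse on the prefix and suffix, replace whatever is left). It is a simplified, hypothetical implementation, not SyntaxDot's code, and the German example is only illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Replace:
    old: str  # exact substring that must be present in the form
    new: str  # substring that replaces it in the lemma


@dataclass(frozen=True)
class Match:
    prefix_len: int  # characters before the kept middle part
    suffix_len: int  # characters after the kept middle part
    left: Optional["Tree"]
    right: Optional["Tree"]


Tree = Union[Match, Replace]


def lcs(a: str, b: str) -> Tuple[int, int, int]:
    """Return (start in a, start in b, length) of the longest common substring."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best


def build_tree(form: str, lemma: str) -> Optional[Tree]:
    """Build an edit tree that rewrites form into lemma."""
    if not form and not lemma:
        return None
    i, j, k = lcs(form, lemma)
    if k == 0:
        return Replace(form, lemma)
    return Match(i, len(form) - i - k,
                 build_tree(form[:i], lemma[:j]),
                 build_tree(form[i + k:], lemma[j + k:]))


def apply_tree(tree: Optional[Tree], form: str) -> Optional[str]:
    """Apply an edit tree to a (possibly unseen) form; None if it does not fit."""
    if tree is None:
        return "" if form == "" else None
    if isinstance(tree, Replace):
        return tree.new if form == tree.old else None
    if tree.prefix_len + tree.suffix_len >= len(form):
        return None  # the kept middle part must be non-empty
    mid_end = len(form) - tree.suffix_len
    left = apply_tree(tree.left, form[:tree.prefix_len])
    right = apply_tree(tree.right, form[mid_end:])
    if left is None or right is None:
        return None
    return left + form[tree.prefix_len:mid_end] + right


tree = build_tree("walked", "walk")
print(apply_tree(tree, "jumped"))  # jump
```

Because `build_tree("walked", "walk")` and `build_tree("jumped", "jump")` yield the same tree, a tagger can treat each distinct tree as one label and predict it per token like a POS tag.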

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatible with sticker2 models).

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding
  • Multi-task training and classification using scalar weighting
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Permissive licensing (Apache-2.0 and MIT)
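Dependency parsing as sequence labeling requires encoding each token's head as a label. One well-known scheme is a relative POS encoding: a token's label is the POS tag of its head plus a signed count of tokens with that tag between the dependent and the head (+1 means the first token with that tag to the right). The sketch assumes this scheme, 1-based head indices, and a reserved `("ROOT", 0)` label; the function names are hypothetical and SyntaxDot's encoders may differ in detail. Dependency relations would be predicted as a separate label.

```python
from typing import List, Optional, Tuple

Label = Tuple[str, int]
ROOT: Label = ("ROOT", 0)  # hypothetical label for the root token


def encode_heads(tags: List[str], heads: List[int]) -> List[Label]:
    """heads[i] is the 1-based head of token i + 1; 0 marks the root."""
    labels: List[Label] = []
    for i, head in enumerate(heads, start=1):
        if head == 0:
            labels.append(ROOT)
            continue
        tag = tags[head - 1]
        if head > i:
            # Count tokens with the head's tag from i + 1 up to the head.
            offset = sum(1 for j in range(i + 1, head + 1) if tags[j - 1] == tag)
        else:
            # Count tokens with the head's tag from the head up to i - 1.
            offset = -sum(1 for j in range(head, i) if tags[j - 1] == tag)
        labels.append((tag, offset))
    return labels


def decode_heads(tags: List[str], labels: List[Label]) -> List[Optional[int]]:
    """Invert encode_heads; None marks a label that points to no valid token."""
    heads: List[Optional[int]] = []
    for i, (tag, offset) in enumerate(labels, start=1):
        if (tag, offset) == ROOT:
            heads.append(0)
            continue
        if offset == 0:
            heads.append(None)
            continue
        step = 1 if offset > 0 else -1
        remaining, j, head = abs(offset), i, None
        while 1 <= j + step <= len(tags):
            j += step
            if tags[j - 1] == tag:
                remaining -= 1
                if remaining == 0:
                    head = j
                    break
        heads.append(head)
    return heads


tags = ["DET", "NOUN", "VERB"]  # "The dog barks"
labels = encode_heads(tags, [2, 3, 0])
# labels == [("NOUN", 1), ("VERB", 1), ("ROOT", 0)]
```

Since a predicted label sequence need not form a well-formed tree, a real decoder also has to repair cycles or missing roots; this sketch only reports unresolvable labels as `None`.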

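For multi-task classification, one common form of scalar weighting (as in ELMo-style scalar mixes) lets each task learn its own softmax-normalized weight per encoder layer plus a scale factor, and classify from the weighted sum of layer outputs. The sketch below assumes that form for a single token; `scalar_mix` and its parameters are hypothetical names, and SyntaxDot's implementation may add details such as dropout on the layer weights.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers: Sequence[Sequence[float]],
               weights: Sequence[float],
               gamma: float) -> List[float]:
    """Combine one token's per-layer vectors as gamma * sum_j softmax(w)_j * h_j."""
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    probs = softmax(weights)
    dim = len(layers[0])
    return [gamma * sum(p * layer[d] for p, layer in zip(probs, layers))
            for d in range(dim)]


# Two layers, equal (untrained) weights: the mix is the layer average.
print(scalar_mix([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], 1.0))  # [2.0, 3.0]
```

Giving every task its own `weights` and `gamma` lets, for example, a POS tagger and a parser draw on different encoder layers while sharing the same Transformer.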
Documentation

References

SyntaxDot uses techniques from or was inspired by the following papers:

Issues

You can report bugs and feature requests in the SyntaxDot issue tracker.

License

For licensing information, see COPYRIGHT.md.