OpenPecha Toolkit

OpenPecha Toolkit provides a state-of-the-art approach to distributed standoff annotations on moving texts, in which the base layer can be edited without affecting the annotations.

The motivation for this project is that a perfect base-text poses no big obstacles; the technical problems arise when the base-text must remain editable, whether to correct it or to update it. Existing solutions such as fixed character coordinates do not work in that case, because every edit shifts the offsets that the annotations point to. We therefore proposed the CCTV (Character Coordinate Translation Vector), which tracks annotations from the source base-text to the edited base-text, so that editors do not have to worry about the annotations at all. The user can then export the edited base-text with updated annotations in various document formats such as .md, .epub, and .pdf. Currently only Markdown export is supported.
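The idea of translating character coordinates can be illustrated with a minimal sketch. This is not the toolkit's implementation: it assumes the standard-library `difflib.SequenceMatcher` as the diff engine, and the names `build_cctv`, `translate`, and `update_span` are hypothetical. Each entry of the vector records an unchanged block of the source base-text and the shift needed to reach the same characters in the edited base-text.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical representation: (source_start, source_end, shift) per unchanged block.
Cctv = List[Tuple[int, int, int]]


def build_cctv(source: str, edited: str) -> Cctv:
    """Record, for every unchanged block, how far its characters moved."""
    matcher = SequenceMatcher(None, source, edited, autojunk=False)
    cctv = []
    for block in matcher.get_matching_blocks():
        if block.size:  # the last block is always a zero-length sentinel
            cctv.append((block.a, block.a + block.size, block.b - block.a))
    return cctv


def translate(cctv: Cctv, index: int) -> Optional[int]:
    """Map a source character index to the edited text.

    Returns None when the character was deleted or replaced.
    """
    for start, end, shift in cctv:
        if start <= index < end:
            return index + shift
    return None


def update_span(cctv: Cctv, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Translate an annotation span [start, end) to the edited text."""
    new_start = translate(cctv, start)
    new_last = translate(cctv, end - 1)
    if new_start is None or new_last is None:
        return None  # an endpoint was edited away; needs manual review
    return new_start, new_last + 1


source = "Homage. Title of text"
edited = "Homage to all. Title of text"
cctv = build_cctv(source, edited)

# The annotation on "Title" covers source[8:13]; after the edit it moves right.
new_span = update_span(cctv, 8, 13)
print(new_span, edited[new_span[0]:new_span[1]])
```

The annotation itself is never touched during editing; only its coordinates are recomputed from the diff between the two versions of the base-text.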

For NLP, this toolkit provides a way to build annotated corpora with minimal errors and to extract a particular type of annotation or a collection of different annotation types. NLP researchers can then use the corpora to build language models, and the annotations to build NER models, entity linking systems, etc.
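Extracting one type of annotation from a standoff corpus could look like the sketch below. The data layout is an assumption made for illustration, not the OpenPecha format: each layer is a hypothetical mapping from a layer name to a list of `(start, end)` character spans over the base-text, and `extract_annotations` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: layer name -> list of (start, end) spans in the base-text.
Layers = Dict[str, List[Tuple[int, int]]]


def extract_annotations(
    base_text: str, layers: Layers, wanted: List[str]
) -> List[Tuple[str, int, int, str]]:
    """Return (layer, start, end, text) rows for the requested layers, in text order."""
    rows = []
    for name in wanted:
        for start, end in layers.get(name, []):
            rows.append((name, start, end, base_text[start:end]))
    return sorted(rows, key=lambda row: row[1])


base_text = "Title of text\nA quoted line"
layers = {"title": [(0, 13)], "quotes": [(16, 22)]}

# Pull out only the quotes, e.g. as labelled spans for model training.
for row in extract_annotations(base_text, layers, ["quotes"]):
    print(row)
```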

Prerequisite

Python 3, which you can download from python.org.

Installation

Usage

First we need to download all the poti which are in openpecha format.

$ openpecha download --help
Usage: openpecha download [OPTIONS]

  Command to download poti. You need to give a work-id of a poti to download it.

Options:
  -n, --number WORK_ID      Work-id of the poti, for single poti download
  --help                    Show this message

Automatically updating annotations from the source base-text (original) to the destination base-text (edited)

$ openpecha update --help
Usage: openpecha update [OPTIONS] WORK_ID

  Command to update the base text with your edits.

Options:
  --help  Show this message and exit.

Exporting and Extracting layer

$ openpecha layer --help 
Usage: openpecha layer [OPTIONS] WORK_ID OUT

  Command to apply a single layer, multiple layers or all available layers
  (by default) and then export to markdown.

  Args:

      - WORK_ID is the work-id of the poti, from which given layer will be
      applied

      - OUT is the filename to the write the result. Currently support only
      Markdown file.

Options:
  -n, --name [title|tsawa|yigchung|quotes|sapche]
                                  name of a layer to be applied
  -l, --list TEXT                 list of name of layers to applied,
                                  name of layers should be comma separated
  --help                          Show this message and exit.
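Conceptually, applying layers means inserting markup into the base-text at the positions the annotations point to. The sketch below shows one way this could work, under stated assumptions: the layer layout (layer name mapped to `(start, end)` spans), the Markdown markers chosen for `title` and `quotes`, and the function `apply_layers` are all hypothetical and do not describe what the `openpecha layer` command does internally. Spans are assumed not to overlap.

```python
from typing import Dict, List, Tuple

Layers = Dict[str, List[Tuple[int, int]]]
# Hypothetical mapping: layer name -> (opening marker, closing marker).
Markers = Dict[str, Tuple[str, str]]


def apply_layers(base_text: str, layers: Layers, markers: Markers) -> str:
    """Insert opening and closing markers around every annotated span."""
    edits = []
    for name, spans in layers.items():
        opening, closing = markers[name]
        for start, end in spans:
            # The middle field breaks ties so that, at a shared position,
            # a closing marker ends up before the next opening marker.
            edits.append((start, 1, opening))
            edits.append((end, 0, closing))
    result = base_text
    # Work from right to left so earlier offsets stay valid after each insert.
    for position, _, text in sorted(edits, reverse=True):
        result = result[:position] + text + result[position:]
    return result


base_text = "Title of text\nA quoted line"
layers = {"title": [(0, 13)], "quotes": [(16, 22)]}
markers = {"title": ("# ", ""), "quotes": ("*", "*")}

print(apply_layers(base_text, layers, markers))
```

Because the base-text is only read and never stored with markup, any subset of layers can be applied to the same text to produce different exports.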

Developer Installation.

$ git clone https://github.com/OpenPoti/openpecha-toolkit.git
$ cd openpecha-toolkit
$ pip install -r requirements-dev.txt
$ pip install -e .

Testing

$ pytest tests
