Data Compiler

As its name suggests, Parabible's Data Compiler compiles module data from a given folder and generates an output sqlite file as well as compressed (zlib) CSVs for ingestion by the Parabible Database Server.

Usage

To compile the data for yourself, you will need to have a folder that contains all the modules you wish to compile. Each module should be in its own subfolder with no other nesting.

Note that the compiler uses Deno with the sqlite3 package, which requires the --unstable flag as well as some other allowances (thus "-A"). See here.

The compiler may be invoked with the following command:

deno run -A --unstable main.ts /path/to/modules

Modules

A module is a text such as a Bible version/translation (this now includes material such as Apostolic Fathers). Modules are defined below:

Modules supply two files at <module-name>/output/ to be consumed by the importer:

A json file with module information in module.json
A sqlite database called data.sqlite.

`module.json`

The module.json file provides module information such as copyright details, etc. An example of the format is taken from the BHS tagged by ETCBC:

{
 "abbreviation": "BHSA",
 "name": "Biblia Hebraica Stuttgartensia (Amstelodamensis)",
 "description": "Tagged BHS with linguistic annotations compiled by the ETCBC",
 "corpora": ["OT"],
 "language": "heb",
 "versification_schema": "bhs",
 "license": "Attribution-NonCommercial 4.0 International (<a href='https://creativecommons.org/licenses/by-nc/4.0/'>CC BY-NC 4.0</a>)",
 "url": "http://dx.doi.org/10.17026%2Fdans-z6y-skyh"
}

Field	Description
Abbreviation	Short name of the module (e.g. "BHS").
Name	Full name of the module (e.g. "Biblia Hebraica Stuttgartensia").
Description	Short description of the module (including provenance).
Corpora	The corpora included in the module. Available corpora include: `OT`, `NT`, `ApF`.
Language	Language code in ISO 639-2 format.
Versification_schema	The versification schema enables the importer to align verses across modules. Available schemas include: `bhs`, `kjv`, `lxx`.
License	Relevant copyright information.
Url	A link to the source data.

`data.sqlite`

In the case of the BHS in the module.json above, if the root folder for the BHS importer is hb-bhs-pipe, the importer expects to find hb-bhs-pipe/output/data.sqlite. Every module has a verse_text table. Tagged modules also include a word_features table.

The schema for word_features is as follows:

field	description
wid	The word's id (relative to this module)
text	The word as it should be rendered
prefix	Any punctuation etc. that precedes the word
trailer	Any punctuation etc. that succeeds the word
attribute	Word attributes that correspond to known "features" (see `validate/features.json`)
syntax_node	`sentence_node_id`, `clause_node_id`, `phrase_node_id`
rid	Reference id based on the relevant versification system

Each attribute should be listed in features.json

The importer will add an unversioned parallel_id based on the versification schema in module.json.

The schema for verse_text is much simpler:

field	description
rid	Reference id based on the relevant versification system
text	JSON string containing a verse text array*

The importer will add an unversioned parallel_id based on the versification schema in module.json.

* Verse text array is in the format:

type VerseTextArray = {
    wid: integer
    leader: string
    word: string
    trailer: string
}[]

Understanding Versification Schemas

Given a verse in a particular module, we need a way of finding a corresponding verse in some other module. To do this, the importer queries a sqlite database for alignment data to check, first, if there is any module already in the final database that has the matches the current node. If there are any results, the parallel_id that is found to be in use for the aligned node is returned. Finding none, the parallel_id is assigned as an autoincremented id.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
validate		validate
.gitignore		.gitignore
README.md		README.md
alignment.sqlite		alignment.sqlite
createOrderingIndex.ts		createOrderingIndex.ts
exportToCompressedCsv.ts		exportToCompressedCsv.ts
getParallelId.ts		getParallelId.ts
getValidModules.ts		getValidModules.ts
header.png		header.png
insertModuleInfo.ts		insertModuleInfo.ts
insertVerseText.ts		insertVerseText.ts
insertWordFeatures.ts		insertWordFeatures.ts
main.ts		main.ts
sortBookIdsBySchema.ts		sortBookIdsBySchema.ts
types.d.ts		types.d.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Compiler

Usage

Modules

`module.json`

`data.sqlite`

Understanding Versification Schemas

About

Releases

Packages

Languages

parabible/data-compiler

Folders and files

Latest commit

History

Repository files navigation

Data Compiler

Usage

Modules

module.json

data.sqlite

Understanding Versification Schemas

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`module.json`

`data.sqlite`

Packages