alignment output idea. #5

da1nerd · 2018-05-02T20:59:33Z

We need to support alignments across verses.
Here are three possible solutions.

{
  "confidence": 0.516905944153279,
  "sourceNgram": [0],
  "targetNgram": [10, { // add an object
    "position": 0,
    "verse": 2,
    "chapter": 3
  }],
  // or separate object
  "versification": {
    "target": {
      "nextVerseId": 1
    },
    "source": {}
  }
},

Or we can keep the numeric ids and encode the reference in the number itself, like 11001, e.g. chapter 11 and verse 1.


da1nerd · 2018-05-02T21:08:25Z

This is better

{
  "confidence": 0.516905944153279,
  "sourceNgram": [0],
  "targetNgram": [10, 0, 1, 10001]
},

jag3773 · 2018-05-04T12:28:15Z

@neutrinog Can you explain what the values in [10, 0, 1, 10001] refer to?

da1nerd · 2018-05-04T17:45:13Z

NOTE: I think the 10 was in there by accident.

Structure

Given a context of verse 9, and that verse 9 contains two tokens, here is an alignment of three tokens from the target text to one token in the source text:

{
  "confidence": 0.516905944153279,
  "sourceNgram": [0],
  "targetNgram": [0, 1, 10001]
}

Within targetNgram, the positional values 0 and 1 indicate two tokens from verse 9.
The value 10001 additionally includes the second token (zero-indexed position 1) of verse 10 in this alignment.

Rules

Referring to tokens outside of the current context proceeds as follows:

Prepend additional context as required.

If from a different verse, prepend the numerical verse number.
If from a different chapter, prepend the numerical chapter number.

NOTE: aligning tokens across different books is not supported, and we believe it is unnecessary.

As additional context is prepended, the value it is prepended to must be zero-filled to three digits.

Here the above example is shown in its expanded, simplified, and parsed forms:

| chapter | verse | token | form |
|---|---|---|---|
| `000` | `010` | `001` | expanded |
| | `10` | `001` | simple |
| | `10` | `1` | parsed |

Such a value is parsed by casting it to a string and splitting it into 3-character chunks, starting from the end (right side).
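The parsing rule can be sketched roughly as follows. This is only an illustration, not code from wordMap: `TokenRef`, `parseTokenRef`, and `encodeTokenRef` are hypothetical names, and an omitted verse or chapter is assumed to mean "same as the current context".

```typescript
// Hypothetical shape for a parsed reference; not part of wordMap.
interface TokenRef {
  chapter?: number; // undefined: same chapter as the current context
  verse?: number;   // undefined: same verse as the current context
  token: number;    // zero-indexed token position within the verse
}

// Cast the value to a string and read 3-character chunks from the right.
function parseTokenRef(value: number): TokenRef {
  if (value < 0 || Math.floor(value) !== value) {
    throw new Error("expected a non-negative integer, got " + value);
  }
  const digits = String(value);
  const n = digits.length;
  // Last 3 characters (or fewer): the token position.
  const token = Number(digits.slice(Math.max(0, n - 3)));
  // The 3 characters before that, if any: the verse.
  const verse = n > 3 ? Number(digits.slice(Math.max(0, n - 6), n - 3)) : undefined;
  // Anything left over: the chapter.
  const chapter = n > 6 ? Number(digits.slice(0, n - 6)) : undefined;
  return { chapter, verse, token };
}

// Inverse operation: multiplying by powers of 1000 has the same effect as
// zero-filling the lower values to three digits before prepending context.
function encodeTokenRef(ref: TokenRef): number {
  if (ref.chapter !== undefined && ref.verse === undefined) {
    throw new Error("a chapter can only be prepended to a verse");
  }
  if (ref.token > 999 || (ref.verse !== undefined && ref.verse > 999)) {
    throw new Error("token and verse must fit in three digits");
  }
  const verse = ref.verse === undefined ? 0 : ref.verse;
  const chapter = ref.chapter === undefined ? 0 : ref.chapter;
  return chapter * 1000000 + verse * 1000 + ref.token;
}

// parseTokenRef(10001) yields verse 10, token 1, and no chapter.
// encodeTokenRef({ verse: 10, token: 1 }) yields 10001.
```

Plain values such as 0 and 1 pass through unchanged, so tokens in the current verse keep their simple positional form.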

da1nerd · 2018-05-04T17:46:51Z

@klappy fyi, I included this description ^

da1nerd · 2019-05-08T00:02:18Z

The most recent approach being considered involves storing a context id inside of the tokens. This will allow wordMap to be agnostic to the concept of crossing verse and chapter boundaries.

Here is a contrived example where wordMap has received two tokens:

{
  "text": "Lord",
  "occurrence": 1,
  "occurrences": 1,
  "contextId": "BOOK001001"
}
{
  "text": "The",
  "occurrence": 1,
  "occurrences": 1,
  "contextId": "BOOK001002"
}

In this case the token at index 0 is from verse 1 and the token at index 1 is from verse 2.
wordMap will be able to process these tokens as usual, and the alignment will contain these token objects for later reference. See 4499068 as an example of passing the token object to the output.

Note that with this method, cross-verse alignment would not be supported (at least not in a deterministic way) with simple string input to wordMap. The input must be pre-tokenized, with the context id added as needed.
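As a rough sketch of how a consumer might use these ids after alignment, the snippet below checks whether a set of aligned tokens crosses a verse boundary. The names `ContextToken`, `parseContextId`, and `spansMultipleContexts` are hypothetical. The id layout (a book code followed by a 3-digit chapter and a 3-digit verse) is an assumption inferred from the example values; the thread only states which verse each token comes from.

```typescript
// Shape of the tokens in the example above.
interface ContextToken {
  text: string;
  occurrence: number;
  occurrences: number;
  contextId: string;
}

// ASSUMPTION: ids look like "BOOK001002", i.e. a book code, then a
// 3-digit chapter, then a 3-digit verse. This layout is not confirmed.
function parseContextId(contextId: string): { book: string; chapter: number; verse: number } {
  const n = contextId.length;
  if (n < 6 || !/^\d{6}$/.test(contextId.slice(n - 6))) {
    throw new Error("unexpected context id: " + contextId);
  }
  return {
    book: contextId.slice(0, n - 6),
    chapter: Number(contextId.slice(n - 6, n - 3)),
    verse: Number(contextId.slice(n - 3)),
  };
}

// True when the tokens do not all share one context id, which is how a
// consumer could detect an alignment that crosses a verse boundary.
function spansMultipleContexts(tokens: ContextToken[]): boolean {
  let first: string | undefined;
  for (const token of tokens) {
    if (first === undefined) {
      first = token.contextId;
    } else if (token.contextId !== first) {
      return true;
    }
  }
  return false;
}

const exampleTokens: ContextToken[] = [
  { text: "Lord", occurrence: 1, occurrences: 1, contextId: "BOOK001001" },
  { text: "The", occurrence: 1, occurrences: 1, contextId: "BOOK001002" },
];
const crossesVerse = spansMultipleContexts(exampleTokens); // true for these two tokens
```

Because the aligner itself only compares or carries the id, the id could be any opaque string; parsing it is only needed by code that wants the chapter and verse back.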

da1nerd · 2019-05-08T00:07:54Z

@PhotoNomad0 ☝️

da1nerd mentioned this issue Jun 27, 2018

Create JSON -> USFM3 Converter NPM Module unfoldingWord/translationCore#4622

Closed
