alignment output idea. #5
Comments
This is better
@neutrinog Can you explain what the values in
Structure
Given a context of verse 9, and that verse 9 contains two tokens, here is an alignment of three tokens from the target text to one token in the source text:
Within the
Rules
Referring to tokens outside of the current context proceeds as follows: prepend additional context as required.
As additional context is prepended, the previous context must be zero-filled to three digits. Here the above example is shown in its expanded, simplified, and parsed forms:
Parsing such a value is done by casting the value to a string and splitting it into chunks of 3 characters each, starting from the end (right side).
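The parsing rule above could be sketched roughly like this. The function name `parse_context` is made up for illustration and is not part of wordMap; what each chunk stands for (chapter, verse, token position) is left to the caller.

```python
def parse_context(value):
    """Split a numeric id into 3-character chunks, starting from the right.

    Hypothetical helper: only the leftmost chunk may be shorter than three
    characters, because every later chunk was zero-filled when it was written.
    """
    digits = str(value)  # cast the value to a string first
    chunks = []
    while digits:
        # Take the last three characters and put them at the front of the result.
        chunks.insert(0, int(digits[-3:]))
        digits = digits[:-3]
    return chunks


print(parse_context(11001))  # [11, 1]
```

Going right to left is what lets the leading context drop its zero padding without making the value ambiguous.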
@klappy fyi, I included this description ^
The most recent approach being considered involves storing a context id inside the tokens. This allows wordMap to be agnostic to the concept of crossing verse and chapter boundaries. For example, here is a contrived example where wordMap has received two tokens:
{
"text": "Lord",
"occurrence": 1,
"occurrences": 1,
"contextId": "BOOK001001"
}
{
"text": "The",
"occurrence": 1,
"occurrences": 1,
"contextId": "BOOK001002"
}
In this case token at index
With this method it should be noted that cross-verse alignment would not be supported (at least not in a deterministic way) with simple string input to wordMap. The input must be pre-tokenized, with the context id added as needed.
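For what it's worth, the pre-tokenizing step might look something like the sketch below. `tokenize_verse` is a hypothetical helper, not a wordMap function, and it assumes tokens are split on whitespace and that occurrences are counted per context; the token fields mirror the JSON above.

```python
from collections import Counter


def tokenize_verse(text, context_id):
    """Split a verse on whitespace and tag every token with its context id.

    Assumption: occurrence counts are scoped to a single context (verse).
    """
    words = text.split()
    totals = Counter(words)  # how often each word appears in this verse
    seen = Counter()         # running count while walking the verse
    tokens = []
    for word in words:
        seen[word] += 1
        tokens.append({
            "text": word,
            "occurrence": seen[word],
            "occurrences": totals[word],
            "contextId": context_id,
        })
    return tokens


# Each verse is tokenized on its own, then the tokens are handed over as one list.
tokens = tokenize_verse("Lord", "BOOK001001") + tokenize_verse("The", "BOOK001002")

# The consumer never needs to know about verses; the caller can still tell
# that the token list spans more than one context.
crosses_boundary = len({token["contextId"] for token in tokens}) > 1
```

The point of the design is that the boundary information travels with each token, so nothing downstream has to infer it from a plain string.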
@PhotoNomad0 ☝️
We need to support alignments across verses.
Here are three possible solutions.
Or we can just keep the number ids and format them like this:
11001
e.g. chapter 11 and verse 1.
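A rough sketch of how such a number id could be built and taken apart again, assuming the verse always occupies the last three digits. The names `make_id` and `split_id` are invented for this example.

```python
def make_id(chapter, verse):
    """Pack chapter and verse into one number; the verse is zero-filled to three digits."""
    if not 0 <= verse <= 999:
        raise ValueError("verse must fit in three digits")
    return int(f"{chapter}{verse:03d}")


def split_id(number_id):
    """Recover (chapter, verse) from a number id built by make_id."""
    return divmod(number_id, 1000)


print(make_id(11, 1))   # 11001
print(split_id(11001))  # (11, 1)
```

One consequence of this format is that a verse number above 999 cannot be represented.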