TTS lexicon #753

torchtrust · 2023-12-18T16:48:46Z


Now that we can produce a whole DTBook with CSS added, I am moving on to the lexicon. Many of our books contain abbreviations for books of the Bible, and I have worked out a lookahead that allows for the full stop (period), e.g. Ps. for Psalm:

<lexeme>
    <grapheme positive-lookahead="(\.)?[ ]+[0-9]">Ps</grapheme>
    <alias>Psalm</alias>
</lexeme>

But I also want it to drop the full stop (period), because the TTS pauses too long there, as if it had reached the end of a sentence.
Putting a full stop in the grapheme doesn't work, even when it is escaped with a backslash.

Any ideas?
thanks
Paul

bertfrees (Maintainer) · 2023-12-18T20:03:45Z

I think what's going on is that the word detection does not detect the "Ps." as an abbreviation. The "." is seen as punctuation and falls outside of the word token. This is why "Ps." is not matched by the lexicon if you include the ".":

<lexeme>
    <grapheme regex="true" positive-lookahead="[ ]+[0-9]">Ps.</grapheme>
    <alias>Psalm</alias>
</lexeme>

I don't think this can be solved on the lexicon level if the word detection is wrong.

So we either:

- have to fix word detection, which needs to be done within the Java code, or
- provide a way for users to override the word detection, e.g. by taking into account words that are already marked up:
```
... <w>Ps.</w> ...
```
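
To illustrate the tokenization issue described above, here is a toy sketch in Python. It is not the Pipeline's actual word detection (that lives in the Java code and is not shown in this thread); the `tokenize` function and its regex are assumptions made purely for illustration. It only shows how a tokenizer that treats "." as punctuation leaves the lexicon with a "Ps" token and never a "Ps." token.

```python
import re

# Toy tokenizer (an assumption, not the Pipeline's real algorithm):
# a word is a run of letters/digits, anything else that is not
# whitespace becomes its own punctuation token.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN.findall(text)


tokens = tokenize("See Ps. 23 for details.")
print(tokens)

# A lexicon entry is looked up per word token, so "Ps." can never match,
# while "Ps" can.
print("Ps." in tokens)  # False
print("Ps" in tokens)   # True
```

With this kind of splitting, the full stop always ends up outside the word token, which would explain why a grapheme of "Ps" matches and a grapheme of "Ps." does not.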


torchtrust (Author) · 2023-12-20T11:43:54Z

Thanks Bert. There are also plenty of other abbreviations, such as Vol., p., pp., c., d. and No. For now, the only solution I can think of is to pre-process the DTBook, which we could do in my system. If you want a solution within the DAISY Pipeline, which would be ideal, then there needs to be another way around this issue. Not being a linguist, I don't know whether this is limited to English. Thanks, Paul
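
The pre-processing idea could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `ABBREVIATIONS` table, its expansions, and the `expand_abbreviations` function are hypothetical, and "c." and "d." are left out because their expansions depend on context. Like the lookahead in the original lexeme, it only rewrites an abbreviation when it is followed by spaces and a digit, and it drops the full stop at the same time.

```python
import re

# Hypothetical abbreviation table; the expansions are assumptions and
# would need to be adapted to the books being processed.
ABBREVIATIONS = {
    "Ps": "Psalm",
    "Vol": "Volume",
    "No": "Number",
    "pp": "pages",
    "p": "page",
}

# Try longer abbreviations first so that "pp" is preferred over "p".
_alternatives = "|".join(
    sorted(map(re.escape, ABBREVIATIONS), key=len, reverse=True)
)

# Match an abbreviation plus its full stop, but only when it is followed
# by one or more spaces and a digit (same condition as the lexicon lookahead).
PATTERN = re.compile(r"\b(" + _alternatives + r")\.(?=[ ]+[0-9])")


def expand_abbreviations(text):
    """Replace 'Ps. 23' style abbreviations with their expansion, dropping the full stop."""
    return PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


print(expand_abbreviations("See Ps. 23 and Vol. 2, pp. 10-12."))
```

This works on plain strings. On a real DTBook it should be applied to text nodes only, not to the raw XML, so that attribute values and markup are left untouched.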
