Questionable results when hyphenating #105

Open
DoubleDee73 opened this issue Dec 11, 2023 · 3 comments

Comments

@DoubleDee73

DoubleDee73 commented Dec 11, 2023

Sometimes the output of the automatic hyphenation leaves a bit to be desired.

Examples:

  • "liv-ing" or "reas-ons" instead of "li-ving" or "rea-sons"
  • Sometimes no syllabification at all, e.g. "everything", "around", "saying", "secret", "little", "better" (should be something like eve-ry-thing, a-round, say-ing, se-cret, lit-tle, bet-ter)
  • Doesn't work well with colloquial words, e.g. "wanna", "gonna", or with gerunds shortened with an apostrophe, like "goin'", "workin'"
@bohning
Contributor

bohning commented Dec 11, 2023

That’s why I switched to dictionary files for UltraStar Creator: https://github.com/UltraStar-Deluxe/UltraStar-Creator/tree/master/syllabification.

As a side note, we’re talking about syllabification (splitting into singable syllables) rather than hyphenation (splitting of written words).
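A minimal sketch of what a dictionary-based lookup could look like, with a fallback for words that are not listed. The in-memory `SYLLABLE_DICT`, the `syllabify` function, and the handling of apostrophe-shortened gerunds are all hypothetical illustrations, not the format or code used by UltraStar Creator:

```python
# Hypothetical mini-dictionary; real dictionary files would be loaded from disk.
SYLLABLE_DICT = {
    "living": ["li", "ving"],
    "reasons": ["rea", "sons"],
    "everything": ["eve", "ry", "thing"],
    "around": ["a", "round"],
    "little": ["lit", "tle"],
    "going": ["go", "ing"],
}


def syllabify(word, fallback=None):
    """Return singable syllables for `word`, preferring the dictionary.

    `fallback` is an optional callable (e.g. an automatic splitter) that is
    used only when the word is not in the dictionary.
    """
    key = word.lower()
    entry = SYLLABLE_DICT.get(key)

    if entry is None and key.endswith("in'"):
        # Assumption: "goin'" can reuse the entry for "going";
        # the final syllable then loses its "g" and gains the apostrophe.
        full = SYLLABLE_DICT.get(key[:-1] + "g")
        if full is not None:
            entry = full[:-1] + [full[-1][:-1] + "'"]

    if entry is None:
        return list(fallback(word)) if fallback else [word]

    # Slice the original word so the caller's casing is preserved.
    syllables, pos = [], 0
    for syl in entry:
        syllables.append(word[pos:pos + len(syl)])
        pos += len(syl)
    return syllables
```

With this sketch, `syllabify("goin'")` reuses the entry for "going", while an unlisted word such as "wanna" is returned unsplit unless a fallback splitter is supplied.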

@rakuri255
Owner

rakuri255 commented Jan 3, 2024

OK, something is broken.
Thanks @DoubleDee73 for the examples.

UltraSinger actually already uses syllables rather than simple hyphenation: hyphenator.Syllables(cleaned_string)
The funny thing is that it returns different results depending on the language, and yet they are all wrong.

assert hyphenation("differently", Hyphenator("de_AT")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'ferent', 'ly']
assert hyphenation("differently", Hyphenator("en_US")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'fer', 'ently']

I need to check what the PyHyphen integration is actually doing there.
It should actually be using the information from LibreOffice.
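To compare several words at once instead of one assert per word, a small harness along these lines could help. It is only a sketch: `find_mismatches`, `EXPECTED`, and `stub_splitter` are hypothetical names, and the stub merely reproduces the en_US output quoted above and the unsplit "better" from the original report rather than calling PyHyphen:

```python
def find_mismatches(splitter, expected):
    """Return {word: (expected, actual)} for every word the splitter gets wrong.

    `splitter` is any callable that maps a word to a sequence of syllables.
    """
    mismatches = {}
    for word, want in expected.items():
        got = list(splitter(word))
        if got != want:
            mismatches[word] = (want, got)
    return mismatches


# Expected syllables, taken from the examples in this thread.
EXPECTED = {
    "differently": ["dif", "fer", "ent", "ly"],
    "better": ["bet", "ter"],
}


def stub_splitter(word):
    """Stand-in for a real splitter; mimics the results reported above."""
    reported = {"differently": ["dif", "fer", "ently"]}
    return reported.get(word, [word])


if __name__ == "__main__":
    for word, (want, got) in find_mismatches(stub_splitter, EXPECTED).items():
        print(f"{word}: expected {want}, actual {got}")
```

With PyHyphen installed, the stub could be swapped for the Hyphenator's syllables method mentioned above; the exact method name and behavior depend on the installed version, so treat that as an assumption.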

@bohning thanks for the list. I will try to use it if I can't fix PyHyphen.

@mindtakerr thanks for the info about the howmanysyllables website.
This makes it easy to check and shows how syllables are actually formed.

@rakuri255
Owner

PyHyphen uses C in the background to create syllables. It's not really written in a maintenance-friendly way. I think it makes a few mistakes.

In addition, the hyphenation pattern data from LibreOffice is converted from TeX data and also appears to be outdated.
