Playing around with Intl.Segmenter (#145) revealed an issue with combining accents.
For example, the word "БЕЛАРУ́СКАЯ" in test6.html contains the letter "У" followed by a COMBINING ACUTE ACCENT (U+0301, "◌́").
The current implementation finds words in two steps: it first matches runs of consecutive characters with the Unicode property escape \p{Letter}, and then checks whether the run contains a character that is not in the alphabet defined by the .wasm file of the respective language.
Combining accents are not part of \p{Letter} (their general category is Mn, nonspacing mark), and this letter-accent pair has no precomposed character that normalization could produce. As a result, the current implementation finds "БЕЛАР" and hyphenates just this part of the word. This could lead to errors.
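The splitting can be reproduced in isolation with a minimal sketch. The pattern `lettersOnly` below is a simplified, hypothetical stand-in for the word-finding regex, not the actual code, so the exact fragment the real implementation reports may differ; the point is only that the match ends at the combining accent.

```javascript
// Simplified, hypothetical stand-in for the word-finding regex.
const lettersOnly = /\p{Letter}+/gu;

// "У" followed by U+0301 COMBINING ACUTE ACCENT, written as an escape
// so the combining character is visible in the source.
const accentedWord = "БЕЛАРУ\u0301СКАЯ";

// The accent is a nonspacing mark (Mn), not a letter.
const accentIsLetter = /\p{Letter}/u.test("\u0301");
const accentIsMark = /\p{Mn}/u.test("\u0301");

// The letters-only pattern splits the word at the accent into two runs.
const fragments = accentedWord.match(lettersOnly);

// NFC cannot help here: there is no precomposed character for this pair,
// so normalization leaves the string unchanged.
const unchangedByNFC = accentedWord.normalize("NFC") === accentedWord;

console.log({ accentIsLetter, accentIsMark, fragments, unchangedByNFC });
```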
How to solve (ideas):
Include \p{Mn} (or a subset) in the regex at line 562, so that the whole word is matched and then rejected by the alphabet check (-> don't hyphenate this word at all)
Also include \p{Mn} in the "alphabet", but omit these marks while hyphenating
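Both ideas could be sketched roughly as follows. This is only an illustration under assumptions: `wordPattern`, `isInAlphabet`, `hyphenateIgnoringMarks` and the stub hyphenator are hypothetical names that do not come from the code base, the alphabet is modelled as a plain `Set` rather than data from the .wasm file, and the break points returned by the stub are made up for the demo.

```javascript
// Idea 1: let the word pattern also accept nonspacing marks, so the
// accented word is matched as a whole instead of being split.
const wordPattern = /[\p{Letter}\p{Mn}]+/gu;

// If the marks are NOT part of the alphabet, the whole word fails this
// check and is left unhyphenated.
function isInAlphabet(word, alphabet) {
  return [...word].every((ch) => alphabet.has(ch));
}

// Idea 2: accept the marks, but hide them from the hyphenator and
// re-attach them to their base characters afterwards.
// Assumes hyphenateBase only inserts `hyphen` and changes nothing else.
function hyphenateIgnoringMarks(word, hyphenateBase, hyphen = "\u00AD") {
  const units = []; // one entry per base character plus its marks
  for (const ch of word) {
    if (/\p{Mn}/u.test(ch) && units.length > 0) {
      units[units.length - 1].marks += ch;
    } else {
      units.push({ base: ch, marks: "" });
    }
  }
  const stripped = units.map((u) => u.base).join("");
  let result = "";
  let i = 0;
  for (const ch of hyphenateBase(stripped)) {
    if (ch === hyphen) {
      result += ch;
    } else {
      result += units[i].base + units[i].marks;
      i += 1;
    }
  }
  return result;
}

// Demo with a stub hyphenator; the break points are invented.
const sample = "БЕЛАРУ\u0301СКАЯ";
const stubHyphenate = (w) => (w === "БЕЛАРУСКАЯ" ? "БЕ-ЛА-РУ-СКАЯ" : w);
const cyrillicOnly = new Set("БЕЛАРУСКЯ");

const matched = sample.match(wordPattern);
const accepted = isInAlphabet(sample, cyrillicOnly);
const hyphenated = hyphenateIgnoringMarks(sample, stubHyphenate, "-");
console.log(matched, accepted, hyphenated);
```

With the first idea alone, the accented word is matched completely and then skipped, which is safe but leaves such words unhyphenated. The second idea keeps them hyphenatable at the cost of the extra strip-and-restore pass.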