word splitting regex fails with underdot characters #40

Open
chchch opened this issue Feb 12, 2018 · 0 comments

chchch commented Feb 12, 2018

Hi,

I'm using hypher.js with transliterated Sanskrit, and it doesn't handle underdot characters such as ṇ, ṣ, ḍ, ṭ, etc. The problem seems to be the long regex used to split a string into words (line 107 of hypher.js): I guess its character class doesn't include the Unicode ranges for underdot characters. I've replaced it with a simpler expression:
var words = str.split(/([\s\n\r\t.,:;'"!?-])/g);
This matches word-boundary characters instead of word characters. It works for me, but it's not comprehensive: you would have to add more boundary characters to make it work for other languages.
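
An alternative that avoids listing boundary characters by hand might be to split on anything that is not a letter or combining mark, using Unicode property escapes. This is only a sketch: `splitWords` is a hypothetical helper, not part of hypher.js, and it assumes an engine that supports the `u` flag with `\p{...}` (ES2018 or later). It treats digits, hyphens and apostrophes as separators, which may or may not be what you want.

```javascript
// Hypothetical helper, not part of hypher.js.
// Splits text into alternating word and separator chunks.
function splitWords(str) {
  // \p{L} matches any letter (including precomposed ṇ, ṣ, ḍ, ṭ);
  // \p{M} matches combining marks such as U+0323 (combining dot below).
  // The capture group makes split() keep the separators in the result,
  // so the chunks can be joined back into the original string.
  return str.split(/([^\p{L}\p{M}]+)/u);
}

const parts = splitWords("kṛṣṇa, viṣṇu");
console.log(parts); // [ 'kṛṣṇa', ', ', 'viṣṇu' ]
```

Because the separators are kept, `parts.join("")` gives back the input unchanged, so only the word chunks need to be passed to the hyphenator.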
