Support CJK in Full Text Search #21
Hey, I don't have any experience with CJK sentences. Do you have any suggestions on how EliasDB could support this? Maybe a config option for …
Look at the Japanese introduction to Ruby at https://www.ruby-lang.org/ja/:

オープンソースの動的なプログラミング言語で、シンプルさと高い生産性を備えています。エレガントな文法を持ち、自然に読み書きができます。

(Roughly: "An open-source, dynamic programming language that offers simplicity and high productivity. It has an elegant grammar and can be read and written naturally.")

Neither spaces nor any other character separates the words. The only markers are the comma 、 and the full stop 。 at the end of a sentence. In CJK languages the reader has to find word boundaries from grammar or a dictionary, so defining a list of separator characters will not solve this. Instead, EliasDB should be extended to support searching for non-delimited substrings, which would be generally useful anyway.
Another solution is to use a CJK text segmentation library. I just found one for Go: …
This requires stemming to handle CJK; bleve has some of these.
CJK sentences do not separate words with spaces, so EliasDB currently cannot find a specific word inside a CJK sentence. It would be great to support this.