Issue detecting languages in non latin languages #4

Open
sidfeiner opened this issue Aug 6, 2017 · 1 comment

@sidfeiner

Hello,
I've compiled the cld2 lib and built the Java project. Detection works for languages written in Latin script (Dutch, Spanish, French, English), but when I feed it Arabic or Hebrew text, the Result is always "UNKNOWN".

@sk-
Contributor

sk- commented Aug 7, 2017

Did you try with some of the strings present in the test data?

For example:
" או לערוך את העדפות ההפצה אנא עקוב אחרי השלבים הבאים כנס לחשבון האישי שלך ב"
"احتيالية بيع أي حساب"

Both strings should be detected by both detectors. Try running each classifier in isolation with the option --spring.profiles.active=cld2 or --spring.profiles.active=java_only, to check whether that classifier on its own detects the correct language.

It could also be an encoding problem.
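
One way to rule encoding in or out is to check which script the text contains just before it is passed to the detector. The snippet below is only a rough sketch: `EncodingCheck` and `containsScript` are hypothetical names, not part of this project, and it relies only on the standard `Character.UnicodeScript` API. It takes a Hebrew word from the test string above (written with Unicode escapes so the source file's own encoding does not matter), decodes its UTF-8 bytes once correctly and once as ISO-8859-1, and shows that the wrongly decoded text no longer contains any Hebrew code points, which would plausibly lead a detector to report an unknown language.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of the project under discussion.
class EncodingCheck {

    // Returns true if at least one code point of `text` belongs to `script`.
    static boolean containsScript(String text, Character.UnicodeScript script) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == script);
    }

    public static void main(String[] args) {
        // Hebrew word from the sample string, as Unicode escapes.
        String hebrew = "\u05E9\u05DC\u05DA";
        byte[] utf8 = hebrew.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset keeps the Hebrew letters.
        String decodedCorrectly = new String(utf8, StandardCharsets.UTF_8);
        // Decoding the same bytes as ISO-8859-1 yields unrelated characters.
        String decodedWrongly = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(containsScript(decodedCorrectly, Character.UnicodeScript.HEBREW)); // true
        System.out.println(containsScript(decodedWrongly, Character.UnicodeScript.HEBREW));   // false
    }
}
```

If the same check returns false for your Arabic or Hebrew input (use `Character.UnicodeScript.ARABIC` for Arabic), the text is probably being mangled before it reaches the detector, for example when it is read from a file or the console with the wrong charset.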
