Chinese language misclassified #98
Comments
I see that this is a known issue. Why has it not been resolved after 7 years?
Not even remotely close.
Found a partial solution from ChatGPT. Before using it, you have to fix the ko profile.
I suggest using pycld2 instead. It has some issues as well, but none as grave as those of langdetect, imo.
Thank you @Dobatymo.
I use langdetect to classify the language of a website when the site does not have a lang attribute in the HTML. Occasionally langdetect will misclassify a website written in Chinese. For example, this website: https://news.sina.com.cn/c/xl/2022-01-23/doc-ikyamrmz6973062.shtml is classified as Korean and not Chinese by langdetect.

This is the title of the article: 相约北京 习近平邀世界“共同见证”_手机新浪网

Why does langdetect classify the language of this website as Korean and not Chinese?
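
A possible guard against this specific confusion is a script-based sanity check that runs before or after the statistical detector. The sketch below is not part of langdetect or pycld2, and it is not the ChatGPT workaround mentioned in the comments. It uses only the standard library, and the function names `count_scripts` and `guess_cjk_language` are hypothetical. It assumes that text containing Han characters but no Hangul is very unlikely to be Korean, and that kana indicates Japanese. This is a heuristic, not a full language identifier.

```python
def count_scripts(text):
    """Count Han, Hangul, and kana characters by Unicode block (rough heuristic)."""
    counts = {"han": 0, "hangul": 0, "kana": 0}
    for ch in text:
        cp = ord(ch)
        # CJK Unified Ideographs and Extension A
        if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            counts["han"] += 1
        # Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo
        elif (0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF
              or 0x3130 <= cp <= 0x318F):
            counts["hangul"] += 1
        # Hiragana and Katakana
        elif 0x3040 <= cp <= 0x30FF:
            counts["kana"] += 1
    return counts


def guess_cjk_language(text):
    """Return 'ko', 'ja', 'zh', or None when no CJK script is present."""
    c = count_scripts(text)
    # Hangul present and at least as frequent as the other scripts: Korean.
    if c["hangul"] > 0 and c["hangul"] >= max(c["han"], c["kana"]):
        return "ko"
    # Any kana: treat as Japanese (Japanese mixes kana with Han characters).
    if c["kana"] > 0:
        return "ja"
    # Han characters without dominant Hangul or any kana: treat as Chinese.
    if c["han"] > 0:
        return "zh"
    return None


# The article title quoted in the issue contains Han characters only.
title = "相约北京 习近平邀世界“共同见证”_手机新浪网"
print(guess_cjk_language(title))
```

One way to use this is to override a detector result of Korean whenever `guess_cjk_language` returns `"zh"`, and to fall back to the detector for everything the heuristic returns `None` for.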