Japanese Characters cause the entire string to be detected as a URL #39

Open
joeyfedor opened this issue Mar 3, 2023 · 1 comment

If you run the detector on the text below, it treats the entire text as a single URL.

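我进入你的主页很卡顿,也许是你的关注人数或者其他数据太多了,其他人主页没有这么卡顿。来自amethyst客户端

(English translation: "Your profile page is very laggy when I open it; maybe you follow too many people or have too much other data. Other people's profile pages are not this laggy. Sent from the amethyst client.")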

The characters 。 and , are each a single character, and this library does not treat them as spaces.
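As a rough illustration of that point, here is a minimal sketch that uses only the JDK. It does not reproduce URL-Detector's own parsing, which is not shown in this thread. The class and helper names (`CjkPunctuationCheck`, `isJdkSpace`, `whitespaceTokens`) are hypothetical and exist only for this example. It shows that the ideographic full stop (U+3002) is not classified as a space, so text punctuated with it and containing no ASCII spaces stays one token under a naive whitespace split.

```java
class CjkPunctuationCheck {
    // Hypothetical helper: true if the JDK classifies c as a space of any kind.
    static boolean isJdkSpace(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Hypothetical helper: split on ASCII whitespace, as a naive tokenizer might.
    static String[] whitespaceTokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        char ideographicFullStop = '\u3002'; // the CJK sentence-ending period
        char ideographicSpace = '\u3000';    // the CJK full-width space, for contrast

        System.out.println(isJdkSpace(ideographicFullStop)); // false
        System.out.println(isJdkSpace(','));                 // false
        System.out.println(isJdkSpace(ideographicSpace));    // true

        // Four CJK characters with an ideographic full stop in the middle, no spaces.
        String sample = "\u4e3b\u9875\u3002\u5176\u4ed6";
        System.out.println(whitespaceTokens(sample).length); // 1
    }
}
```

One possible workaround, offered here only as an assumption and not as something the maintainers recommend, would be to split the input on such punctuation before handing each piece to the detector.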

mattn commented May 1, 2023

linkedin/URL-Detector is not well suited to detecting URLs in content that may contain multi-byte text. The following test case matches text that is perfectly ordinary in Chinese/Japanese.

runTest("\u9053 \u83dc\u3002\u3002\u3002\u3002", UrlDetectorOptions.Default);
