Accented characters being rejected in new versions #107

Open
collimarco opened this issue May 21, 2020 · 13 comments

Comments

@collimarco

I have noticed that URLs containing non-ASCII characters, like accented characters, were accepted in the past. However now this library rejects all of them.

This change is probably related to one of these recent commits:
1945ae4
3dde863

Was this change made on purpose?

If it was, we should probably add a test to make that explicit.

I have read that the standard requires only ASCII characters, so this is probably correct.

On the other hand, many users of my website were affected and started getting validation errors when they tried to post external URLs. So, if this is not a security issue (I don't know), maybe we can consider accepting them as we did in the past?

@kritik
Member

kritik commented May 21, 2020

The standard allows UTF-8 characters to be used.

@collimarco
Author

@kritik Yes, I agree with you. However, if users copy and paste URLs that are open in their browser, those URLs may contain accented characters. Also, if you use client-side validation in forms (e.g. a URL field), most browsers allow accented characters (I don't know why).

@kritik
Member

kritik commented May 21, 2020

Can you give an example of a URL that is wrongly rejected?

@collimarco
Author

@kritik For example, https://example.com/è is now rejected by the validation (since the last version).
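
For reference, Ruby's standard library parser shows the same kind of rejection. This is a minimal sketch that uses only `URI.parse`; the helper name `stdlib_parses?` is made up for illustration, and it is an assumption that the gem's validation behaves similarly, since its actual code path may differ.

```ruby
require "uri"

# Returns true when Ruby's stdlib parser accepts the string as a URI.
# NOTE: hypothetical stand-in for illustration, not the gem's own logic.
def stdlib_parses?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  # URI.parse raises for strings that contain non-ASCII characters.
  false
end

puts stdlib_parses?("https://example.com/e")        # ASCII-only path
puts stdlib_parses?("https://example.com/\u00E8")   # path contains "è"
```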

@kritik
Member

kritik commented May 21, 2020

Oh, then it should be fixed.

@collimarco
Author

collimarco commented May 21, 2020

@kritik Based on the answers on SO, only ASCII characters should be accepted. However, that is not how browsers commonly behave: they currently also accept other characters (e.g. accented characters).

Accepting non-ASCII characters is probably not compliant with the standard. However, it would be more user-friendly and would reflect browser behavior.

In any case, if we accept non-ASCII characters we should make sure not to create security issues (for example, when the link is passed to a Rails link_to).

@kritik
Member

kritik commented May 21, 2020

Let me put it this way. Non-ASCII characters are accepted in URLs. If your network doesn't support them, possibly in the US, then the browser can encode them to ASCII. If the domain name has non-ASCII characters, a special prefix will be added (I don't remember which one). Other parts of the URL will be encoded by the URL-encoding logic.
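
The URL-encoding step described above can be sketched as follows. This is a rough illustration under stated assumptions, not the gem's implementation: the helper name `percent_encode_non_ascii` is hypothetical, and it only percent-encodes non-ASCII characters as their UTF-8 bytes. It does not treat the host specially; internationalized domain names are normally converted to Punycode (the `xn--` prefix) instead, which this sketch does not attempt.

```ruby
require "uri"

# Percent-encode every non-ASCII character as its UTF-8 bytes, leaving
# ASCII characters (including "/", ":" and "?") untouched.
# NOTE: hypothetical helper for illustration only.
def percent_encode_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# "\u00E8" is the precomposed character "è" (UTF-8 bytes C3 A8).
encoded = percent_encode_non_ascii("https://example.com/\u00E8")
puts encoded                  # https://example.com/%C3%A8
puts URI.parse(encoded).path  # /%C3%A8
```

Once encoded this way, the string is ASCII-only and the standard library parser accepts it.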

@collimarco
Author

> non ASCII characters are accepted in the urls

That is widespread, but not standard. Here's more information.

@kritik
Member

kritik commented May 21, 2020

Maybe that website is old? When we at Perfectline did the update for the Estonian domain system upgrade, we had to support both ASCII and non-ASCII characters. It works the same way as for Russian or Chinese domains. For more info you can check https://github.com/internetee/registry

@TimFletcher

I am unable to validate a URL with Chinese characters:

https://www.redcloudoverall.com/collections/jeans/products/redcloud-赤芸-lot-r423-66-ow-washed-regular-cut-selvedge-denim

@Getmrahul

@kritik Any update on this issue? I'm getting a validation error for this URL: https://ro.linkedin.com/in/andrei-marcel-țiț-66b019a8

@xtagon

xtagon commented Dec 21, 2020

My two cents is that it should be an option, maybe something like `validates :my_url, url: {strict: true}`. There are going to be people who want it one way or another.
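
A rough sketch of what such an option could look like, written as a plain Ruby method rather than a Rails validator so it runs on its own. Everything here is an assumption for illustration: `acceptable_url?` and its `strict:` keyword are hypothetical names, not an existing API of the gem, and the lenient mode simply percent-encodes non-ASCII characters before parsing (it does not convert internationalized hostnames to Punycode).

```ruby
require "uri"

# Hypothetical helper: `strict:` mirrors the option suggested above and is
# NOT an existing option of the gem.
#   strict: true  -> only ASCII-only http(s) URLs pass
#   strict: false -> non-ASCII characters are percent-encoded first
def acceptable_url?(value, strict: true)
  candidate = value
  unless strict
    candidate = value.gsub(/[^[:ascii:]]/) do |char|
      char.bytes.map { |byte| format("%%%02X", byte) }.join
    end
  end

  uri = URI.parse(candidate)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts acceptable_url?("https://example.com/\u00E8")                 # strict mode
puts acceptable_url?("https://example.com/\u00E8", strict: false)  # lenient mode
```

Keeping the strict behavior as the default in this sketch means existing users would see no change unless they opt in; the opposite default would be equally easy to argue for.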

@jfloff

jfloff commented Sep 11, 2023

I also have a similar problem with a LinkedIn profile whose URL contains the word césar, which is not accepted by this gem. Any plans on this?
