add length_utf16 validator #245
base: master
I don't think it makes sense to add that to the library; it's better added as a custom validator.
@Keats The need for a UTF-16 code unit length validator is very common (consider all of web form handling), since the maxlength attribute of HTML form fields counts UTF-16 code units. If the frontend counts UTF-16 code units and the backend counts UTF-8 code units, inconsistencies arise whenever a value contains characters whose encoded length differs between UTF-16 and UTF-8. The server then rejects values that passed client-side validation whenever their UTF-8 representation is longer than the browser's UTF-16 representation.
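A minimal standard-library sketch of that mismatch, assuming a backend that caps length in UTF-8 bytes (str::len) while the browser's maxlength counts UTF-16 code units. The helper names utf16_len and utf8_len are illustrative only.

```rust
/// Length as the browser's maxlength counts it: UTF-16 code units.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Length as a byte-counting backend sees it: UTF-8 code units.
fn utf8_len(s: &str) -> usize {
    // str::len returns the length in bytes, i.e. UTF-8 code units.
    s.len()
}

fn main() {
    let max = 5;
    // Five precomposed 'é' (U+00E9): 1 UTF-16 code unit each, 2 UTF-8 bytes each.
    let value = "\u{e9}".repeat(5);
    let client_ok = utf16_len(&value) <= max; // passes maxlength="5"
    let server_ok = utf8_len(&value) <= max; // fails a 5-unit UTF-8 limit
    println!(
        "utf16={} utf8={} client_ok={} server_ok={}",
        utf16_len(&value),
        utf8_len(&value),
        client_ok,
        server_ok
    );
}
```

This prints utf16=5 utf8=10 client_ok=true server_ok=false: the browser accepts the value and the server rejects it.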
@@ -81,6 +87,7 @@ impl Validator {
Validator::Regex(_) => "regex",
Validator::Range { .. } => "range",
Validator::Length { .. } => "length",
Validator::LengthUTF16 { .. } => "length_utf16",
- Validator::LengthUTF16 { .. } => "length_utf16",
+ Validator::LengthUtf16 { .. } => "length_utf16",
For consistency with Rust code style, you might want to use Utf in identifiers, e.g. https://doc.rust-lang.org/std/str/struct.EncodeUtf16.html
I'll address the code style suggestions once it's clear that crate users want an extra built-in validator type.
We could gather more comments or thumbs-up on the PR description to get more feedback.
Yeah, I think it makes more sense to go with a parameter approach to the length validator, as mentioned in #250; otherwise we just duplicate things that are 99% the same.
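One possible shape for the parameter approach, sketched with only the standard library: a single length check that takes a unit argument. The LengthUnit enum and the measure and validate_length functions are hypothetical names, not the crate's API, and #250 may propose a different design.

```rust
/// Hypothetical unit parameter for a single length validator (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LengthUnit {
    /// Unicode scalar values (chars).
    Chars,
    /// UTF-16 code units, as counted by JavaScript and HTML maxlength.
    Utf16,
    /// UTF-8 code units (bytes).
    Bytes,
}

/// Measures the string in the requested unit; only this step varies by unit.
fn measure(value: &str, unit: LengthUnit) -> u64 {
    let n = match unit {
        LengthUnit::Chars => value.chars().count(),
        LengthUnit::Utf16 => value.encode_utf16().count(),
        LengthUnit::Bytes => value.len(),
    };
    n as u64
}

/// Returns true when the measured length lies within the optional bounds.
fn validate_length(value: &str, min: Option<u64>, max: Option<u64>, unit: LengthUnit) -> bool {
    let len = measure(value, unit);
    min.map_or(true, |m| len >= m) && max.map_or(true, |m| len <= m)
}

fn main() {
    // "héllo" with a precomposed é: 5 chars, 5 UTF-16 code units, 6 bytes.
    let s = "h\u{e9}llo";
    for unit in [LengthUnit::Chars, LengthUnit::Utf16, LengthUnit::Bytes] {
        println!(
            "{:?}: len={} within max 5: {}",
            unit,
            measure(s, unit),
            validate_length(s, None, Some(5), unit)
        );
    }
}
```

With one entry point, the min/max logic is written once and only the counting step depends on the unit, which addresses the duplication concern.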
This PR adds a length_utf16 validator.
My project exposes data from Salesforce via a JsonSchema-based API. I want to validate field lengths the same way Salesforce does: by counting UTF-16 code units.
UTF-16 is the string representation used by JavaScript, Java, and Salesforce Apex, so I think this validator could be useful to others as well. A good use case is aligning backend and frontend length validation.
An example of a mismatch between UTF-16 code units and Unicode code points: the '𝔠' symbol (U+1D520) takes 2 UTF-16 code units (a surrogate pair) but is still a single Unicode code point.
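The '𝔠' counts can be checked with the standard library, and the proposed check can be written as a standalone function with roughly the shape of a custom validator. This is a sketch: validate_length_utf16 is a hypothetical name, and it returns a plain String error rather than the crate's own error type.

```rust
/// Hypothetical standalone check: Ok(()) when the UTF-16 code-unit length
/// is within the optional [min, max] bounds, otherwise a descriptive error.
fn validate_length_utf16(value: &str, min: Option<usize>, max: Option<usize>) -> Result<(), String> {
    let len = value.encode_utf16().count();
    if let Some(min) = min {
        if len < min {
            return Err(format!("length {} is below minimum {}", len, min));
        }
    }
    if let Some(max) = max {
        if len > max {
            return Err(format!("length {} exceeds maximum {}", len, max));
        }
    }
    Ok(())
}

fn main() {
    let c = "\u{1D520}"; // '𝔠', outside the Basic Multilingual Plane
    assert_eq!(c.chars().count(), 1); // one code point
    assert_eq!(c.encode_utf16().count(), 2); // a surrogate pair
    println!("{:?}", validate_length_utf16(c, None, Some(1)));
}
```

This prints Err("length 2 exceeds maximum 1"), whereas a code-point-based check (chars().count()) would accept the same value with a maximum of 1.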
Should I wrap the implementation in an optional length_utf16 feature?