Case insensitive doesn't handle multi bytes UTF-8 character #490
Comments
By the way, I also noticed that re2go doesn't handle case-insensitivity for characters inside square brackets. For example:

    func countA(input string) int {
        for { /*!re2c
            a   { count++; continue } // This will match a and A
            [a] { count++; continue } // This will only match a
            *   { continue }
            $   { return count }
        */
        }
    }

Is that the expected behavior?
You are very welcome!
Sure. It is related, and I'm afraid no progress has been made in this area. It is definitely a good issue that I'd like to fix, but I don't have the resources to fix it in the upcoming release 4.0. Let's leave this bug report open as a reminder, and I'll prioritize it for the next release.
I'd say no, although I'm not sure what the original intention was, as these features predate my experience with re2c. I think with Anyway, it's a good point.
This should be an obvious remark, but: if default behavior does get changed, it should be very prominently marked in the release notes.
@pmetzger True. So far re2c has done a good job of not breaking backwards compatibility, and we should hold on to this. I thought at first that in this case we'll only increase the subset of matching strings, so it won't break any existing code. But then I thought of the negative ranges and range subtraction, and it's not so easy after all.
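To make the point about negative ranges concrete, here is a minimal sketch in plain Go (not re2c code). It models a hypothetical negated class `[^x]` by hand. The helper names `foldSet` and `matchesNegated` are illustrative assumptions and say nothing about how re2c implements ranges; the sketch only shows that case folding applied inside a negated class shrinks the set of matching characters instead of growing it.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldSet returns r together with every rune that is equivalent to it
// under Unicode simple case folding (for 'a' that is {'a', 'A'}).
func foldSet(r rune) map[rune]bool {
	set := map[rune]bool{r: true}
	for f := unicode.SimpleFold(r); f != r; f = unicode.SimpleFold(f) {
		set[f] = true
	}
	return set
}

// matchesNegated reports whether c matches the hypothetical class [^x].
// With fold == false only x itself is excluded; with fold == true all
// case variants of x are excluded as well.
func matchesNegated(c, x rune, fold bool) bool {
	if !fold {
		return c != x
	}
	return !foldSet(x)[c]
}

func main() {
	// Case-sensitive [^a] accepts 'A'.
	fmt.Println(matchesNegated('A', 'a', false))
	// A case-insensitive [^a] would reject 'A', so input that matched
	// before the change would stop matching.
	fmt.Println(matchesNegated('A', 'a', true))
}
```

The same reasoning applies to range subtraction: folding the subtracted range removes more characters from the result than before.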
Imo, the best way to address both issues, extended Unicode case
insensitivity and the same for ranges, is to add new flags and options. No
reason to change the existing behavior and break people who rely on it.
Cheers
Marcus
…On Mon, Sep 16, 2024, 20:52 Ulya Trofimovich ***@***.***> wrote:
@pmetzger <https://github.com/pmetzger> True. So far re2c has done a good
job of *not* breaking backwards compatibility, and we should hold on to
this. I thought at first that in this case we'll only increase the subset
of matching strings, so it won't break any existing code. But then I
thought of the negative ranges and range subtraction, and it's not so easy
after all.
Hi @skvadrik, I'm sorry for opening another issue so quickly. This issue might be related to #118, but since that one is from 9 years ago, I thought it better to create a new issue than to necro that one.
For example, I have the following string:
I want to search for ö case-insensitively, so I create the following templates:
When we run the generated code, the function above finds only 2 ö characters and ignores the capital Ö. As a workaround, we can use square brackets to explicitly specify both of them:
However, it would be nice if re2c could handle it internally.
Thanks!
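For reference, the behavior asked for above can be sketched in plain Go, outside of re2c. This is only an illustration of the expected result, not generated lexer code: the function name `countFold` and the sample string are hypothetical. In UTF-8, `ö` (U+00F6) is encoded as `C3 B6` and `Ö` (U+00D6) as `C3 96`, so the two differ in a non-ASCII byte and a comparison has to fold whole code points rather than single ASCII bytes.

```go
package main

import (
	"fmt"
	"unicode"
)

// countFold counts the runes in input that equal target when both are
// lowercased with unicode.ToLower.
func countFold(input string, target rune) int {
	want := unicode.ToLower(target)
	count := 0
	// Ranging over a string decodes UTF-8, yielding one rune per step,
	// so multi-byte characters are compared as whole code points.
	for _, r := range input {
		if unicode.ToLower(r) == want {
			count++
		}
	}
	return count
}

func main() {
	sample := "Öl, öl, SCHÖN, schön" // hypothetical input, not from the report
	fmt.Println(countFold(sample, 'ö')) // counts both ö and Ö
}
```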