Perl-compatible unsetting of captures during repeats #590

Open
NWilson opened this issue Dec 2, 2024 · 2 comments

@NWilson
Member

NWilson commented Dec 2, 2024

In maint/README:

Perl and PCRE2 sometimes differ in the settings of capturing subpatterns
inside repeats. One example of the difference is the matching of
/(main(O)?)+/ against mainOmain, where PCRE2 leaves $2 set. In Perl, it's
unset. Changing this in PCRE2 will be very hard because I think it needs much
more state to be remembered.

In pcre2compat:

  1. There are some differences that are concerned with the settings of captured
    strings when part of a pattern is repeated. For example, matching "aba" against
    the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
    "b".

This seems to be the most significant difference listed in pcre2compat. It's also the only definite bug listed in maint/README (the rest seem to be fairly minor feature requests).

Unlike the technicalities of (*THEN) inside recursive patterns, or other trivia, this has a major impact on "simple" regexes that use only standard syntax, such as "abacb" =~ /^(a(b)?)+c\2$/ (a variant of the pattern quoted above): the backreference \2 sees "b" in PCRE2 but an unset group in Perl.
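
To make the two behaviours concrete, here is a minimal Python sketch that simulates them. It is not how either engine is implemented. It walks the repeat `(a(b)?)+` one iteration at a time along the greedy path only (no backtracking), and the helper names `run_repeat` and `matches_with_backref` plus the `reset_each_iteration` flag are hypothetical, introduced just for illustration. It also assumes that a backreference to an unset group fails to match.

```python
import re

# One iteration of the repeated group (a(b)?), matched on its own so that the
# host engine's own handling of captures inside repeats plays no part.
ITERATION = re.compile(r"a(b)?")


def run_repeat(subject, reset_each_iteration):
    """Greedily apply (a(b)?)+ from the start of `subject`.

    Returns (end_position, group2), or None if no iteration matches.
    Backtracking is not modelled; the greedy path is enough for these examples.
    """
    pos = 0
    group2 = None
    iterations = 0
    while True:
        m = ITERATION.match(subject, pos)
        if m is None:
            break
        if reset_each_iteration:
            # Perl-like policy: the inner capture is forgotten at the start
            # of every iteration of the outer repeat.
            group2 = None
        if m.group(1) is not None:
            # The inner group took part in this iteration, so record it.
            group2 = m.group(1)
        pos = m.end()
        iterations += 1
    return (pos, group2) if iterations else None


def matches_with_backref(subject, reset_each_iteration):
    # Models /^(a(b)?)+c\2$/ under the assumption that a backreference
    # to an unset group is a match failure.
    result = run_repeat(subject, reset_each_iteration)
    if result is None:
        return False
    pos, group2 = result
    if group2 is None:
        return False
    return subject[pos:] == "c" + group2


print(run_repeat("aba", reset_each_iteration=False))              # (3, 'b')
print(run_repeat("aba", reset_each_iteration=True))               # (3, None)
print(matches_with_backref("abacb", reset_each_iteration=False))  # True
print(matches_with_backref("abacb", reset_each_iteration=True))   # False
```

With the capture carried over between iterations (the behaviour described for PCRE2), `$2` ends up as "b" and the backreference pattern matches "abacb". With the capture reset on each iteration (the behaviour described for Perl), `$2` is unset and the same pattern does not match.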

@PhilipHazel
Collaborator

Interestingly, nobody has ever commented on these differences before. I conclude that they are not important in practice.

@NWilson
Member Author

NWilson commented Dec 2, 2024

I at least discovered this difference before noticing it in pcre2compat (and I didn't find maint/README until much later).

I was doing a very thorough investigation of behavioural differences between regex engines, in order to inform the behaviour choices in the Excel API.

I agree users are unlikely to notice or care.

@NWilson NWilson added the enhancement New feature or request label Dec 9, 2024
@NWilson NWilson added this to the 10.46 milestone Jan 8, 2025