This is a feature request to allow replacing within a replacement, i.e. to restart the replace scan one character after the start of the match instead of at the first character after the end of the match.
Currently, recursive regexes must be restarted manually to find inner matches, which implies some unnecessary CPU overhead, especially in non-compiled (interpreted) programming languages.
I propose a new PCRE flag that makes the PCRE engine's replace process continue at +1 character instead of +N characters (where N is the number of matched characters).
I have just had a look at this. What you suggest cannot easily be done, because the code builds its output in a separate buffer; when the global option is set, the scan continues in the old (input) buffer. I think you could implement what you want externally, fairly efficiently, by using two buffers. Start with buffer 1 holding the input, call pcre2_match(), and remember the offset where it matched. Then call pcre2_substitute() with your match_data block and PCRE2_SUBSTITUTE_MATCHED, but NOT the global option, using buffer 2 as the output. Start the next call to pcre2_match() with buffer 2 as the input at the appropriate offset, and buffer 1 as the output. And so on.
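The alternating-buffer loop can be sketched in Python, with the standard `re` module standing in for pcre2_match() and pcre2_substitute(). This is a minimal sketch, not PCRE2 code: Python strings are immutable, so building a new string on each step plays the role of swapping buffer 1 and buffer 2. `restart_plus_one_sub` and its `max_replacements` guard are hypothetical names; the guard exists because plain +1 restarting is not guaranteed to terminate.

```python
import re

def restart_plus_one_sub(pattern: str, repl: str, text: str,
                         max_replacements: int = 1000) -> str:
    """Replace one match at a time, resuming the search one character after
    the start of the previous match (the "+1" behaviour from the request).

    Hypothetical helper, not part of any library. repl is inserted literally
    (no group references).
    """
    regex = re.compile(pattern)
    pos = 0
    for _ in range(max_replacements):
        m = regex.search(text, pos)  # analogous to pcre2_match() with a start offset
        if m is None:
            return text
        # Single, non-global substitution into a new string ("the other buffer").
        text = text[:m.start()] + repl + text[m.end():]
        pos = m.start() + 1  # bump along by one character, not by len(match)
    raise RuntimeError("replacement did not terminate within max_replacements")

print(restart_plus_one_sub(r'aa', 'ba', 'aaa'))  # bba
```

With `restart_plus_one_sub(r'a', 'ba', 'aa')` the loop keeps matching the 'a' it just inserted, so it hits the guard and raises `RuntimeError`.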
Consider 'aa'.replace(/a/g, 'ba') (JavaScript) or re.sub(r'a', 'ba', 'aa') (Python); both return 'baba'. If, after replacing the first 'a' with 'ba', you continued matching one character later, you would find the 'a' you just inserted and inflate the string forever. So other languages don't provide the behaviour you describe.
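A quick Python check of this point, as a sketch: `re.sub` stops at 'baba', while a hypothetical `naive_steps` helper that resumes one character after each match start keeps finding the 'a' it just inserted, so the string grows by one 'b' per step.

```python
import re

def naive_steps(pattern: str, repl: str, text: str, steps: int) -> list[str]:
    """Return the string after each of the first `steps` single replacements,
    resuming the search one character after the previous match start.
    Hypothetical helper for illustration only."""
    regex = re.compile(pattern)
    history, pos = [text], 0
    for _ in range(steps):
        m = regex.search(text, pos)
        if m is None:
            break
        text = text[:m.start()] + repl + text[m.end():]
        pos = m.start() + 1
        history.append(text)
    return history

print(re.sub(r'a', 'ba', 'aa'))          # baba
print(naive_steps(r'a', 'ba', 'aa', 3))  # ['aa', 'baa', 'bbaa', 'bbbaa']
```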
To prevent this, you'd have to impose some restriction: matching wouldn't simply continue at +1 character (bumping forward into the replaced string), but would also skip matches until it found one that overlaps the not-yet-matched remainder of the original string.
This would allow cases like 'aaa'.replace(/aa/g, 'ba') to go 'aaa' → 'baa' (first replacement) → 'bba' (second). It would terminate, since each replacement is guaranteed to consume at least one further character of the original input string.
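The restriction described above can be sketched in Python. This illustrates the idea only; it is not something PCRE2 provides. `overlapping_sub` is a hypothetical helper, the replacement is inserted literally, and the pattern is assumed not to match the empty string. `boundary` marks where the not-yet-consumed part of the original input begins in the current string; a match is accepted only if it ends past that point, so the unconsumed remainder shrinks on every replacement and the loop terminates.

```python
import re

def overlapping_sub(pattern: str, repl: str, text: str) -> str:
    """Resume one character after each match start, but skip matches that lie
    entirely inside already-replaced text. Hypothetical helper; assumes the
    pattern cannot match the empty string and treats repl literally."""
    regex = re.compile(pattern)
    pos = 0
    boundary = 0  # index in `text` where the unconsumed original input starts
    while True:
        m = regex.search(text, pos)
        # Skip matches that do not reach into the unconsumed original input.
        while m is not None and m.end() <= boundary:
            m = regex.search(text, m.start() + 1)
        if m is None:
            return text
        text = text[:m.start()] + repl + text[m.end():]
        boundary = m.start() + len(repl)  # original input resumes after repl
        pos = m.start() + 1               # bump along by one character

print(overlapping_sub(r'aa', 'ba', 'aaa'))  # bba
print(overlapping_sub(r'a', 'ba', 'aa'))    # baba
```

The first call follows the 'aaa' → 'baa' → 'bba' sequence above; the second stops at 'baba', the same result as `re.sub`, because the inserted 'a' alone never reaches past `boundary`.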
However, this isn't what you requested, I think.
The substitution functions in other languages, like Perl/Python/JavaScript, don't have this feature.
Maybe we should close this ticket after adding some documentation explaining how to implement this yourself (in the application, by doing repeated replacements). It seems unlikely we'd want it in PCRE2 itself.
Recursive regex use case: https://github.com/PHP-CS-Fixer/PHP-CS-Fixer/blob/v3.38.2/src/DocBlock/TypeExpression.php#L171 (matching the phpdoc grammar)
Recursive regex replacement use case: https://github.com/atk4/data/blob/6bd51b730d/src/Schema/TestCase.php#L176 (replacing possibly nested SQL expressions)