regex: Fix negative sets capturing newlines #15

Merged
AntonLydike merged 2 commits into main from anton/fix-negative-sets-matching-newlines on Jul 12, 2024

AntonLydike
Owner

LLVM's llvm::Regex has the following flag enabled on all FileCheck regex matches:

enum llvm::Regex::RegexFlags::Newline = 2U
Compile for newline-sensitive matching. With this flag '[^' bracket
expressions and '.' never match newline. A ^ anchor matches the
null string after any newline in the string in addition to its normal
function, and the $ anchor matches the null string before any
newline in the string in addition to its normal function.

This means that [^ ,], combined with this flag, is actually equivalent to [^\n ,] in Python. Since Python's re module doesn't have a flag like this, we approximate it with a best-effort patch that adds \n to negative sets as part of the regex translation.

This is not a good fix imho, as it will break on regexes where a literal [^ appears inside a set (e.g. [a[^b]), but that should be sufficiently rare that it won't happen 🤞
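
A minimal sketch of the idea described above, for illustration only. It is not the code from this PR: the helper name `add_newline_to_negated_sets` and the pattern `NEGATED_SET_OPEN` are hypothetical, and the sketch assumes a single substitution over the regex source is good enough.

```python
import re

# Hypothetical sketch, not the project's actual translation code.
# Matches the opening "[^" of a negated set unless the "[" is
# directly preceded by a backslash.
NEGATED_SET_OPEN = re.compile(r"(?<!\\)\[\^")


def add_newline_to_negated_sets(pattern: str) -> str:
    """Insert an escaped newline right after every unescaped "[^"."""
    # A callable replacement avoids escape processing of the template.
    return NEGATED_SET_OPEN.sub(lambda _m: "[^\\n", pattern)


original = "[^ ,]+"
patched = add_newline_to_negated_sets(original)

text = "foo\nbar, baz"
# Plain Python: the negated set happily matches the newline.
spans_newline = re.match(original, text).group()
# Patched: the match stops before the newline, as with llvm::Regex::Newline.
stops_at_newline = re.match(patched, text).group()

# Known failure mode: "[^" inside an existing set is literal text,
# but it gets patched anyway, so the set now also matches "\n".
broken = add_newline_to_negated_sets("[a[^b]")
```

The lookbehind only guards against a single preceding backslash, so an escaped backslash followed by a real negated set (`\\[^a]`) would be skipped. That is one more reason to treat this as best-effort rather than a full regex rewrite.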

@AntonLydike AntonLydike self-assigned this Jul 12, 2024
@AntonLydike AntonLydike added the parity Diverging from upstream FileCheck label Jul 12, 2024
@AntonLydike AntonLydike merged commit acaaae0 into main Jul 12, 2024
4 checks passed
@AntonLydike AntonLydike deleted the anton/fix-negative-sets-matching-newlines branch July 12, 2024 22:08