Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp_extract and regexp_extract_all returns match in mismatched group #12119

Closed
HolyLow opened this issue Jan 20, 2025 · 0 comments
Closed
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@HolyLow
Copy link
Contributor

HolyLow commented Jan 20, 2025

Bug description

The implementation of Re2Extract and Re2ExtractAll has a bug, that it might consider a mismatched group as MATCHED empty string "" rather than MISMATCHED std::nullopt.

For example, in the function calling: regexp_extract("rat cat\nbat dog", "ra(.)|blah(.)(.)", 2).
In this case, for group 2 the result must be std::nullopt because no substring would match pattern blah(.).
But the current implementation would mistake the matching of group 1 ra(.) as a empty match case for group 2, and thus return a empty matching, which is wrong.

System information

Velox System Info v0.0.2
Commit: b7ed8c7
CMake Version: 3.28.3
System: Linux-5.15.0-92-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

@HolyLow HolyLow added bug Something isn't working triage Newly created issue that needs attention. labels Jan 20, 2025
facebook-github-bot pushed a commit that referenced this issue Jan 22, 2025
Summary:
The implementation of Re2Extract has a bug, that it might consider a mismatched group as MATCHED empty string "" rather than MISMATCHED std::nullopt.

For example, in the function calling: regexp_extract("rat cat\nbat dog", "ra(.)|blah(.)(.)", 2).
In this case, for group 2 the result must be std::nullopt because no substring would match pattern `blah(.)`.
But the current implementation would mistake the matching of group 1 `ra(.)` as a empty match case for group 2, and thus return a empty matching, which is wrong.

See detailed description in #12119.

This PR fix this bug in Re2Extract implementation.

Also note that this bug behavior exists in Re2ExtractAll as well, but this PR doesn't modify in Re2ExtractAll because existing UTs of Re2ExtractAll already rely on this behavior.

Pull Request resolved: #12109

Reviewed By: kgpai

Differential Revision: D68334118

Pulled By: kagamiori

fbshipit-source-id: 5824102903760c0459af64a6fe62dd195c5f1928
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant