Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CoreScalarResolver.INT regex to work with JS #403

Merged
merged 1 commit into from
Feb 6, 2025
Merged

Conversation

krzema12
Copy link
Owner

@krzema12 krzema12 commented Feb 6, 2025

Found when converting the corresponding test to KMP: #397

What helped me here is this tool that renders the regex as a graph: https://www.debuggex.com/

Before the fix - I noticed that ^ and $ are attached to the groups nearest to them:
Screenshot 2025-02-06 at 09 05 30

After the fix - creating a new group that contains the rest fixes the problem:
Screenshot 2025-02-06 at 09 06 08

@krzema12 krzema12 marked this pull request as ready for review February 6, 2025 08:16
@krzema12 krzema12 requested a review from OptimumCode February 6, 2025 08:16
@krzema12 krzema12 enabled auto-merge (squash) February 6, 2025 08:16
@krzema12 krzema12 merged commit 8632ef0 into main Feb 6, 2025
12 checks passed
@krzema12 krzema12 deleted the fix-regex-js branch February 6, 2025 08:18
@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

Can you explain further, please?
Because that does not really make any sense at all.
The previous regex should match fine.
And (unless I miscopied on mobile) debuggex is also happy with the previous state.

The change you made does not change any functionality as long as whole strings are matched and even then the test would not fail as even a substring match would hold, besides that the change is part of my readability PR, just with non-capturing groups.

@krzema12
Copy link
Owner Author

krzema12 commented Feb 6, 2025

@Vampire the regex before this change led to test failures in #397. See this run: https://github.com/krzema12/snakeyaml-engine-kmp/actions/runs/13174353963/job/36770339573 and the build scan: https://scans.gradle.com/s/k4ayl4yetgdve/tests/overview?outcome=Failed. The change merged in this PR fixes the problem, that's it. For me, this proof is enough. We'll convert the group into a non-capturing one in your regex-related PR.

@krzema12
Copy link
Owner Author

krzema12 commented Feb 6, 2025

@Vampire if course if you see a other reason why the tests failed, let me know! This case also intrigued me, so I want to understand it well.

@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

I had a closer look and it is like I said, it doesn't make any sense.
The reason is, that we found a bug in K/JS.
The docs of Regex#matches expectedly say

Indicates whether the regular expression matches the entire input.

The problem is, that the K/JS Implementation at https://github.com/JetBrains/kotlin/blob/de5284d7e4af5b94d9d4144119134eb0285a5fa9/libraries/stdlib/js/src/kotlin/text/regex.kt#L97 is bullshit. It executes the regex against the input, gets a result that might only match a part (in this case just 0 matched my the first alternative, then compares whether that match is starting at first and ending at last character. That it could have matched the entire input too is missed and thus the result of the method wrong.

Have a look at https://pl.kotl.in/sfmoOkwcB. With JVM all four cases are correctly true, with either of the JS targets the first case is wrongly false, as only the first character is matched, even though all could match.

@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

Actually due to that bug, you should probably go through all regexes and check whether they are intended to match the entire input and add the anchors and non-matching group of they should. In my PR I only fixed the cases you fixed here, where the group was missing.

@krzema12
Copy link
Owner Author

krzema12 commented Feb 6, 2025

Nice find! Indeed nativePattern.lastIndex == input.length looks shady. FWIW, the JS-specific unit tests don't cover it (ref), and I haven't found common tests for it.

So yeah, this PR turns to be a workaround for the issue you found. Are you planning to report it?

@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

No, I'm planning to fix it, I don't have a commit in Kotlin yet. :-D

@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

Actually it is even worse.
They don't only not handle entire match in matches properly,
but also try to be overly clever in matchEntire, using the original pattern if it starts with ^ and ends with $ which is wrong of course and if not, strips any ^ from start and $ from end and places the same there without a group which of course is also wrong.
🙈

@Vampire
Copy link
Contributor

Vampire commented Feb 6, 2025

JetBrains/kotlin#5402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants