Fix Regex Character Class Escape Tests #4195

Aurele-Barriere · 2024-08-14T12:52:19Z

Character class escape tests for \d, \D, \s, \S, \w, \W are not testing what they should.

Consider the escape \d.
We should test two things:

That it matches ALL digits (positive cases)
That it matches NONE of the other code points (negative cases).

This is not what the auto-generated tests are currently doing.
character-class-digit-class-escape.js only checks that \d finds a match in "0123456789".
And character-class-digit-class-escape-plus-quantifier.js only checks that \d+ finds a match in "0123456789".

First, these two tests are equivalent.
But also, they do not check that \d matches all digits, only one.
And they never check that non-digits are not matched.
It seems that an implementation where \d matches only "3" and "z" would pass all the current tests.

I suggest the following:

check positive cases by testing ^\d+$ (with anchors) on the string "0123456789".
check negative cases by testing the absence of any match of \d on the string that contains all code points except digits.
do the same for each character class escape.

Thanks to Noé De Santo @Ef55 for realizing that these tests were probably not what they were intended to be, while working on the Warblre project.

For each character class escape (\d, \D, \s, \S, \w, \W), check positive cases (the escape matches all characters it's supposed to match) and negative cases (the escape doesn't match any of the characters it should not match). Each of these checks is also done in Unicode mode. This is part of my work at the SYSTEMF lab at EPFL.

ptomato · 2024-08-14T15:48:59Z

cc @mathiasbynens, you might be interested in this.

mathiasbynens · 2024-08-15T03:49:37Z

CC @leobalter as the original author of these tests.

Thanks for kicking this off! Since these are generated tests, could you please update the source script that generates these files rather than editing them directly? It would be easier to review that patch.

Aurele-Barriere · 2024-08-15T12:08:21Z

I'll give it a try in the next few days!

leobalter · 2024-08-16T01:52:32Z

I no longer have the code on any of my machines, the diff information points out to https://github.com/bocoup/test262-regexp-generator. I hope that helps.

Aurele-Barriere · 2024-08-21T09:13:46Z

Hi,
It seems like the tests currently in Test262 are not the same as the tests of the generator.
Currently, the tests in the generator check that, for instance, str.replace(/\d/g,'') === str.replace(/|0-9]/g,'') where str is the string of all code points.
I think these generator tests are fine: if \d matched too many things, then the string on the left side of the equality would be shorter.
And if \d didn't match enough characters, it would be longer.

So I believe these generator tests correctly check positive and negative cases!
I still think that the tests for the plus quantifier are redundant, but one could argue that it's just an extra sanity check.

It seems to me like the reason for the differences between the two sets of tests is due to this commit, optimizing the tests by removing this str.replace comparison.

I see two ways forward:

We could simply revert the optimization commit. The tests would be then agree with the generator. But probably this is too slow.
I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use str.replace, so maybe this is fast enough?

What do you think?

mathiasbynens · 2024-08-21T10:58:50Z

I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use str.replace, so maybe this is fast enough?

IMHO this is the way to go.

Aurele-Barriere · 2024-09-10T16:45:19Z

Hello,
I've created a pull request here: bocoup/test262-regexp-generator#2

To generate the correct strings, I've used the regenerate.js file from this repo: https://github.com/mathiasbynens/unicode-property-escapes-tests/

I've also added a configuration with the v flag.

Best,
Aurèle

Aurele-Barriere requested a review from a team as a code owner August 14, 2024 12:52

Ms2ger added the needs review label Aug 14, 2024

Aurele-Barriere mentioned this pull request Sep 10, 2024

Fix Regex Character Class Escape Tests bocoup/test262-regexp-generator#2

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Regex Character Class Escape Tests #4195

Fix Regex Character Class Escape Tests #4195

Aurele-Barriere commented Aug 14, 2024

ptomato commented Aug 14, 2024

mathiasbynens commented Aug 15, 2024 •

edited

Loading

Aurele-Barriere commented Aug 15, 2024

leobalter commented Aug 16, 2024

Aurele-Barriere commented Aug 21, 2024

mathiasbynens commented Aug 21, 2024

Aurele-Barriere commented Sep 10, 2024

Fix Regex Character Class Escape Tests #4195

Are you sure you want to change the base?

Fix Regex Character Class Escape Tests #4195

Conversation

Aurele-Barriere commented Aug 14, 2024

ptomato commented Aug 14, 2024

mathiasbynens commented Aug 15, 2024 • edited Loading

Aurele-Barriere commented Aug 15, 2024

leobalter commented Aug 16, 2024

Aurele-Barriere commented Aug 21, 2024

mathiasbynens commented Aug 21, 2024

Aurele-Barriere commented Sep 10, 2024

mathiasbynens commented Aug 15, 2024 •

edited

Loading