-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fix [RegExp|String].prototype methods and Regex matcher #6615
base: master
Are you sure you want to change the base?
Conversation
I'm not keen on killing all the fast paths - I suggested you'd need to check them but hopefully we can update conditions so they're only used when they won't cause a problem rather than just deleting them. |
…into regex-fixes
lib/Runtime/Library/RegexHelper.cpp
Outdated
@@ -2412,7 +2429,7 @@ namespace Js | |||
{ | |||
if (string->GetLength() > (0 > index || (uint64_t)index + (uint64_t)1 > UINT32_MAX ? UINT32_MAX : (uint32_t)(index + 1)) && | |||
NumberUtilities::IsSurrogateLowerPart(string->GetString()[index]) && | |||
NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1])) | |||
NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1]) && isUnicode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ideally should do the isUnicode check first - it's much quicker, than the IsSurrogateLowerPart and IsSuurrogateUpperPart checks; if we do it first can skip the slower bits when it's false.
Field(codepoint_t*) m_codePointString; // Code points of the string, may be nullptr if not initialized yet by GetCodePoints call | ||
Field(charcount_t) m_codePointsLength; // Length of the codepoints, may be k_InvalidCharCount if not initialized yet by GetCodePoints call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will add two pointers to every single JS string - which isn't ideal - could we instead subclass JavascriptString with a new type like JavascriptStringWithCodePoints or something?
(Note I think this change is causing the compile failures as there is a static assert based on the size of a different subclass of JavascriptString)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I will need to create a new instance of JavascriptStringWithCodePoints on each SimpleMatch call if pattern shows us that it's unicode one.
Honestly I cannot imagine how i will cache the codepoint strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right this is not an ideal solution - ugh - I didn't think this through did I. I think cache'ing is a no go.
Either a) we massively increase the size of all our strings OR b) we come up with a way to convert the type whenever we need to cache - which would not be simple as there could be many existing pointers to the string.
This PR will fix spec deviations in:
Also: