Refactor char/string and byte search #54667
Open: jakobnissen wants to merge 13 commits into JuliaLang:master from jakobnissen:find_refactor
+195 −97
jakobnissen added the strings ("Strings!"), search & find (The find* family of functions), and performance (Must go faster) labels on Jun 4, 2024
jakobnissen force-pushed the find_refactor branch from df9c1d8 to ba4e410 on September 11, 2024 06:46
jakobnissen added the "status: waiting for PR reviewer" label (PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated.) on Sep 12, 2024
This is good to go now. Test failures are unrelated.
KristofferC reviewed on Sep 12, 2024
jakobnissen changed the title from "WIP: Refactor char/string and byte search" to "Refactor char/string and byte search" on Sep 12, 2024
jakobnissen force-pushed the find_refactor branch from 0b08a60 to c1f87e0 on November 18, 2024 07:26
Bump. CI failures are unrelated.
In text, the first UTF-8 bytes of characters are typically more repetitive than the last byte. For example, most Greek characters start with 0xce or 0xcf. By searching for the more distinctive last byte, more time is spent in the memchr fast path. This gives a significant speedup.
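To illustrate the last-byte idea, here is a minimal sketch. It is not the code in this PR: the function name find_char_by_last_byte is hypothetical, it assumes the start index i is at least 1 and that s is valid UTF-8, and it relies on the existing byte-level findnext for the scan.

```julia
# Hypothetical helper (not from the PR): return the index at or after `i`
# where character `c` starts in `s`, or `nothing` if it does not occur.
# Assumes i >= 1 and valid UTF-8 input.
function find_char_by_last_byte(c::Char, s::String, i::Int=1)
    needle = codeunits(string(c))   # UTF-8 bytes of the character
    n = length(needle)
    last_byte = needle[n]           # the most distinctive byte
    bytes = codeunits(s)
    pos = i + n - 1                 # earliest position the last byte can occupy
    while pos <= length(bytes)
        # Byte scan for the last byte of the encoding.
        found = findnext(==(last_byte), bytes, pos)
        found === nothing && return nothing
        start = found - n + 1
        # Confirm that the preceding (more repetitive) bytes match too.
        if view(bytes, start:found) == needle
            return start
        end
        pos = found + 1
    end
    return nothing
end
```

Searching for 'α' (0xce 0xb1) in "бα" first hits the 0xb1 that ends 'б' (0xd0 0xb1), fails the verification step, and then continues to the real match.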
It's more Julian to return nothing directly from the search function.
Many of these are identical to the generic fallback
This has two advantages: First, it consolidates the implementation of findnext and findall. Second, it allows a hypothetical lazy findall iterator to be trivially implemented later.
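As a rough sketch of how one iterator could back both functions: the type ByteMatches and the functions my_findnext and my_findall below are hypothetical names chosen for illustration, not identifiers from this PR, and the sketch assumes the start index is at least 1.

```julia
# Hypothetical lazy iterator over the positions of a byte in a byte vector.
struct ByteMatches
    haystack::Vector{UInt8}
    needle::UInt8
    start::Int      # assumed to be >= 1
end

Base.IteratorSize(::Type{ByteMatches}) = Base.SizeUnknown()
Base.eltype(::Type{ByteMatches}) = Int

# Each step yields the next matching index and resumes just after it.
function Base.iterate(it::ByteMatches, i::Int=it.start)
    while i <= length(it.haystack)
        it.haystack[i] == it.needle && return (i, i + 1)
        i += 1
    end
    return nothing
end

# findnext-like: take only the first element of the iterator.
function my_findnext(needle::UInt8, haystack::Vector{UInt8}, i::Int)
    y = iterate(ByteMatches(haystack, needle, i))
    return y === nothing ? nothing : first(y)
end

# findall-like: materialize the whole iterator.
my_findall(needle::UInt8, haystack::Vector{UInt8}) =
    collect(ByteMatches(haystack, needle, 1))
```

With this shape, a lazy findall would amount to returning the iterator itself instead of collecting it.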
The search functions are a basic building block of the other functions, and may e.g. be called in a loop. It's wasteful to check bounds in these, as they are often called when we know for sure we are inbounds. Move the boundscheck closer to the top-level calls. This should slightly improve efficiency.
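The split between an unchecked kernel and a checked entry point might look roughly like this. The names unchecked_search, checked_findnext and count_matches are hypothetical and only stand in for the internal search functions and their top-level callers.

```julia
# Inner kernel: the caller guarantees 1 <= i <= length(a) + 1,
# so no bounds check is performed here.
function unchecked_search(a::Vector{UInt8}, b::UInt8, i::Int)
    @inbounds while i <= length(a)
        a[i] == b && return i
        i += 1
    end
    return nothing
end

# Top-level entry point: validates the start index once, then delegates.
function checked_findnext(a::Vector{UInt8}, b::UInt8, i::Int)
    (1 <= i <= length(a) + 1) || throw(BoundsError(a, i))
    return unchecked_search(a, b, i)
end

# A looping caller: every index it passes is known to be in range,
# so it can call the kernel directly without repeated checks.
function count_matches(a::Vector{UInt8}, b::UInt8)
    n = 0
    i = unchecked_search(a, b, 1)
    while i !== nothing
        n += 1
        i = unchecked_search(a, b, i + 1)   # i <= length(a), so i + 1 is valid
    end
    return n
end
```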
Take fast path not in every iteration, but just once, outside the loop.
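The commit message does not say which fast path is meant, so the following is only a generic illustration of deciding once, before the loop. The function char_positions is a hypothetical name, and the ASCII test is an assumed example of a fast-path condition.

```julia
# Hypothetical example: collect all start indices of character `c` in `s`.
function char_positions(c::Char, s::String)
    out = Int[]
    if isascii(c)
        # Fast path, selected once: an ASCII byte can only appear in UTF-8
        # as that ASCII character, so a plain byte comparison suffices.
        b = UInt8(c)
        for (i, x) in pairs(codeunits(s))
            x == b && push!(out, i)
        end
    else
        # General path: decode and compare whole characters.
        for (i, x) in pairs(s)
            x == c && push!(out, i)
        end
    end
    return out
end
```

The branch on isascii(c) runs once per call rather than once per element, which is the structure the commit message describes.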
jakobnissen force-pushed the find_refactor branch from ec0ef94 to 39a6449 on November 27, 2024 11:49
Bump
@nanosoldier
@nanosoldier
The perf results here look great! We really should try to get this reviewed and merged ASAP
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
This is a refactoring of base/string/search.jl. It is purely internal, and comes with no changes in behaviour. It's based on #54593 and #54579, so those need to be merged first, then this PR will be rebased onto master.

Included changes are:
- Move bounds checking from the _search and _rsearch functions to the outer top-level functions that call them. This is because the former may be called in a loop where repeated boundschecking is needless. This should speed up search a bit.
- [...] findall and findnext to share implementation, and will also make it trivially easy to implement a lazy findall in the future (see "Implement lazy findall (Iterators.findall, perhaps?)" #43737)

IMO there is still more work to be done on this file, but this requires a decision to be made on #43737, #54581 or #54584
Benchmarks
1.11.0-beta2:
This PR: