Improve hc_matchfinder_longest_match() performance on Apple Silicon #284
At level 9 the bottleneck is usually the inner loop in …
It doesn't really make sense to talk about just the …
I ran a test and can confirm that having …
I was comparing performance on two MacBooks and was surprised at some of the results.
I used the included benchmark program to compare the following hardware:
2015 MacBook Pro: 2.2 GHz Quad-Core Intel Core i7
2021 MacBook Pro: 8-core Apple M1 Pro
At level 9, the M1 Pro is only 3% faster than the six-year-older Intel machine!
I tried profiling it with xctrace, and as best I can tell, the performance hit comes from load_u32_unaligned (I can attach the trace output if that would be helpful). I can confirm that UNALIGNED_ACCESS_IS_FAST is set, but beyond that I haven't been able to work out why there's an issue.
Do you have any ideas, or is the Intel hardware simply better at this?
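For context on the code path under discussion, here is a simplified, hypothetical sketch of a hash-chain match search that uses unaligned 4-byte loads. The names load_u32_unaligned_sketch() and longest_match_sketch() and all of their parameters are illustrative assumptions, not the project's actual API, and the real hc_matchfinder_longest_match() is considerably more involved. What it tries to show: each chain step loads from an earlier, essentially arbitrary position in the window, so time a profiler attributes to an unaligned-load helper may partly reflect waiting on memory rather than the cost of the unaligned access itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the project's real definition): read 4 bytes
 * from an address with no alignment guarantee. memcpy is the portable C
 * idiom; optimizing compilers usually emit a single load for it. */
static inline uint32_t load_u32_unaligned_sketch(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Hypothetical, simplified hash-chain search.
 *   buf        input window
 *   cur        position we are finding a match for
 *   prev_pos   prev_pos[i] = previous position in the same chain, -1 ends it
 *   first      most recent chain entry for cur (must be < cur, or -1)
 *   max_len    longest allowed match; caller guarantees 4 <= max_len and
 *              cur + max_len <= buffer length
 *   max_depth  how many chain entries to visit (typically larger at
 *              higher compression levels)
 * Returns the best length found (0 if none) and stores its offset.
 */
static size_t longest_match_sketch(const uint8_t *buf, size_t cur,
                                   const int32_t *prev_pos, int32_t first,
                                   size_t max_len, unsigned max_depth,
                                   size_t *best_offset)
{
    const uint8_t *in = buf + cur;
    const uint32_t in4 = load_u32_unaligned_sketch(in);
    size_t best_len = 0;
    int32_t cand = first;

    while (cand >= 0 && max_depth-- > 0) {
        const uint8_t *m = buf + cand;
        /* Each candidate is an arbitrary earlier position, so this load
         * may touch memory that has not been used recently. */
        if (load_u32_unaligned_sketch(m) == in4) {
            /* First 4 bytes match with one compare; extend byte by byte. */
            size_t len = 4;
            while (len < max_len && m[len] == in[len])
                len++;
            if (len > best_len) {
                best_len = len;
                *best_offset = cur - (size_t)cand;
                if (len == max_len)
                    break; /* cannot do better */
            }
        }
        cand = prev_pos[cand];
    }
    return best_len;
}
```

For example, with the buffer "abcdeXabcdZabcde", searching from position 11 along a chain that visits position 6 and then position 0 finds a 4-byte match at offset 5 when only one chain entry is visited, and a 5-byte match at offset 11 when the search is allowed to go one step deeper. That depth-versus-match-quality trade-off is why higher levels spend more time in this loop.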