Performance Optimisation for String Literal Matching #32

Open
wants to merge 4 commits into
base: main

AntonLydike
Owner

This PR adds a performance optimisation that skips compiling a regex for string literal matching.

Motivation:

Issue #25 points out that we are about 6x slower than filecheck 0.24 in the worst case, and about 3.3x slower on average.

We are also about 34x slower than LLVM's FileCheck, but we can't get that down too far, due to Python's limitations. LLVM's FileCheck is usually done before CPython has finished loading the runtime.

Approach:

After some digging in traces (thanks to viztracer), I found that we spend a lot of time compiling regexes, even when they are just for fancy string literals (most of them are of the form test\s+string\s..., which is regular enough to special-case). This time dominates everything else by a huge margin:

image

The regex compile takes about 135us of the 156us total time spent, so about 85%. We then spend ~0.8us on average in the actual matching logic. I wondered how a "slow" non-regex implementation would compare.
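To get a rough feel for the compile-versus-match split outside of viztracer, a micro-benchmark along these lines can help. This is only a sketch: the pattern and input line are made up for illustration, the function names are hypothetical, and absolute numbers will differ from the trace above.

```python
import re
import timeit

# Hypothetical pattern and input, shaped like the "fancy string literal" case.
PATTERN = r"test\s+string\s+literal"
LINE = "some prefix test   string literal and a suffix"


def time_compile(number: int = 1000) -> float:
    """Average seconds per uncached re.compile call."""

    def compile_uncached() -> None:
        # re keeps an internal cache of compiled patterns; purge it so
        # every iteration really pays the compilation cost.
        re.purge()
        re.compile(PATTERN)

    return timeit.timeit(compile_uncached, number=number) / number


def time_search(number: int = 1000) -> float:
    """Average seconds per search with an already compiled pattern."""
    compiled = re.compile(PATTERN)
    return timeit.timeit(lambda: compiled.search(LINE), number=number) / number
```

Calling `time_compile()` and `time_search()` and comparing the two values shows how much of the per-check cost is compilation on a given machine.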

I added logic in the existing check compiler that detects if the check is only made up of string literals, and returns a new LiteralMatcher that duck types re.Pattern for all cases that matter for our implementation. As it turns out, that is just find and match.

Sadly, we can't just replace re.search with string.find in all cases, as we need to handle white-space normalisation, which bloats the code below a bit. Otherwise it's quite readable though.

LiteralMatcher returns a special duck-typed version of re.Match called LiteralMatch that only has a single group. This is all that's needed for this little hack, and the other code can be left unmodified, thanks to the power of duck typing (and modifying some type hints).
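The idea can be sketched roughly as follows. This is not the code from this PR, just a minimal illustration under a few assumptions: the method names `search` and `match` mirror `re.Pattern`, the literal is split on whitespace into tokens, tokens must be separated by at least one whitespace character (like `\s+`), and leading or trailing whitespace in the literal is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LiteralMatch:
    """Duck-types the small part of re.Match needed here: a single group 0."""

    string: str
    _start: int
    _end: int

    def start(self, group: int = 0) -> int:
        return self._start

    def end(self, group: int = 0) -> int:
        return self._end

    def group(self, group: int = 0) -> str:
        return self.string[self._start:self._end]


class LiteralMatcher:
    """Matches whitespace-separated literal tokens without compiling a regex."""

    def __init__(self, literal: str) -> None:
        # Assumption: any run of whitespace in the literal means "\s+".
        self.tokens = literal.split()
        if not self.tokens:
            raise ValueError("literal must contain at least one token")

    def match(self, string: str, pos: int = 0) -> LiteralMatch | None:
        """Match anchored at pos, like re.Pattern.match."""
        end = pos
        for i, token in enumerate(self.tokens):
            if i > 0:
                # Require one or more whitespace characters between tokens.
                ws_end = end
                while ws_end < len(string) and string[ws_end].isspace():
                    ws_end += 1
                if ws_end == end:
                    return None
                end = ws_end
            if not string.startswith(token, end):
                return None
            end += len(token)
        return LiteralMatch(string, pos, end)

    def search(self, string: str, pos: int = 0) -> LiteralMatch | None:
        """Find the first match at or after pos, like re.Pattern.search."""
        first = self.tokens[0]
        start = string.find(first, pos)
        while start != -1:
            found = self.match(string, start)
            if found is not None:
                return found
            # Candidate failed on a later token; try the next occurrence.
            start = string.find(first, start + 1)
        return None
```

For example, `LiteralMatcher("test string").search("a test   string")` returns a `LiteralMatch` whose `group(0)` is `"test   string"`, the same span that the regex `test\s+string` would match.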

Results:

The optimisation gets an average speedup of 1.6x in our benchmarks, making the new implementation only about 2.1x slower than filecheck 0.24 on average. This understates the effect though, as the optimisation cuts the run time of the longest benchmark (4.7k lines of CHECK-NEXT statements) by more than 3x.

See the below chart for overall results:

image

The new trace shows us that we have indeed removed a bottleneck:

image

The new timing shows that compilation time is down to 5.5us on average, but the matching has grown to ~14.7us on average. Still, the average CHECK-DAG statement is now down to 21.6us, a reduction of about 7x.
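As a quick sanity check, the ratios quoted in this description can be recomputed from the raw numbers. The variable names are only for illustration, and dividing the two average ratios is a rough estimate rather than an exact figure.

```python
# Numbers taken from the traces and benchmark results above (microseconds).
compile_before_us = 135.0
total_before_us = 156.0
compile_after_us = 5.5
match_after_us = 14.7
total_after_us = 21.6

# Share of time spent compiling before the change: about 0.87.
compile_share = compile_before_us / total_before_us

# Per-statement reduction: about 7.2x.
speedup = total_before_us / total_after_us

# Time not covered by compile + match in the new trace: about 1.4us.
remaining_us = total_after_us - (compile_after_us + match_after_us)

# Rough slowdown relative to filecheck 0.24 after a 1.6x speedup: about 2.06x.
slowdown_after = 3.3 / 1.6
```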

@AntonLydike AntonLydike added the perf Performance Issues label Jul 30, 2024
@AntonLydike AntonLydike self-assigned this Jul 30, 2024
@AntonLydike AntonLydike changed the title Implement performance optimisation for string literal matching Performance Optimisation for String Literal Matching Jul 30, 2024
@AntonLydike
Owner Author

AntonLydike commented Jul 30, 2024

I will benchmark this tonight against some real-world workloads and report numbers.

Edit: Don't have time tonight :/

@AntonLydike
Owner Author

It's hard to tell whether this actually speeds anything up in practice. MLIR's benchmark suite goes from a geomean of 12.66 across 6 runs to 12.49, which is almost within error. xDSL's tests increase in duration from 8.06 to 8.25, but these runs have even bigger error bars.
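One crude way to judge whether such a difference is within noise is to compare the gap between the two geomeans against the spread of the individual runs. The sketch below assumes the per-run times are available as lists; the sample values are invented placeholders, not the measurements from these runs, and the threshold is a simple heuristic rather than a proper statistical test.

```python
from statistics import geometric_mean, stdev


def summarize(runs: list[float]) -> tuple[float, float]:
    """Return (geometric mean, sample standard deviation) of the run times."""
    return geometric_mean(runs), stdev(runs)


def within_noise(before: list[float], after: list[float]) -> bool:
    """Heuristic: the geomean gap is no larger than the bigger run-to-run spread."""
    gm_before, sd_before = summarize(before)
    gm_after, sd_after = summarize(after)
    return abs(gm_before - gm_after) <= max(sd_before, sd_after)


# Placeholder data only, shaped like "6 runs before, 6 runs after".
before_runs = [12.5, 12.8, 12.6, 12.7, 12.9, 12.5]
after_runs = [12.4, 12.6, 12.5, 12.3, 12.7, 12.5]
inconclusive = within_noise(before_runs, after_runs)
```

If `within_noise` returns `True`, more runs or a less noisy setup would be needed before drawing a conclusion either way.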

Further digging needed here.
