Spicy regular expressions are far slower than binpac #1990
evantypanski
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
This is a relatively vague issue. I'm putting it in Spicy issues (not justrx) for visibility.
I think the "complete" condition for closing this issue would be a significant speedup for given test cases, or justification for why that will not happen. With a strong bias towards the former.
Looking at Spicy performance compared to binpac, I found that regular expressions are far slower than binpac's. I haven't looked much into why - this issue is for the "why" and "what to do about it"
For example, this Spicy unit takes 10 seconds to parse on my machine given the programmatically generated input:
The equivalent on binpac with a similar host application for timing takes about 0.3 seconds (the following is the binpac code):
This is a pretty big difference and one that, as a user, I would want explanation for. Capturing accounts for some of this; changing the regular expression unit to this lowers it from 10 seconds to about 3 seconds:
My guess is most use cases would want to capture large output like that, as it's likely some sort of data and not just resyncing and whatnot.
Aside: Character classes
The following unit takes differing amount of time depending on the input - All Z's makes it take 13 seconds, all A's makes it take 10 seconds (roughly):
Optimizing this case could reduce that discrepancy, but I don't believe it's the top priority here compared to the simple cases. Looking through regexes in parsers I don't see this kind of thing happening a lot, where a character class can't be reduced to a simple range that will get this worst-case parse time.
Beta Was this translation helpful? Give feedback.
All reactions