-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Regex
engine on PCRE2
#12840
Implement Regex
engine on PCRE2
#12840
Conversation
I just gave it a shot locally on my app. We have a chat processor that works like a middleware stack. When you send a chat, each handler in the stack does some sort of regex to check what you're sending. The stack checks memory and timing with All the regex we use works with pcre2! The interesting thing is when I run this branch against my app, it increases memory and runs slower. I tried to make a separate repo to reproduce, but of course, in that sample, the memory is actually reduces by half and timing is way faster 😂
Guess I need to figure out what I'm doing wrong in my app 😅 |
oh, so not sure if this matters or not, but I wanted to benchmark the 3 different versions together. I built my sample code (with release) using 1.6.2, this branch with force_pcre, and again with pcre2. # bench.cr
require "benchmark"
Benchmark.ips do |x|
x.report("current") do
`./regex`
end
x.report("pcre1") do
`./regex1`
end
x.report("pcre2") do
`./regex2`
end
end
This actually seems closer to what I'm seeing in my app. I ran it twice just to be sure, but it seems this is lining up more with what I'm seeing in my app. Here's a link to the code https://gist.github.com/jwoertink/2af47a1450a071ad4934e691259ea5e5
Maybe someone else has a better way to benchmark this? |
You should deploy that separate repo to production then ;-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving it in the hope that more testing and the follow up with JIT compilation will give us finer numbers on its performance. In the worst case scenario, we revert the bit about PCRE2 being the default for 1.7.
|
||
str = File.read(datapath("large_single_line_string.txt")) | ||
str.matches?(/^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/).should be_false | ||
str.matches?(/^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/) | ||
# We don't care whether this actually matches or not, it's just to make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little problem is that the name of the test is wrong. It matches a large single line string
, but it was not expected to match. A better name could be doesn't crash with a...
This appears to have broken a few std specs on FreeBSD 13.1-STABLE, pcre2-10.40 (please ignore UDP multicast failure, it's unrelated):
|
In fact this is also failing in |
@asterite Yes, it's merged. But this branch was green before merging, and master is also (mostly) green after merging (https://github.com/crystal-lang/crystal/actions/runs/3712561573). The failing spec is an unrelated interpreter test which appears to be failing every now and then. |
@dmgk Thanks for reporting. It's striking that no spec for regex itself is failing. The failures seem to be related to string scanner (which seems to be used way too much!). |
Is CI really using PCRE2? Is there any indication the Also I got even more failures on my M2, even for
It appears that this private def match_impl(str, byte_index, options)
match_data = match_data(str, byte_index, options) || return
ovector = LibPCRE2.get_ovector_pointer(match_data)
ovector_count = LibPCRE2.get_ovector_count(match_data)
LibPCRE2.match_data_free(match_data) # clears `ovector`
::Regex::MatchData.new(self, @re, str, byte_index, ovector, ovector_count.to_i32 - 1)
end By the way I created #12847 for Windows support, where |
@straight-shoota Can you try checking out
It's failing for me. Also all of the failing specs mentioned in #12840 (comment) are failing for me on |
Yeah right, CI isn't using PCRE2 yet. I think I had tested the entire stdlib spec suit locally, though. |
This reverts commit a5373ad.
This was reverted in #12850 due to bugs and missing tests. I'll re-issue a patched version shortly. |
This is an initial implementation of a
Regex
engine backend based on PCRE2 (see #12790).All existing regex specs succeed with PCRE2 🎉
This is an MVP and does not include JIT compilation (I'm planning a follow-up).
It also does not integrate specific tests for both library versions into CI (another followup).
The engine version is automatically selected at compile time.
libpcre2
takes preference and is used when available viapkg-config
.I think this makes sense to see PCRE2 used (when available) without requiring any additional configuration. This should help gather some feedback on usage problems. We can revisit if we want to prioritize the original
libpcre
for the next release.Library selection can be forced with the the compiler flags
-Dforce_pcre
and-Dforce_pcre2
.If you have Crystal software that makes heavy use of regular expressions, please give this patch a try in order to identify potential issues in practical use cases. Performance comparisons are also welcome.
libpcre2-8
(this is also the package name on some systems).pkg-config
needs the.pc
configuration which may require installing the devel package oflibpcre2
.