-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursion limits hit more often in PHP 7.3+ #333
Comments
I think this fell off my radar and I forgot to reply. Sorry. I think your analysis is quite good. Throwing seems like a good idea. I'd suggest we do this by adding a private Beside that, the solution is to fix the problematic regex so it does not recurse that much. At a quick glance, I find this line a bit suspect:
Maybe we could write it to be more strict and not depend on the surrounding expression as:
There might be other, but this one sounds like it could be the culprit. I'll have to investigate a bit more. Thank you for the detailed report. |
Interestingly I can no longer reproduce the issue if I enable JIT:
Compare with:
|
I found that converter also breaks down on long unquoted attributes such as Quoting the attribute such as in |
With the major release of PHP 7.3 there were some big changes under the hood with the regular expression engine, going from PCRE to PCRE2. While it is not explicitly called out in the changes, I noticed that this new PCRE2 engine is a lot more susceptible to the
PREG_RECURSION_LIMIT_ERROR
error when usingMarkdownExtra
. Let's dig in to the details.pcre.recursion_limit
is the INI setting that controls:In PHP 7.2 and below, we had actually decreased the 100,000 default to 10,000 for performance reasons (to stop out-of-control user content from slowing down PHP). But in PHP 7.3, having this 10,000 number now fails on pieces of content that did not fail before. This led me to think that PHP 7.3 is more susceptible to this.
What happens when the recursion limit is hit?
Simple, the
preg_split
call returnsfalse
, which this library doesn't expect, so the code fails at a later line due to unexpected data type (false
is not Countable, also a fairly recent & strict PHP error).Example Content
But what specifically is in the content that causes us to hit this limit? It turns out that it has to do with dangling
<
in the markdown -- for example,<3
for a heart symbol, or doing a math comparison like4 < 8
or really any other<
that is not part of an HTML tag. That alone isn't a problem, you also need to have "a lot" of content after the<
. I tested truncating the markdown to various lengths and it wouldn't fail until it was sufficiently long enough. For the INI setting of 10,000 this failed at a document length of around 5k chars where the<3
was near the top. In fact I'll attach the file here testpost.txt. to test, run with:If I understand the code correctly, it seems that it is trying to find HTML tags by looking for the starting
<
and then trying to find the ending>
. But what if there never is an ending>
? That seems to be our problem. I noticed that if I preprocessed the file to replace<3
with<3
, the problem went away and we did not hit the recursion limit error.Solutions
So what can we do here? For starters as a bare minimum, I'd like
MarkdownExtra
to detect regular expression errors such as this, and throw them as exceptions, so at least we can know what went wrong instead of the code failing at some arbitrary spot. I have an example prototype here demonstrating this which I could open as a PR. I noticed that this library does not throw exceptions at all, so this would be a first; I'm not sure if there is a more preferred way of handling this kind of error, such as maybe returning a blank string, or maybe returning the original text unmodified.In addition to a change like that (or instead of), I could simply just increase the recursion limit back to PHP's default of 100,000 and observe the performance, as PHP has gotten a lot faster since we originally lowered the setting.
However, I also wonder if some algorithm improvement could be made in Markdown to be able to detect dangling
<
such as these which are clearly not HTML tags, so it doesn't end up exhausting the recursion depth. I think even with the default of 100,000 people will start running into this issue more with PHP 7.3+ for especially large files.Would love your thoughts on this! Thanks.
The text was updated successfully, but these errors were encountered: