This is a feature request to lift the default length limit of 64k bytes.
The PCRE library is used in many complex applications, for example to parse language grammars, and the requirements naturally grow every year.
It is 2024, and the PCRE library itself should not impose unneeded limitations, especially compile-time ones.
I propose lifting the compile-time limit by raising the LINK_SIZE option to 4 by default, at least on 64-bit platforms.
The matching time/performance should not increase much.
If there must be a limit, it should be configurable at runtime like the other limits, using a pcre2_set_*_limit() function, with a default of around 10M (UPDATE: implemented in 05aafb2).
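For context, the 64K figure follows from the width of the internal offsets: a field of LINK_SIZE bytes can only address so much compiled code. The sketch below is plain arithmetic, not PCRE2 code; the helper name `max_link_offset` is made up for illustration, and it assumes the 8-bit library where LINK_SIZE is 2, 3 or 4. The library may cap the 4-byte case lower than the raw field width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest offset that fits in an unsigned field of
   link_size bytes. Assumes link_size is one of 2, 3 or 4. */
uint64_t max_link_offset(unsigned link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    /* 8 bits per byte, so the field holds values 0 .. 2^(8*link_size) - 1 */
    return (UINT64_C(1) << (8 * link_size)) - 1;
}
```

With 2-byte links the ceiling is 65535, which is the 64K limit discussed here; 3-byte links give roughly 16M and 4-byte links roughly 4G of addressable offset.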
I am not at all keen on making such a change, for a number of reasons:
It's a serious incompatible change that may affect existing users. I don't like doing that.
There are very many PCRE2 users for whom 64K is more than enough. I'm thinking of, for example, email routing (for which I originally wrote PCRE), text editing, and similar applications where the patterns are generally quite short.
I know (because there have been issues in the past) that there are applications using PCRE2 that run very large numbers of separate threads, each of which has very limited memory. Such a change may well impact them.
There is of course a performance cost, though it would depend very much on the pattern details.
If I were starting again (knowing what I now know) I would certainly do things differently, probably always compiling into 32-bit units, but hindsight is always 20/20.
It seems to me that the issue here is not LINK_SIZE, but rather that we compile (a|b){1,2000} in such a space-inefficient way.
The default of 64K for compiled patterns is reasonable (although higher might be ideal), since patterns are typically small, and we can handle matching against arbitrarily-long subject strings.
We just need some kind of limit to how much bloat is produced for patterns with fixed-maximum repetitions. Doing a handful of duplications of the pattern is OK; doing 1000 is not.
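To make the bloat concern concrete, here is a rough model of what replication costs. This is not PCRE2's actual code generator: it simply assumes the group body is emitted once per repetition up to the maximum, and the function names and the 40-byte group size in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a group that compiles to group_bytes is copied
   max_repeat times, ignoring any per-copy bookkeeping. */
size_t replicated_size(size_t group_bytes, size_t max_repeat)
{
    assert(group_bytes > 0 && max_repeat > 0);
    return group_bytes * max_repeat;
}

/* Returns 1 if the replicated group fits under limit_bytes, else 0.
   Uses division so the check itself cannot overflow. */
int fits_limit(size_t group_bytes, size_t max_repeat, size_t limit_bytes)
{
    assert(group_bytes > 0);
    return max_repeat <= limit_bytes / group_bytes;
}
```

Under this model, a group assumed to take 40 bytes and repeated up to 2000 times needs 80000 bytes, which is already over 64K, while a handful of copies of the same group fits easily.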
Related past issues: #119 #271 JuliaText/TextAnalysis.jl#258