The function compile_class_not_nested is very big and complicated. I reckon we can help it out:
Always have a cranges. For the no-wide-char cases, we can use a cranges object on the stack (to avoid a malloc) and write into that; the wide-char cases would still need to malloc one. The advantage is that all the code which follows can avoid having two separate paths.
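A minimal sketch of the "stack first, heap only on overflow" idea, in C. Every name here (`sketch_cranges`, `sketch_cranges_add`, the 8-range inline capacity) is hypothetical and is not PCRE2's real class-ranges struct. It only shows how one object can start out on inline storage and switch to malloc'd storage transparently, so the code that fills and reads it has a single path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline capacity; a real value would be tuned. */
#define SKETCH_INLINE_RANGES 8

typedef struct {
  uint32_t start, end;            /* inclusive code point range */
} sketch_range;

/* Hypothetical range list: `ranges` points at inline_buf (which lives
   wherever the struct lives, e.g. on the stack) until it overflows. */
typedef struct {
  sketch_range *ranges;
  size_t count, cap;
  sketch_range inline_buf[SKETCH_INLINE_RANGES];
} sketch_cranges;

static void sketch_cranges_init(sketch_cranges *cr) {
  cr->ranges = cr->inline_buf;
  cr->count = 0;
  cr->cap = SKETCH_INLINE_RANGES;
}

static int sketch_cranges_on_heap(const sketch_cranges *cr) {
  return cr->ranges != cr->inline_buf;
}

/* Append a range; returns 0 on success, -1 on allocation failure. */
static int sketch_cranges_add(sketch_cranges *cr, uint32_t start, uint32_t end) {
  assert(start <= end);
  if (cr->count == cr->cap) {
    size_t new_cap = cr->cap * 2;
    sketch_range *p;
    if (!sketch_cranges_on_heap(cr)) {
      /* First overflow: move the inline contents to the heap. */
      p = malloc(new_cap * sizeof *p);
      if (p == NULL) return -1;
      memcpy(p, cr->inline_buf, cr->count * sizeof *p);
    } else {
      p = realloc(cr->ranges, new_cap * sizeof *p);
      if (p == NULL) return -1;
    }
    cr->ranges = p;
    cr->cap = new_cap;
  }
  cr->ranges[cr->count].start = start;
  cr->ranges[cr->count].end = end;
  cr->count++;
  return 0;
}

/* Release heap storage (if any) and reset to the empty inline state. */
static void sketch_cranges_free(sketch_cranges *cr) {
  if (sketch_cranges_on_heap(cr)) free(cr->ranges);
  sketch_cranges_init(cr);
}
```

Because `ranges` may point into the struct itself, a `sketch_cranges` must not be copied by value after initialisation; it is always passed by pointer. An ASCII-only class never leaves the inline buffer, so it never calls malloc.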
We should put the bitmap into struct cranges. Then struct cranges would be a complete description of the characters within the class (that is, all the non-property characters in the class), which would make it easier to reason about: "The characters matched by the class are identified by this bitmap, these cranges/clists, plus the Unicode properties."
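One possible shape for "bitmap inside the cranges", sketched under assumptions: `sketch_class` and its functions are hypothetical, characters below 256 use a 32-byte bitmap, and the wide ranges are assumed to be already sorted and non-overlapping. Unicode property items are deliberately left out, matching the description of the class as bitmap + ranges + properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } sketch_range;

/* Hypothetical complete description of the non-property characters. */
typedef struct {
  uint8_t bitmap[32];             /* bit c set => character c (< 256) is in the class */
  const sketch_range *ranges;     /* sorted, non-overlapping, all >= 256 */
  size_t nranges;
} sketch_class;

static void sketch_bitmap_set(sketch_class *cl, uint32_t c) {
  assert(c < 256);
  cl->bitmap[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Non-property membership: one bitmap lookup, or one binary search. */
static int sketch_class_contains(const sketch_class *cl, uint32_t c) {
  if (c < 256) return (cl->bitmap[c >> 3] >> (c & 7)) & 1;
  size_t lo = 0, hi = cl->nranges;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < cl->ranges[mid].start) hi = mid;
    else if (c > cl->ranges[mid].end) lo = mid + 1;
    else return 1;
  }
  return 0;
}
```

With this layout the question "which characters does the class match?" has exactly two places to look, plus the separately stored properties.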
Some code is currently split across separate places. For example, POSIX classes and escapes are handled twice: once when building the cranges, to add in the non-ASCII characters, and again after building the cranges, to add the ASCII characters to the bitmap.
I think we should build the cranges from the METAs (ideally in one pass, although two passes would be acceptable), then make one single pass over the cranges to build the output (without needing to go back over the METAs, except possibly to pick up Unicode property escapes, which were completely ignored in the cranges).
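The "one pass over the cranges to build the output" step could look roughly like this: first normalise the collected ranges (sort, then merge overlapping or adjacent ones), then walk them once, sending the part below 256 to the bitmap and the rest to a wide-character list. The function names and output layout are hypothetical; PCRE2's real XCLASS encoding is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t start, end; } sketch_range;

static int sketch_cmp(const void *a, const void *b) {
  const sketch_range *x = a, *y = b;
  return (x->start > y->start) - (x->start < y->start);
}

/* Sort, then merge overlapping or adjacent ranges in place; returns the new count.
   end + 1 cannot overflow because code points stop at 0x10FFFF. */
static size_t sketch_normalise(sketch_range *r, size_t n) {
  if (n == 0) return 0;
  qsort(r, n, sizeof *r, sketch_cmp);
  size_t out = 0;
  for (size_t i = 1; i < n; i++) {
    if (r[i].start <= r[out].end + 1) {       /* overlaps or touches */
      if (r[i].end > r[out].end) r[out].end = r[i].end;
    } else {
      r[++out] = r[i];
    }
  }
  return out + 1;
}

/* Single pass over normalised ranges: characters below 256 go into the
   bitmap, the rest is written to `wide` (capacity >= n). Returns the
   number of wide ranges written. */
static size_t sketch_emit(const sketch_range *r, size_t n,
                          uint8_t bitmap[32], sketch_range *wide) {
  size_t nwide = 0;
  memset(bitmap, 0, 32);
  for (size_t i = 0; i < n; i++) {
    uint32_t lo = r[i].start, hi = r[i].end;
    assert(lo <= hi);
    for (uint32_t c = lo; c <= hi && c < 256; c++)
      bitmap[c >> 3] |= (uint8_t)(1u << (c & 7));
    if (hi >= 256) {                          /* part (or all) of the range is wide */
      wide[nwide].start = lo < 256 ? 256 : lo;
      wide[nwide].end = hi;
      nwide++;
    }
  }
  return nwide;
}
```

A range such as `x-\x{10F}` straddles the 256 boundary and is split in this one place, rather than the ASCII half being re-derived from the METAs later.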
Currently, the 16-bit library always, unconditionally, mallocs a cranges object even for super-simple character classes that just match ASCII characters! Using a small stack-alloced cranges as the starting point would be very nice, and we would only grow it to a heap-alloced one if the character class is too big to fit.
Items in the character class should be handled in exactly one place: if the item is a character literal or POSIX class, it goes in the cranges and shouldn't need to be handled elsewhere; if it's a Unicode escape, it's ignored in the cranges and handled just once later on.
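A sketch of routing every class item to exactly one destination. The item kinds are a hypothetical stand-in for the parsed METAs (PCRE2's real META codes and POSIX handling are more involved), and `[:digit:]` is given its ASCII meaning only. Literals, ranges, and POSIX classes become ranges; Unicode property items go straight to a separate list and never touch the ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed class items; not PCRE2's META codes. */
typedef enum { ITEM_LITERAL, ITEM_RANGE, ITEM_POSIX_DIGIT, ITEM_UCP_PROPERTY } item_kind;

/* a: the character, range start, or property id; b: range end (ITEM_RANGE only). */
typedef struct { item_kind kind; uint32_t a, b; } class_item;

typedef struct { uint32_t start, end; } sketch_range;

typedef struct {
  sketch_range ranges[16]; size_t nranges;   /* all character items */
  uint32_t props[8];       size_t nprops;    /* Unicode properties, handled later */
} class_parts;

/* Single dispatch: each item is examined once and sent to one place.
   Returns 0 on success, -1 if a fixed-size array would overflow. */
static int split_items(const class_item *items, size_t n, class_parts *out) {
  out->nranges = out->nprops = 0;
  for (size_t i = 0; i < n; i++) {
    const class_item *it = &items[i];
    switch (it->kind) {
    case ITEM_LITERAL:
    case ITEM_RANGE:
    case ITEM_POSIX_DIGIT: {
      if (out->nranges == sizeof out->ranges / sizeof out->ranges[0]) return -1;
      sketch_range *r = &out->ranges[out->nranges++];
      if (it->kind == ITEM_LITERAL) {
        r->start = r->end = it->a;
      } else if (it->kind == ITEM_RANGE) {
        assert(it->a <= it->b);
        r->start = it->a; r->end = it->b;
      } else {
        r->start = '0'; r->end = '9';          /* ASCII meaning of [:digit:] */
      }
      break;
    }
    case ITEM_UCP_PROPERTY:
      if (out->nprops == sizeof out->props / sizeof out->props[0]) return -1;
      out->props[out->nprops++] = it->a;
      break;
    }
  }
  return 0;
}
```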
Additionally:
Consider removing XCL_END
The function has been simplified a lot compared to its original form. It could go further and cache the entire compiled class, though. Then there would be no processing during the lengthptr != case, just a memory copy.
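PCRE2 compiles a pattern in two phases: a sizing pre-compile, where `lengthptr` is non-NULL, and the real compile, where it is NULL. The caching idea can be sketched as below, assuming a hypothetical per-class `class_cache` and a stand-in `compile_class_slow` that emits a made-up encoding. Which phase does the real work is a design choice; in this sketch the sizing phase compiles once and the real phase only copies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache for one class; must start zero-initialised. */
typedef struct {
  uint8_t *code;
  size_t len;
  unsigned compile_calls;   /* counts expensive compilations, for illustration */
} class_cache;

/* Stand-in for the expensive class compiler: writes a made-up encoding. */
static size_t compile_class_slow(uint8_t *buf, size_t cap) {
  static const uint8_t encoded[] = { 0x10, 0x61, 0x7a, 0x00 };
  assert(cap >= sizeof encoded);
  memcpy(buf, encoded, sizeof encoded);
  return sizeof encoded;
}

/* Sizing pass: compile once, keep the bytes, return the length.
   Real pass: copy the cached bytes into `out`. 0 signals allocation failure. */
static size_t emit_class(class_cache *cache, int sizing_pass, uint8_t *out) {
  if (sizing_pass) {
    uint8_t scratch[64];
    size_t len = compile_class_slow(scratch, sizeof scratch);
    free(cache->code);                      /* tolerate a repeated sizing pass */
    cache->code = malloc(len);
    if (cache->code == NULL) return 0;
    memcpy(cache->code, scratch, len);
    cache->len = len;
    cache->compile_calls++;
    return len;
  }
  assert(cache->code != NULL);              /* real pass must follow sizing pass */
  memcpy(out, cache->code, cache->len);
  return cache->len;
}
```

The trade-off is memory: a real implementation would need to keep one cached encoding per class in the pattern until the real compile has consumed it.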