-
I'm very new to both Rust and the DataFusion codebase, so apologies up front if I'm misinterpreting the code. If I understand it correctly, regexes are currently compiled at least once per batch and sometimes cached per batch (e.g. in the […]).

If the regex argument is a string literal, it should in theory be feasible to optimize this by compiling the regex up front and reusing the compiled version for every invocation of the function. I spent some time today digging around the codebase trying to figure out how to implement this idea, but it wasn't immediately obvious to me whether it's actually possible within the constraints of the current APIs. The best option I came up with was to instantiate a curried version of the scalar function in the […].

Taking a step back, the more general question I had is whether there is already some facility in the codebase that supports performing some initial precomputation work per individual call site of a function as a way to improve performance.

Does any of this make sense, or am I looking at this problem the wrong way?
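To make the "curried" idea concrete, here is a rough std-only sketch, not DataFusion code. `CompiledPattern` is a hypothetical stand-in for a compiled regex (it only does substring matching), and `match_batch_naive` and `make_matcher` are made-up names. The `compilations` counter exists only to show how often the (pretend) expensive compile step runs.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a compiled regex (e.g. `regex::Regex`).
/// It only does substring matching; the point is where compilation happens.
struct CompiledPattern {
    needle: String,
}

impl CompiledPattern {
    /// Pretend this is expensive. `compilations` counts how often it runs.
    fn compile(pattern: &str, compilations: &Cell<usize>) -> Self {
        compilations.set(compilations.get() + 1);
        CompiledPattern { needle: pattern.to_string() }
    }

    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(&self.needle)
    }
}

/// Per-batch compilation: the pattern is compiled again on every call.
fn match_batch_naive(pattern: &str, batch: &[&str], compilations: &Cell<usize>) -> Vec<bool> {
    let compiled = CompiledPattern::compile(pattern, compilations);
    batch.iter().map(|s| compiled.is_match(s)).collect()
}

/// "Curried" variant: compile once for a literal pattern and return a
/// closure that owns the compiled form and is reused for every batch.
fn make_matcher(pattern: &str, compilations: &Cell<usize>) -> impl Fn(&[&str]) -> Vec<bool> {
    let compiled = CompiledPattern::compile(pattern, compilations);
    move |batch: &[&str]| batch.iter().map(|s| compiled.is_match(s)).collect()
}

fn main() {
    let batches: [&[&str]; 3] = [&["abc", "xyz"], &["foo"], &["zabcz"]];

    let naive = Cell::new(0);
    for batch in batches {
        match_batch_naive("abc", batch, &naive);
    }

    let curried = Cell::new(0);
    let matcher = make_matcher("abc", &curried);
    for batch in batches {
        matcher(batch);
    }

    println!("naive: {} compilations, curried: {}", naive.get(), curried.get());
}
```

The open question from the post remains whether the planner offers a hook where something like `make_matcher` could run once per call site.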
-
That's my understanding too.
Would it work if this is stored on a field of a ScalarUDFImpl instance?
The pattern to use for these is to recognize […]
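As a rough illustration of the "store it on a field" idea, here is a std-only sketch that assumes the function object is only reachable through a shared reference and may be called from several threads. `LiteralMatchFn` and `CompiledPattern` are hypothetical names, not DataFusion's `ScalarUDFImpl` API, and the substring match stands in for a real regex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled regex; substring matching only.
struct CompiledPattern {
    needle: String,
}

/// Hypothetical function object holding a literal pattern.
/// The compiled form lives in a field and is filled in lazily.
struct LiteralMatchFn {
    pattern: String,
    compiled: OnceLock<CompiledPattern>,
    /// Only here to observe how often "compilation" runs.
    compilations: AtomicUsize,
}

impl LiteralMatchFn {
    fn new(pattern: &str) -> Self {
        LiteralMatchFn {
            pattern: pattern.to_string(),
            compiled: OnceLock::new(),
            compilations: AtomicUsize::new(0),
        }
    }

    /// Takes `&self`: `OnceLock` allows one-time initialization without
    /// `&mut self`, so every batch after the first reuses the cached value.
    fn invoke(&self, batch: &[&str]) -> Vec<bool> {
        let compiled = self.compiled.get_or_init(|| {
            self.compilations.fetch_add(1, Ordering::Relaxed);
            CompiledPattern { needle: self.pattern.clone() }
        });
        batch.iter().map(|s| s.contains(&compiled.needle)).collect()
    }
}

fn main() {
    let f = LiteralMatchFn::new("abc");
    let first = f.invoke(&["abc", "xyz"]);
    let second = f.invoke(&["zabcz"]);
    println!(
        "{:?} {:?} compilations: {}",
        first,
        second,
        f.compilations.load(Ordering::Relaxed)
    );
}
```

This only helps if one instance is created per call site with a literal pattern; a single shared instance would need a cache keyed by pattern instead.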
-
This is a good question. Others have asked about this too, @pepijnve.
See - #11146
@zhuliquan seems to be actively working on it:
FWIW it has never been a particular priority for me personally, as I have never seen a performance profile where compiling the regex showed up (if a query has a regex, the vast majority of the time is spent evaluating the regex, not compiling it).