-
-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: add functionality to compile globs #85
Comments
Sorry I never replied to this - I spent some time thinking about it, but never came back to it. I'm not overly convinced "compiling" would improve performance - in fact, it may have a negative impact and would certainly consume more memory. I see two places where "compiled" data structures typically exist: in interpreted languages where the data structure is turned into dynamically evaluated code, and pre-processing of regular expressions. Go can't do the former, and globs aren't as complicated as the latter, so I'm not sure they'd benefit from it. In terms of a "glob as a data structure", the only glob operation that is kind of expensive is alternatives (ie, Anyway, I digress... there's barely any "parsing" of globs at all. Globs are mostly evaluated character-by-character, with small sub-loops for character classes and alternatives. So a "compiled" version wouldn't be substantially different from an array of characters. Contrast this with regular expressions which have much more complicated language constructs - parsing is not so linear. Repetition specifiers come to mind, as that means the regular expression parser must always look ahead for repetition specifiers. There's also more state to keep track of: capture groups, back-references, non-capture groups that change settings (such as case-sensitivity), etc. Every time a regular expression needs to backtrack, there's a lot to re-parse. I dunno... I'd have to think about it some more - maybe I'll study some regex compilation to see if there are any ideas that would benefit globs... |
Hm, very interesting! I hadn't given too much thought to how this kind of parsing works and I can definitely see there's a big difference from the "Programs" that regexes are compiled into to the more or less linear "matching" that's needed for globs. I'll admit my personal interest in this feature is less from a performance perspective and more related to the API of the package itself. I recently added a setting in user configuration, that I first implemented as a regexp, then later as a glob using this package. Regexp: runatlantis/atlantis#5254 (don't feel the need to understand what the code above is doing, suffice it to say the user specified a string that is used to match against other strings later). We decided to go with glob because it fit the use case better, but one thing I did like about regexp was I was able to "compile" when loading the configuration and store the regex itself. Later in the "business logic" all we had to do was call Match() against it. For globs, we could Validate the input, but had to store the config as a So given that, ironically my naive implementation in #99 is actually sufficient for my needs, though calling what I wrote |
You could always write a little helper function in your code to call Match but ignore the "error", or panic as you have in your PR. |
Yeah that's true. The only downside would be that if there was a bug such that ValidatePattern and Match disagreed about what a valid pattern is, my code would stack trace, whereas the API of Pattern.Match() could be such that that could not occur. Your earlier comment has definitely piqued my interest about how this kind of parsing works, I might go ahead and see if I can implement this as purely an academic exercise for myself. If I can get it to be at least as performant of Match() I'll have you take a look, but I'm under no expectations that that means it will be accepted. There are other considerations like broadening the API surface area, as well as maintaining consistency between (what I assume will be at least in part) separate implementations that we'll have to weigh. I mostly think it would be a fun exercise :) |
Right. I will add that validation during Match is essentially "free", if there's any concern about performance. There's no pre-processing validation step - errors are detected while the pattern runs. The only exception to this is that if the pattern is longer than the string we're trying to match against, then validation runs on the remainder of the pattern so that we don't silently ignore an error that wasn't reached during execution. For example, a pattern of Also, hopefully all the tests catch bugs in Validate vs Match behavior, but I suppose there's no guarantee that every corner case has been tested =( |
Ah, right, that's the specific thing that Given that, that PR is actually a working solution to this problem, right?
From here it's a question of "is it enough to justify the additional API surface area", which I'll leave up to you :) |
It would be great to have functionality to compile globs similar to https://github.com/gobwas/glob (which is seemingly no longer actively maintained and has bugs causing issues)
This would have performance benefits as one would not be parsing the pattern each time
The text was updated successfully, but these errors were encountered: