Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
i/prompting: implement path pattern precedence (canonical#13868)
* i/prompting: implement path pattern precedence When a path matches multiple patterns, the most specific of those patterns must be selected so that the associated rule can take precedence. When comparing the precedence of path patterns, those patterns cannot contain groups, since it is necessary to know which variants of the pattern were matched against the path in question. Thus, path pattern precedence must only be computed between patterns which have already been rendered. In general, at the first character where two patterns differ, the one which is more restrictive has precedence (i.e. '/' over '?', '?' over '*'). Less restrictive patterns are repeatedly pruned until a single most restrictive pattern remains. That pattern has precedence. One property which emerges from this precedence system is that the pattern which most specifically restricts the location of matching paths takes precedence. For example, the following patterns all match `/foo/bar/baz/myfile.txt` (though this is not an exhaustive list) and are sorted from highest to lowest precedence: 1. /foo/bar/baz/myfile.txt 2. /foo/bar/baz/m?file.* 3. /foo/bar/baz/m*file.txt 4. /foo/bar/baz/m*file* 5. /foo/bar/baz/* 6. /foo/bar/*/myfile.txt 7. /foo/bar/*/myfile* 8. /foo/bar/*/* 9. /foo/bar/**/baz/myfile.txt 10. /foo/bar/**/myfile.txt 11. /foo/bar/**/baz/*.txt 12. /foo/bar/**/baz/* 13. /foo/bar/** 14. /foo/b?r/baz/myfile.txt 15. /foo/?*/baz/myfile.txt 16. /foo/?*/** 17. /foo/*?/baz/myfile.txt 18. /foo/*/baz/myfile.txt 19. /foo/*/baz/* 20. /foo/*/*/* 21. /foo/*/**/baz/myfile.txt 22. /foo/**/baz/myfile.txt 23. /foo/**/baz/** 24. /**/myfile.txt 25. /** Signed-off-by: Oliver Calder <[email protected]> i/prompting: add tests for precedence between example patterns Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: add pattern variant type for precedence comparison After a path pattern is rendered into variants, parse each variant into its constituent components so that precedence between any variants can be decided based on the order and occurrence of those components. For example, `/foo/*/bar/b?z/**` has precedence over `/foo/*/b?r/baz/**`. There are several types of scenarios which make it difficult to compute precedence in all cases using a simple mechanism: - `/foo/` > `/foo` > `/foo/**` - `/foo/**/bar` > `/foo/**/` > `/foo/**` - `/foo/bar/` > `/foo/b*r/` - `/foo/**/x/b*r` > `/foo/**/bar/` - `/foo/bar/**` > `/foo/b?r/**/baz` > `/foo/b*r/baz` The current implementation fails to address all these scenarios, namely because it relies on the length of literals (to catch `/foo/bar/` vs `/foo/b*r/`, for example), but this falls apart for literals which occur after a previous `/**/`, since a shorter literal may match an earlier component of a hypothetical path. Additionally, we cannot always compute precedence between two variants: Both `/foo/**/bar/**` and `/foo/**/baz/**` match `/foo/bar/baz`, but nothing about the order or length of components can tell us which should have precedence. In this case, we would likely say the former has precedence, since `bar` matches earlier in the path, but if the path were `/foo/baz/bar`, we would likely say the latter has precedence. Thus, we must use the path against which the matches were made when computing precedence between pattern variants. Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: compute precedence based on matching path With knowledge of the path which matched every pattern variant we're comparing, we can compute which of those variants has the highest precedence. For example, the precedence of the following pattern variants cannot be computed a priori: - /foo/**/fizz/** - /foo/**/buzz/** However, with the matching path "/foo/bar/fizz/buzz", we can say that "/foo/**/fizz/**" has precedence, since "fizz" matches earlier in the path than "buzz". If instead the path were "/foo/bar/buzz/fizz", then "/foo/**/buzz/**" would have precedence, for the same reason. To compute this, we add a regular expression snippet for every component, concatenate and compile them into a single regex for the pattern variant, and match it against the path when computing precedence, keeping track of submatches. As we iterate through the components of the pattern variant, we also iterate through the submatches, and we are thus able to tell more about how many bytes each component matched. For variable-length component types (e.g. globstars and doublestars), matching fewer bytes yields higher precedence. Conversely, for literals, matching more bytes yields higher precedence, as this implies that the literal itself is longer and thus more specific, as literals may be followed by variable-length components. This also helps reliable compute less-pathological cases, such as the following: - /foo/**/myfile*/** - /foo/**/bar/myfile.txt We may be able to tell that the latter has precedence, but a deterministic algorithm to compute this is difficult to write and reason about. By including a matching path, such as "/foo/bar/myfile.txt", we are able to see that the "/**/" in the former pattern matches more characters than that of the latter, and thus is a less precise match, and has lower precedence. Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: use component submatch instead of storing length Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: fix bugs around handling doublestars and escaped runes Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: stack allocate pattern variants and improve documentation Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: add test case and improve robustness Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: fix path pattern json marshalling Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: improve documentation and unexport parsePatternVariant Signed-off-by: Oliver Calder <[email protected]> * i/prompting/patterns: add documentation and test cases Signed-off-by: Oliver Calder <[email protected]> --------- Signed-off-by: Oliver Calder <[email protected]>
- Loading branch information