refactor!: Rules are loaded into a radix trie instead of a list #1358
Merged
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1358      +/-   ##
==========================================
+ Coverage   89.28%   89.69%   +0.40%
==========================================
  Files         270      270
  Lines        8870     9051     +181
==========================================
+ Hits         7920     8118     +198
+ Misses        703      691      -12
+ Partials      247      242       -5
…of affected providers
dadrus
changed the title
wip: Radix trie for rule management and fast lookup
feat: Radix trie for rule management and fast lookup
Apr 29, 2024
dadrus
changed the title
feat: Radix trie for rule management and fast lookup
refactor!: Rules are loaded into a radix trie instead of a list
Apr 29, 2024
Related issue(s)
closes #652
closes #661
closes #1037
closes #1038
Checklist
Background
The current implementation for rule management and matching is pretty simple. It is based on a list of precompiled glob or regex expressions. There are, however, multiple drawbacks and challenges:

- The `<*>` glob expression matches all characters except `/` and `.`, which are separators in paths and hosts respectively. Used in paths, that expression is quite useless, as it captures path segments containing dots in an unexpected way (see "Make single star globs more context-aware and useful", #1037).

Description
This PR addresses the challenges described above and replaces the list based rule management implementation, as well as the glob/regex based lookup implementation, with a new implementation based on a radix tree. In addition, glob expressions are now context aware and use `.` for host related expressions and `/` for path related ones as separators.

The following sections describe the corresponding user facing changes in detail.
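To illustrate why the choice of separators matters for a single star, here is a minimal sketch of a star-only glob matcher with configurable separators. It is not the glob library heimdall uses; `matchStar` and the sample patterns are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchStar reports whether s matches pattern, where '*' matches any run of
// characters (including none) that does not contain one of the separators.
// Only '*' is supported; this is an illustration, not a full glob engine.
func matchStar(pattern, s, separators string) bool {
	if pattern == "" {
		return s == ""
	}
	if pattern[0] != '*' {
		// Literal byte: must match exactly.
		return s != "" && s[0] == pattern[0] && matchStar(pattern[1:], s[1:], separators)
	}
	// Try every possible length for the star, stopping at a separator.
	for i := 0; ; i++ {
		if matchStar(pattern[1:], s[i:], separators) {
			return true
		}
		if i == len(s) || strings.ContainsRune(separators, rune(s[i])) {
			return false
		}
	}
}

func main() {
	// Old behaviour: both '/' and '.' act as separators everywhere.
	fmt.Println(matchStar("/static/*", "/static/app.min.js", "/."))
	// Context aware: only '/' separates path segments.
	fmt.Println(matchStar("/static/*", "/static/app.min.js", "/"))
	// Context aware: only '.' separates host labels.
	fmt.Println(matchStar("*.example.com", "api.example.com", "."))
	fmt.Println(matchStar("*.example.com", "a.b.example.com", "."))
}
```

With `.` treated as a separator in paths, the star stops at the first dot of `app.min.js`, which is the surprising behaviour referenced in #1037.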
Rule Matching Configuration
The old matching related configuration was
The new matching related configuration, introduced by this PR looks as follows:
With

- `path` defining the primary expression to match the rule,
- `with` defining additional conditions, which must be met for the rule to actually match. These are:
  - `scheme` - the HTTP scheme to match. If not set, both "http" and "https" schemes are accepted.
  - `methods` - the list of HTTP methods to match.
  - `host_glob` and `host_regex` - additional expressions to validate the host value against. The two are mutually exclusive. If not set, any host is accepted.
  - `path_glob` and `path_regex` - mutually exclusive as well. They allow narrowing down the URL path of the given request further after it is matched.
- `backtracking_enabled` configuring the backtracking behavior if the additional conditions fail. If enabled, the lookup in the radix tree will traverse back to a less specific path expression and potentially match a less specific rule.

There may be multiple rules with the same `path` expression, but different additional conditions, e.g. different required methods. Here is an example:

The configuration of the `default_rule` has been extended.

For security reasons, backtracking is disabled by default. It can be enabled globally on the default rule level, and also enabled or disabled on the level of a particular upstream rule.
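The precedence between the default rule and a particular rule can be sketched as follows. This is only a model of the behaviour described above, not heimdall's code; `effectiveBacktracking` and its parameters are hypothetical names.

```go
package main

import "fmt"

// effectiveBacktracking resolves the backtracking setting for one rule.
// ruleSetting is nil when the rule does not configure backtracking itself,
// in which case the value from the default rule applies.
func effectiveBacktracking(defaultRuleSetting bool, ruleSetting *bool) bool {
	if ruleSetting != nil {
		return *ruleSetting
	}
	return defaultRuleSetting
}

func main() {
	on, off := true, false
	fmt.Println(effectiveBacktracking(false, nil)) // configured nowhere: disabled
	fmt.Println(effectiveBacktracking(true, nil))  // enabled on the default rule
	fmt.Println(effectiveBacktracking(true, &off)) // disabled for this rule only
	fmt.Println(effectiveBacktracking(false, &on)) // enabled for this rule only
}
```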
Expressions
The previous section talks about glob and regex expressions, which can be used to further narrow down the `<host and port>`, as well as the `path` expression. All of these expression types were already shown in the example above. The latter, the path expression, allows the use of wildcards while specifying the path segments.

There are two types of wildcards available: free wildcards, introduced by `*`, and single wildcards, introduced by `:`.

Both can be named and unnamed. Named wildcards allow accessing the matched segments in the pipeline of the rule using the defined name as a key. An unnamed free wildcard is defined as `**` and an unnamed single wildcard is defined as `:*`. A named wildcard uses an identifier instead of the `*`, i.e. `*name` for a free wildcard and `:name` for a single wildcard.

The value of the path segment (or path segments) available via the wildcard name is decoded. E.g. if you define the path to be matched in a rule as `/file/:name`, and the actual path of the request is `/file/%5Bid%5D`, you'll get `[id]` when accessing the captured path segment via the `name` key.

There are some simple rules which must be followed while using wildcards:

- A path segment must start with `:` or `*` to define a wildcard.
- If a path segment starts with `:` or `*`, but should not be considered a wildcard, it must be escaped with `\`.

Here are some path examples:
- `/apples/and/bananas` - Matches exactly the given path.
- `/apples/and/:something` - Matches `/apples/and/bananas`, `/apples/and/oranges` and alike, but not `/apples/and/bananas/andmore` or `/apples/or/bananas`. Since a named single wildcard is used, the actual value of the path segment matched by `:something` can be accessed in the rule pipeline using `something` as a key.
- `/apples/and/some:thing` - Matches exactly `/apples/and/some:thing`.
- `/apples/and/some**` - Matches exactly `/apples/and/some**`.
- `/apples/:junction/:something` - Similar to above, but will also match `/apples/or/bananas` in addition to `/apples/and/bananas` and `/apples/and/oranges`.
- `/apples/**` - Matches any path starting with `/apples/`.
- `/apples/*remainingpath` - Same as above, but uses a named free wildcard.
- `/apples/**/bananas` - Is invalid, as there is a path segment after a free wildcard.
- `/apples/\*remainingpath` - Matches exactly `/apples/*remainingpath`.
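The wildcard semantics from the examples above can be modelled with a small segment-by-segment matcher. This is a sketch under simplifying assumptions (no radix tree, no handling of empty wildcard names); `matchPath` and `capture` are hypothetical helpers, not heimdall's API.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// matchPath checks a raw (still encoded) request path against a path
// expression and returns the decoded values captured by named wildcards.
func matchPath(expr, rawPath string) (map[string]string, bool, error) {
	exprSegs := strings.Split(expr, "/")
	pathSegs := strings.Split(rawPath, "/")
	// A free wildcard is only valid as the last segment of the expression.
	for i, seg := range exprSegs {
		if strings.HasPrefix(seg, "*") && i != len(exprSegs)-1 {
			return nil, false, errors.New("path segment after a free wildcard")
		}
	}
	captures := map[string]string{}
	for i, seg := range exprSegs {
		switch {
		case i >= len(pathSegs):
			return nil, false, nil
		case strings.HasPrefix(seg, "*"):
			// Free wildcard: takes the whole remainder of the path.
			err := capture(captures, seg[1:], strings.Join(pathSegs[i:], "/"))
			return captures, err == nil, err
		case strings.HasPrefix(seg, ":"):
			// Single wildcard: takes exactly one non-empty path segment.
			if pathSegs[i] == "" {
				return nil, false, nil
			}
			if err := capture(captures, seg[1:], pathSegs[i]); err != nil {
				return nil, false, err
			}
		default:
			// Literal segment; a leading backslash escapes ':' or '*'.
			if strings.TrimPrefix(seg, `\`) != pathSegs[i] {
				return nil, false, nil
			}
		}
	}
	return captures, len(exprSegs) == len(pathSegs), nil
}

// capture stores the percent-decoded value unless the wildcard is unnamed ("*").
func capture(captures map[string]string, name, raw string) error {
	if name == "*" {
		return nil
	}
	value, err := url.PathUnescape(raw)
	if err != nil {
		return err
	}
	captures[name] = value
	return nil
}

func main() {
	caps, ok, _ := matchPath("/file/:name", "/file/%5Bid%5D")
	fmt.Println(ok, caps["name"])
	caps, ok, _ = matchPath("/apples/*remainingpath", "/apples/and/bananas")
	fmt.Println(ok, caps["remainingpath"])
	_, _, err := matchPath("/apples/**/bananas", "/apples/and/bananas")
	fmt.Println(err)
}
```

Literal segments are compared against the raw path, while captured values are decoded, which mirrors the `/file/%5Bid%5D` example above.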
How to migrate match expressions of old rules to new ones

- If your rule uses `url: http://127.0.0.1:9090/foo/<**>`, you can replace it with
- If your rule uses `url: http://127.0.0.1:9090/<{,**.css,**.js,**.ico}>`, you need two rules now, one to match `/` and one to match the resources:
- If your rule uses `url: http://<**>/profile/api`, you can replace it with

Rule Matching Specificity & Backtracking
As written above, before this PR, the only way to ensure that more specific rules are matched before more general ones was to place the rules in one rule set and to order them with the more specific rules placed before the more general ones.

This PR makes that requirement obsolete. The implementation ensures that more specific path expressions are matched first, regardless of the placement of rules in a rule set. Indeed, the more specific rules are matched first even if the corresponding rules are defined in different rule sets. This PR also introduces optional backtracking for rule matching, which extends the existing capabilities related to defaults.

The following example demonstrates the aspects described above.

Imagine there are the following three rules:

rule 1
rule 2
rule 3

The request to `/files/team1/document.pdf` will be matched by the rule with id `rule2`, as it is more specific than rule 1. So the pipeline of rule 2 will be executed.

The request to `/files/team3/document.pdf` will be matched by rule 3, as it is more specific than rule 1 and rule 2. Again, the corresponding pipeline will be executed.

However, even though the request to `/files/team4/document.pdf` will be matched by rule 2, the regular expression `^/files/(team1|team2)/.*` will fail. Since backtracking is enabled, backtracking will start and the request will be matched by rule 1.

This allows not only providing additional fallbacks (defaults), but also further reduction and simplification of rules. Here is an additional example:
Imagine you have a pretty complex rule, which covers read and write access to the resource
You can now split it into two rules:
which is much easier to reason about and also to test. Rules for the same path expression must come from the same rule set though. That way, the rule settings cannot be overwritten maliciously by another rule.
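Returning to the three-rule example with `/files/team1/...`, `/files/team3/...` and `/files/team4/...` above, the backtracking behaviour can be modelled roughly as follows. The path expressions noted in the comments are assumptions chosen to be consistent with the described outcomes. Path matching is stubbed with simple predicates, and the rules are visited in an explicitly given order from most to least specific instead of being stored in a radix tree.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a reduced, hypothetical model of a rule for this sketch.
type rule struct {
	id           string
	pathMatches  func(segments []string) bool // stands in for the path expression
	pathRegex    *regexp.Regexp               // optional additional condition
	backtracking bool
}

// lookup expects rules ordered from the most to the least specific path
// expression. A real radix tree derives that order from its structure.
func lookup(rules []rule, path string) (string, bool) {
	segments := strings.Split(strings.Trim(path, "/"), "/")
	for _, r := range rules {
		if !r.pathMatches(segments) {
			continue
		}
		if r.pathRegex == nil || r.pathRegex.MatchString(path) {
			return r.id, true
		}
		if !r.backtracking {
			return "", false // additional conditions failed, no fallback allowed
		}
		// Backtracking: continue with less specific path expressions.
	}
	return "", false
}

// exampleRules builds three rules with assumed path expressions.
func exampleRules(rule2Backtracking bool) []rule {
	return []rule{
		{ // rule 3, assumed expression: /files/team3/:name
			id: "rule3",
			pathMatches: func(s []string) bool {
				return len(s) == 3 && s[0] == "files" && s[1] == "team3"
			},
		},
		{ // rule 2, assumed expression: /files/:team/:name
			id:           "rule2",
			pathMatches:  func(s []string) bool { return len(s) == 3 && s[0] == "files" },
			pathRegex:    regexp.MustCompile(`^/files/(team1|team2)/.*`),
			backtracking: rule2Backtracking,
		},
		{ // rule 1, assumed expression: /files/**
			id:          "rule1",
			pathMatches: func(s []string) bool { return len(s) >= 2 && s[0] == "files" },
		},
	}
}

func main() {
	for _, p := range []string{
		"/files/team1/document.pdf",
		"/files/team3/document.pdf",
		"/files/team4/document.pdf",
	} {
		id, ok := lookup(exampleRules(true), p)
		fmt.Println(p, id, ok)
	}
}
```

With backtracking disabled on rule 2, the request to `/files/team4/document.pdf` would not fall back to rule 1 in this model.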
Since multiple rules with the same path expression might be present in a rule set, more than one rule could match a request based on their additional conditions. Here is an example:
Such conflicting configurations cannot be avoided while loading a rule set, and there might be valid reasons to have different rules with more specific additional conditions for the same path expression as well. For that reason, heimdall will use the first matching rule when the incoming request is matched by multiple rules.
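A minimal sketch of the "first matching rule wins" behaviour for rules sharing a path expression, using HTTP methods as the only additional condition. The rule ids and the path expression are made up, and treating an empty method list as "any method" is an assumption of this sketch.

```go
package main

import "fmt"

// rule is a reduced, hypothetical model of a rule for this sketch.
type rule struct {
	id      string
	path    string   // path expression the rule is registered under
	methods []string // accepted HTTP methods; empty is assumed to mean "any"
}

// accepts reports whether the method condition of the rule is satisfied.
func (r rule) accepts(method string) bool {
	if len(r.methods) == 0 {
		return true
	}
	for _, m := range r.methods {
		if m == method {
			return true
		}
	}
	return false
}

// firstMatch returns the id of the first rule registered for pathExpr whose
// method condition is satisfied. Later rules are not considered.
func firstMatch(rules []rule, pathExpr, method string) (string, bool) {
	for _, r := range rules {
		if r.path == pathExpr && r.accepts(method) {
			return r.id, true
		}
	}
	return "", false
}

func main() {
	rules := []rule{
		{id: "read", path: "/documents/:id", methods: []string{"GET", "HEAD"}},
		{id: "write", path: "/documents/:id", methods: []string{"POST", "PUT", "DELETE"}},
		{id: "also-get", path: "/documents/:id", methods: []string{"GET"}}, // never reached for GET
	}
	fmt.Println(firstMatch(rules, "/documents/:id", "GET"))
	fmt.Println(firstMatch(rules, "/documents/:id", "PUT"))
	fmt.Println(firstMatch(rules, "/documents/:id", "PATCH"))
}
```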
Path Segments Encoding
Unlike previously, the rules are now matched by traversing a radix tree. That means rule specific settings cannot be taken into account as long as a rule is not found in the tree.

That also means,

- if a rule is defined to match `/foo/bar`, it will never match `/foo%2Fbar` (`%2F` is an encoded slash) and vice versa, and
- if a rule is defined to match `/foo/[id]`, it will never match, as path segments are typically encoded and the actual path looks like `/foo/%5Bid%5D`.

In other words, you must specify the expected path as it comes over the wire as long as you're not using wildcards.
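The difference between the path "as it comes over the wire" and its decoded form can be inspected with Go's standard `net/url` package. The helper `wirePath` and the host `example.com` are only used for illustration here.

```go
package main

import (
	"fmt"
	"net/url"
)

// wirePath returns the path of rawURL in its encoded form (as sent over the
// wire) and in its decoded form.
func wirePath(rawURL string) (escaped, decoded string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	return u.EscapedPath(), u.Path, nil
}

func main() {
	for _, raw := range []string{
		"http://example.com/foo/bar",
		"http://example.com/foo%2Fbar",
		"http://example.com/foo/%5Bid%5D",
	} {
		escaped, decoded, err := wirePath(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("wire=%s decoded=%s\n", escaped, decoded)
	}
}
```

`/foo/bar` and `/foo%2Fbar` decode to the same string, but their wire forms differ, which is why a literal path expression has to be written in the encoded form.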
Beyond that, the semantics of `allow_encoded_slashes` (introduced in #1071) have not been changed.

Access to Matched Values
As written above, the usage of named wildcards enables access to matched values in rule pipelines. The corresponding key value pairs are available in the `.Request.URL.Captures` object. Here is an example (similar to the one described in #1038 as a suggestion for the new API):

Breaking Changes Introduced by this PR
- The `match` property is completely different (see the section "Rule Matching Configuration" above).
- The `method` property has been moved into the redesigned `match` object (there under the `with` property) and is optional. The configured HTTP verbs are therefore used to match the rule and not after the rule has been matched, allowing for definition of different rules for the same path.
- … `method` property any more. That means heimdall will never respond with `405 Method Not Allowed` any more.
- Since `405 Method Not Allowed` is not returned by heimdall any more, there is no way to overwrite the corresponding response code to something else. So, support for `respond.with.method_error` in the configuration of the decision and proxy services has been dropped.
- Support for `rule_path_match_prefix` on endpoint configurations for `http_endpoint` and `cloud_blob` providers has been dropped. The same functionality is now given by allowing rules for the same path expression to come from the same rule set only.
- The default rule rejects requests with encoded slashes in the path of the URL with `400 Bad Request`.

BEGIN_COMMIT_OVERRIDE
perf: O(log(n)) time complexity for lookup of rules (#1358)
feat: Support for free and single (named) wildcards for request path matching and access of the captured values from the pipeline (#1358)
feat: Support for backtracking while matching rules (#1358)
feat: Multiple rules can be defined for the same path, e.g. to have separate rules for read and write requests (#1358)
feat: Glob expressions are context aware and use `.` for host related expressions and `/` for path related ones as separators (#1358)
refactor!: Rule matching configuration API redesigned (#1358)
refactor!: Default rule rejects requests with encoded slashes in the path of the URL with `400 Bad Request` (#1358)
refactor!: Support for `rule_path_match_prefix` on endpoint configurations for `http_endpoint` and `cloud_blob` providers has been dropped (#1358)
END_COMMIT_OVERRIDE