-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4189] Fix colliding filters by updating match method for MultiCharSequenceFilter to iterate through all patterns for the most complete match and not return on the first pattern found #4188
Conversation
…ilter to iterate through all patterns for the most complete match and not return on the first pattern found
src/metrics/filters/filter_test.go
Outdated
@@ -27,6 +27,19 @@ import ( | |||
"github.com/stretchr/testify/require" | |||
) | |||
|
|||
func TestPrefixCompositeR2Filter(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a better place to add these test cases is in TestMultiCharSequenceFilter
in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I also think the naming also isn't great. Is R2 naming only relevant internally to Uber?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, there's R2 references aplenty in OSS world.
It's not immediately clear that this is a bug without mentioning how it's used in the public boolean Maybe explicitly mention this in the description and how it would cause false negatives. |
src/metrics/filters/filter.go
Outdated
return val[len(pattern):], true | ||
if f.backwards { | ||
if bytes.HasSuffix(val, pattern) { | ||
if bestPattern == nil || len(pattern) > len(bestPattern) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: seems like the bestPattern == nil
isn't needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, updated.
src/metrics/filters/filter_test.go
Outdated
@@ -349,6 +349,11 @@ func TestMultiCharSequenceFilter(t *testing.T) { | |||
validateLookup(t, f, "12test", true, "12") | |||
validateLookup(t, f, "12test2", true, "12") | |||
validateLookup(t, f, "123book", true, "123") | |||
|
|||
f, err = newMultiCharSequenceFilter([]byte("arachne_failures,arachne_failures_by_rack"), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a similar test case where backwards == true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
…patterns and backwards enabled
src/metrics/filters/filter_test.go
Outdated
id1 := "arachne_failures_by_rack" | ||
f, err := newMultiCharSequenceFilter([]byte("arachne_failures,arachne_failures_by_rack"), false) | ||
require.NoError(t, err) | ||
_, matches1 := f.matches([]byte(id0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm understanding the requirement wrong: Should "arachne_failures" match "arachne_failures" or "arachne_failures_by_rack" ? If your expectation was to match with "arachne_failures" with "arachne_failures" then wouldn't "len(pattern) > len(bestPattern)" make it match "arachne_failures_by_rack" ?
Follow-ups:
- Could you also assert the returned matchIndex ?
- Maybe add similar bestMatch testcase for backwards (HasSuffix) case as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're looking at an older commit possibly.
"arachne_failures" should match with "arachne-failures" and "arachne_failures_by_rack" should match with "arachne_failures_by_rack" even if it also matches with other patterns in the filter since this pattern is the best match.
In the latest commit I have tests for the length of the value in the returned byte array and with backwards set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! Yeah was looking at the older commit. Thanks for the update!
@saad-zaman before landing, can you run existing/make new benchmarks for this change too -- https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go is a good guide to get you going. |
@@ -125,6 +125,14 @@ func BenchmarkRangeFilterRangeMatchNegation(b *testing.B) { | |||
benchRangeFilterRange(b, []byte("!a-z"), []byte("p"), false) | |||
} | |||
|
|||
func BenchmarkMultiRangeFilterPrefixCollision(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prateek PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the results of the benchmark before and after the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To Justin's point - please add a comment to the code with "After change results"
And in the PR, update description with both before/after results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the PR description to add benchmark results and a table to help with comparison. Also updated the code to add a small optimization to stop iterating through patterns if an optimal match was found and updated tests to ensure it was working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated table with a more legible benchmark test comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add b.ReportAllocs()
to these benchmarks?
Having a >50% impact seems pretty steep for this change, but I'm not seeing any obvious ways to make this function faster. Perhaps this impact is just very high because we're purposely checking the scenario where these "colliding filters" are present. Can we see the before/after numbers for the vanilla case where there aren't any colliding filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran it with b.ReportAllocs()
and there are 0 allocs with the old and new logic.
Updated the tests to show the change for all benchmark tests related to my code. There is a small difference (~0-15%) in test cases where the prefix/suffix collision does not occur likely due to the additional conditionals present after a prefix or suffix detection is found.
…ugh patterns if complete match is found
What this PR does / why we need it: Currently, the multiCharSequenceFilter
matches
function returns on the first pattern found here even if there are other patterns available with better matches. This results in a suboptimal byte[] val being returned which causes the publicmatches
function to return a false negative here. Thelen(val)
is not zero because a suboptimal val was returned when a better pattern could have been found if the multiCharSequenceFiltermatches
function iterated through all patterns available and looked for the best match.Fixes # #4189
Special notes for your reviewer: Please provide any go style improvements or tips you think are relevant
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?:
Benchmark Test Results Before and After Changes to
matches
method ofmultiCharSequenceFilter
: