Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4189] Fix colliding filters by updating match method for MultiCharSequenceFilter to iterate through all patterns for the most complete match and not return on the first pattern found #4188

Merged
merged 9 commits into from
Feb 22, 2023
37 changes: 33 additions & 4 deletions src/metrics/filters/filter.go
Original file line number Diff line number Diff line change
Expand Up @@ -585,14 +585,43 @@ func (f *multiCharSequenceFilter) matches(val []byte) ([]byte, bool) {
return nil, false
}

var matchIndex int
var bestPattern []byte
for _, pattern := range f.patterns {
if f.backwards && bytes.HasSuffix(val, pattern) {
return val[:len(val)-len(pattern)], true
if len(pattern) > len(val) {
continue
}

if !f.backwards && bytes.HasPrefix(val, pattern) {
return val[len(pattern):], true
if f.backwards {
if bytes.HasSuffix(val, pattern) {
if len(pattern) > len(bestPattern) {
bestPattern = pattern
matchIndex = len(val) - len(pattern)
// No need to continue searching through remaining patterns if a complete match is found
if len(bestPattern) == len(val) {
break
}
}
}
} else {
if bytes.HasPrefix(val, pattern) {
if len(pattern) > len(bestPattern) {
bestPattern = pattern
matchIndex = len(pattern)
// No need to continue searching through remaining patterns if a complete match is found
if len(bestPattern) == len(val) {
break
}
}
}
}
}

if bestPattern != nil {
if f.backwards {
return val[:matchIndex], true
}
return val[matchIndex:], true
}

return nil, false
Expand Down
10 changes: 10 additions & 0 deletions src/metrics/filters/filter_benchmark_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,16 @@ func BenchmarkRangeFilterRangeMatchNegation(b *testing.B) {
benchRangeFilterRange(b, []byte("!a-z"), []byte("p"), false)
}

func BenchmarkMultiRangeFilterPrefixCollision(b *testing.B) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prateek PTAL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put the results of the benchmark before and after the change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To Justin's point - please add a comment to the code with "After change results"

And in the PR, update description with both before/after results

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR description to add benchmark results and a table to help with comparison. Also updated the code to add a small optimization to stop iterating through patterns if an optimal match was found and updated tests to ensure it was working

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated table with a more legible benchmark test comparison

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add b.ReportAllocs() to these benchmarks?

Having a >50% impact seems pretty steep for this change, but I'm not seeing any obvious ways to make this function faster. Perhaps this impact is just very high because we're purposely checking the scenario where these "colliding filters" are present. Can we see the before/after numbers for the vanilla case where there aren't any colliding filters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it with b.ReportAllocs() and there are 0 allocs with the old and new logic.

Updated the tests to show the change for all benchmark tests related to my code. There is a small difference (~0-15%) in test cases where the prefix/suffix collision does not occur likely due to the additional conditionals present after a prefix or suffix detection is found.

benchMultiRangeFilter(b, []byte("test_1,test_1_suffix,extra_unused_pattern"),
false, [][]byte{[]byte("test_1"), []byte("test_1_suffix")})
}

func BenchmarkMultiRangeFilterSuffixCollision(b *testing.B) {
benchMultiRangeFilter(b, []byte("test_1,prefix_test_1,extra_unused_pattern"),
true, [][]byte{[]byte("test_1"), []byte("prefix_test_1")})
}

func BenchmarkMultiRangeFilterTwo(b *testing.B) {
benchMultiRangeFilter(b, []byte("test_1,test_2"), false, [][]byte{[]byte("test_1"), []byte("fake_1")})
}
Expand Down
10 changes: 10 additions & 0 deletions src/metrics/filters/filter_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -349,6 +349,16 @@ func TestMultiCharSequenceFilter(t *testing.T) {
validateLookup(t, f, "12test", true, "12")
validateLookup(t, f, "12test2", true, "12")
validateLookup(t, f, "123book", true, "123")

f, err = newMultiCharSequenceFilter([]byte("test,test_post,extra_unused_filter"), false)
require.NoError(t, err)
validateLookup(t, f, "test", true, "")
validateLookup(t, f, "test_post", true, "")

f, err = newMultiCharSequenceFilter([]byte("test,pre_test,extra_unused_filter"), true)
require.NoError(t, err)
validateLookup(t, f, "test", true, "")
validateLookup(t, f, "pre_test", true, "")
}

func validateLookup(t *testing.T, f chainFilter, val string, expectedMatch bool, expectedRemainder string) {
Expand Down