
[4189] Fix colliding filters by updating match method for MultiCharSequenceFilter to iterate through all patterns for the most complete match and not return on the first pattern found #4188

Merged
merged 9 commits into from
Feb 22, 2023
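The title describes replacing a loop that returns on the first matching pattern with one that scans every pattern and keeps the most complete match. The sketch below is a minimal standalone version of both behaviors. It assumes the patterns are already split into byte slices, and firstMatch and longestMatch are hypothetical helpers, not functions from the repository.

```go
package main

import "bytes"

// firstMatch reproduces the pre-fix loop: it returns as soon as any pattern
// matches, so the result depends on the order of the patterns.
func firstMatch(val []byte, patterns [][]byte, backwards bool) ([]byte, bool) {
	for _, p := range patterns {
		if backwards && bytes.HasSuffix(val, p) {
			return val[:len(val)-len(p)], true
		}
		if !backwards && bytes.HasPrefix(val, p) {
			return val[len(p):], true
		}
	}
	return nil, false
}

// longestMatch follows the fixed loop: it checks every pattern and keeps the
// longest one that matches at the start (or, if backwards, the end) of val,
// then returns whatever part of val that pattern did not consume.
func longestMatch(val []byte, patterns [][]byte, backwards bool) ([]byte, bool) {
	var best []byte
	for _, p := range patterns {
		if len(p) > len(val) {
			continue // a pattern longer than val cannot match
		}
		matched := bytes.HasPrefix(val, p)
		if backwards {
			matched = bytes.HasSuffix(val, p)
		}
		if matched && len(p) > len(best) {
			best = p
		}
	}
	if best == nil {
		return nil, false
	}
	if backwards {
		return val[:len(val)-len(best)], true
	}
	return val[len(best):], true
}
```

With the patterns arachne_failures and arachne_failures_by_rack, the value arachne_failures_by_rack leaves _by_rack behind under firstMatch but is consumed completely under longestMatch, regardless of the order in which the two patterns are listed.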
29 changes: 25 additions & 4 deletions src/metrics/filters/filter.go
```diff
@@ -585,14 +585,35 @@ func (f *multiCharSequenceFilter) matches(val []byte) ([]byte, bool) {
 		return nil, false
 	}
 
+	var matchIndex int
+	var bestPattern []byte
 	for _, pattern := range f.patterns {
-		if f.backwards && bytes.HasSuffix(val, pattern) {
-			return val[:len(val)-len(pattern)], true
+		if len(pattern) > len(val) {
+			continue
 		}
 
-		if !f.backwards && bytes.HasPrefix(val, pattern) {
-			return val[len(pattern):], true
+		if f.backwards {
+			if bytes.HasSuffix(val, pattern) {
+				if bestPattern == nil || len(pattern) > len(bestPattern) {
```

Collaborator: Nit: seems like the bestPattern == nil isn't needed

Collaborator Author: Good catch, updated.

```diff
+					bestPattern = pattern
+					matchIndex = len(val) - len(pattern)
+				}
+			}
+		} else {
+			if bytes.HasPrefix(val, pattern) {
+				if bestPattern == nil || len(pattern) > len(bestPattern) {
+					bestPattern = pattern
+					matchIndex = len(pattern)
+				}
+			}
 		}
 	}
 
+	if bestPattern != nil {
+		if f.backwards {
+			return val[:matchIndex], true
+		}
+		return val[matchIndex:], true
+	}
+
 	return nil, false
```
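The reviewer's nit on the bestPattern == nil condition relies on the fact that, in Go, the length of a nil slice is 0. A small sketch comparing the two forms of the condition; keepWithNilCheck and keepWithoutNilCheck are hypothetical helper names used only for illustration.

```go
package main

// keepWithNilCheck is the comparison as written in the reviewed diff:
// it reports whether pattern should replace the current best match.
func keepWithNilCheck(pattern, best []byte) bool {
	return best == nil || len(pattern) > len(best)
}

// keepWithoutNilCheck is the reviewer's simplification. len of a nil slice
// is 0, so any non-empty pattern still replaces a nil best.
func keepWithoutNilCheck(pattern, best []byte) bool {
	return len(pattern) > len(best)
}
```

The two conditions agree for every non-empty pattern and differ only when the pattern itself is empty; this diff does not show whether the filter parser can ever produce an empty pattern.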
12 changes: 12 additions & 0 deletions src/metrics/filters/filter_test.go
```diff
@@ -27,6 +27,18 @@ import (
 	"github.com/stretchr/testify/require"
 )
 
+func TestPrefixCompositeR2Filter(t *testing.T) {
```

Collaborator: Seems like a better place to add these test cases is TestMultiCharSequenceFilter in this file.

Collaborator Author: Agreed. I also think the naming isn't great. Is the R2 naming only relevant internally to Uber?

Collaborator: Nah, there are plenty of R2 references in the OSS world.

```diff
+	id0 := "arachne_failures"
+	id1 := "arachne_failures_by_rack"
+	f, err := newMultiCharSequenceFilter([]byte("arachne_failures,arachne_failures_by_rack"), false)
+	require.NoError(t, err)
+	_, matches1 := f.matches([]byte(id0))
```

Contributor: Maybe I'm misunderstanding the requirement: should "arachne_failures" match "arachne_failures" or "arachne_failures_by_rack"? If you expect "arachne_failures" to match "arachne_failures", wouldn't len(pattern) > len(bestPattern) make it match "arachne_failures_by_rack" instead?

Follow-ups:

  1. Could you also assert the returned matchIndex?
  2. Maybe add a similar best-match test case for the backwards (HasSuffix) case as well?

Collaborator Author: I think you may be looking at an older commit.

"arachne_failures" should match "arachne_failures", and "arachne_failures_by_rack" should match "arachne_failures_by_rack" even though it also matches other patterns in the filter, because that pattern is the best (longest) match.

In the latest commit I have tests that check the length of the returned byte slice and that run with backwards set to true.

Contributor: Gotcha! Yeah, I was looking at the older commit. Thanks for the update!

```diff
+	require.True(t, matches1)
+	_, matches2 := f.matches([]byte(id1))
+	require.True(t, matches2)
+
+}
+
 func TestNewFilterFromFilterValueInvalidPattern(t *testing.T) {
 	inputs := []string{"ab]c[sdf", "abc[z-a]", "*con[tT]ains*"}
 	for _, input := range inputs {
```
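The reviewers asked for these cases to live in TestMultiCharSequenceFilter and to assert both the returned remainder and the backwards (HasSuffix) direction. A table-driven sketch of those cases follows. It runs against a hypothetical standalone longestMatch rather than the package's unexported newMultiCharSequenceFilter, and the matchCase type, cases table, and checkCases helper are all illustrative.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchCase is a hypothetical table row: it records whether the value should
// match and how many bytes should remain after the best pattern is removed.
type matchCase struct {
	patterns   []string
	backwards  bool
	val        string
	wantOK     bool
	wantRemLen int
}

var cases = []matchCase{
	// Shorter value: the longer pattern is skipped and the exact pattern wins.
	{[]string{"arachne_failures", "arachne_failures_by_rack"}, false, "arachne_failures", true, 0},
	// Longer value: both patterns are prefixes and the longest one wins.
	{[]string{"arachne_failures", "arachne_failures_by_rack"}, false, "arachne_failures_by_rack", true, 0},
	// Backwards: both are suffixes and "_by_rack" wins, leaving "arachne_failures".
	{[]string{"rack", "_by_rack"}, true, "arachne_failures_by_rack", true, len("arachne_failures")},
	// No pattern matches.
	{[]string{"foo", "bar"}, false, "arachne_failures", false, 0},
}

// longestMatch is a standalone stand-in for multiCharSequenceFilter.matches.
func longestMatch(val []byte, patterns []string, backwards bool) ([]byte, bool) {
	var best []byte
	for _, s := range patterns {
		p := []byte(s)
		if len(p) > len(val) {
			continue // cannot match a shorter value
		}
		ok := (backwards && bytes.HasSuffix(val, p)) || (!backwards && bytes.HasPrefix(val, p))
		if ok && len(p) > len(best) {
			best = p
		}
	}
	if best == nil {
		return nil, false
	}
	if backwards {
		return val[:len(val)-len(best)], true
	}
	return val[len(best):], true
}

// checkCases describes every row whose result differs from its expectation;
// an empty result means all rows pass.
func checkCases(cs []matchCase) []string {
	var failures []string
	for i, c := range cs {
		rem, ok := longestMatch([]byte(c.val), c.patterns, c.backwards)
		if ok != c.wantOK || len(rem) != c.wantRemLen {
			failures = append(failures, fmt.Sprintf("case %d: ok=%v remLen=%d", i, ok, len(rem)))
		}
	}
	return failures
}
```

In the real test file, each row would instead build a filter with newMultiCharSequenceFilter and compare against f.matches using require, as the existing tests do.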