feat: Blooms retention #12258
Conversation
An unintended consequence here is that if a tenant has any retention config larger than the range the compactor checks, its blooms will never be deleted. I think it makes more sense to delete data for all tenants once it is older than max_lookback_days, regardless of their configs.
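A minimal sketch of that suggestion, assuming hypothetical helpers tenantsInTable and deleteTenantDay (neither is in the PR):

```go
import (
	"context"
	"time"
)

// applyMaxLookback deletes bloom data for every tenant in day-tables older
// than maxLookbackDays, ignoring per-tenant retention configs.
func applyMaxLookback(ctx context.Context, today time.Time, days []time.Time, maxLookbackDays int) error {
	cutoff := today.Add(-time.Duration(maxLookbackDays) * 24 * time.Hour)
	for _, day := range days {
		if !day.Before(cutoff) {
			continue // still inside the lookback window
		}
		for _, tenant := range tenantsInTable(day) { // hypothetical helper
			if err := deleteTenantDay(ctx, tenant, day); err != nil { // hypothetical helper
				return err
			}
		}
	}
	return nil
}
```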
pkg/bloomcompactor/metrics.go (outdated)

// 1second -> 5 years, 10 buckets
Buckets: prometheus.DefBuckets,
The default buckets are 0.005s to 10s.
That comment is misleading; I forgot to remove it after copy-pasting. This metric tracks the time needed to apply retention, so the default buckets should work.
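For reference, a minimal sketch of such a histogram with the default buckets (the metric name here is illustrative, not necessarily the PR's):

```go
import "github.com/prometheus/client_golang/prometheus"

// retentionTime tracks how long a retention run takes.
// prometheus.DefBuckets spans 0.005s to 10s, which fits this use case.
var retentionTime = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: "loki",
	Subsystem: "bloomcompactor",
	Name:      "retention_time_seconds", // illustrative name
	Help:      "Time spent applying bloom retention.",
	Buckets:   prometheus.DefBuckets,
})
```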
Tenant(tenant, table string) Location
ParseTenantKey(loc Location) (string, error)
Maybe rename this to TenantPrefix and ParseTenantFromKey?
Done
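The renamed methods would look roughly like this; the enclosing interface name is an assumption (the diff only shows the two method signatures):

```go
// KeyResolver resolves bloom object locations in storage (abbreviated sketch).
type KeyResolver interface {
	// TenantPrefix returns the storage location prefix for a tenant's table.
	TenantPrefix(tenant, table string) Location
	// ParseTenantFromKey extracts the tenant ID back out of a location.
	ParseTenantFromKey(loc Location) (string, error)
}
```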
@@ -127,6 +128,51 @@ func (b *bloomStoreEntry) FetchBlocks(ctx context.Context, refs []BlockRef, _ ..
	return b.fetcher.FetchBlocks(ctx, refs)
}

func (b *bloomStoreEntry) TenantFilesForInterval(ctx context.Context, interval Interval) (map[string][]string, error) {
nit: not needed now, but it would be good to make this not buffer the entire file list.
I added a TODO inside the function to add pooling if this becomes a problem.
I refactored the implementation to reuse the slices from the listed objects.
pkg/bloomcompactor/retention.go (outdated)

startDay := storageconfig.NewDayTime(today.Add(-smallestRetention))
endDay := storageconfig.NewDayTime(0)
if r.cfg.MaxLookbackDays > 0 {
I don't think this should be allowed to be zero, to prevent iterating over every day since 1970.
Agree. Done.
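A sketch of the resulting guard, assuming the retention settings live on a config struct with a Validate hook (the type name and error text are illustrative):

```go
import "errors"

// RetentionConfig sketches the compactor's retention settings.
type RetentionConfig struct {
	Enabled         bool
	MaxLookbackDays int
}

// Validate rejects a non-positive lookback so retention never tries to
// iterate every day back to the Unix epoch (1970).
func (cfg *RetentionConfig) Validate() error {
	if cfg.Enabled && cfg.MaxLookbackDays <= 0 {
		return errors.New("bloom retention: max_lookback_days must be greater than 0")
	}
	return nil
}
```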
pkg/bloomcompactor/retention.go (outdated)

	break
}

for tenant, objectKeys := range tenants {
nit: it's probably better to pass limits to TenantFilesForInterval so it doesn't need to allocate space for files that won't be removed.
I ended up doing something slightly different: TenantFilesForInterval now takes a filter function, which we use to filter out tenants whose retention hasn't expired yet.
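Roughly, the reworked shape (a sketch based on the description above; Interval and client.StorageObject are assumed from the surrounding code, and the exact callback signature may differ):

```go
// Sketch: the filter is applied while listing, so object keys belonging to
// tenants whose retention hasn't expired are never buffered.
type tenantFilesLister interface {
	TenantFilesForInterval(
		ctx context.Context,
		interval Interval,
		filter func(tenant string, object client.StorageObject) bool,
	) (map[string][]client.StorageObject, error)
}
```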
}

if _, ok := tenants[tenant]; !ok {
-	tenants[tenant] = make([]string, 0, 100)
+	tenants[tenant] = nil // Initialize tenant with empty slice
We do this because the function returns all tenants regardless of the filter; the filter only filters out files. This way we know which tenants exist for a given day while skipping the files whose retention hasn't expired yet.
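In code, the idea is roughly this loop body (a sketch, not the PR's exact code):

```go
// Record the tenant even when all of its files are filtered out; a nil
// slice means "tenant exists for this day, nothing to delete yet".
if _, ok := tenants[tenant]; !ok {
	tenants[tenant] = nil
}
// Only keep the object if its retention has expired.
if filter(tenant, object) {
	tenants[tenant] = append(tenants[tenant], object)
}
```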
The goal of …
I see two problems with that: …
I'd stick with using …
Let's add a metric that fires when a tenant's retention is longer than the lookback the compactor checks, so we can alert on it.
Left a nit, but I'm giving you an approval so you can merge once it's fixed.
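A sketch of that metric (the name is illustrative) and where it would be incremented:

```go
import "github.com/prometheus/client_golang/prometheus"

// retentionExceedsLookback counts tenants whose configured retention is
// longer than the compactor's lookback; their blooms would otherwise
// silently never be deleted.
var retentionExceedsLookback = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "loki",
	Subsystem: "bloomcompactor",
	Name:      "retention_tenant_exceeds_lookback_total", // illustrative name
	Help:      "Count of tenants whose retention exceeds the retention lookback.",
})

// Inside the retention loop (sketch):
//	if tenantRetention > maxLookback {
//		retentionExceedsLookback.Inc()
//	}
```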
pkg/bloomcompactor/retention.go (outdated)

}

// findSmallestRetention returns the smallest retention period across all tenants.
// It also returns a boolean indicating if there is any retention period set at all
This is somewhat confusing. It returns the smallest retention but skips zero (disabled), and it doesn't actually return a bool like the comment suggests. Maybe smallestEnabledRetention?
Changed, thanks.
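A sketch of the renamed helper (taking a plain map of per-tenant retentions for illustration; the PR iterates tenant limits):

```go
import "time"

// smallestEnabledRetention returns the smallest non-zero retention across
// all tenants. Zero means retention is disabled for a tenant and is
// skipped; a zero return value means no tenant has retention enabled.
func smallestEnabledRetention(retentions map[string]time.Duration) time.Duration {
	var smallest time.Duration
	for _, r := range retentions {
		if r == 0 {
			continue // retention disabled for this tenant
		}
		if smallest == 0 || r < smallest {
			smallest = r
		}
	}
	return smallest
}
```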
What this PR does / why we need it:
This PR adds retention to bloom blocks and metas. Retention is applied by only one compactor (the one that owns the smallest token).
Compaction works as follows:
6.1. Get the max retention across all streams.
6.2. If day < (now - max retention), delete all content for the tenant in that day's table.
We try to run retention up to once a day, but:
We add the following configs:
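As a sketch of the config surface discussed in the thread (only MaxLookbackDays appears verbatim in the review; the flag names and defaults are assumptions):

```go
import "flag"

// RegisterFlags wires up the retention settings from the sketch above.
func (cfg *RetentionConfig) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&cfg.Enabled, "bloom-compactor.retention.enabled", false,
		"Enable bloom block and meta retention.")
	f.IntVar(&cfg.MaxLookbackDays, "bloom-compactor.retention.max-lookback-days", 365,
		"Never apply retention to tables older than this many days.")
}
```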
Special notes for your reviewer:
The PR looks big, but it's mostly tests.
Checklist
- Reviewed the CONTRIBUTING.md guide (required)
- CHANGELOG.md updated
- If the change is worth mentioning in the release notes, add the add-to-release-notes label
- Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- For Helm chart changes, bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
- If the change is deprecating or deleting a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR