
feat: Modify optimized compaction to cover edge cases #25594

Open · devanbenz wants to merge 19 commits into master-1.x from db/4201/compaction-bugs
Conversation

@devanbenz commented Nov 27, 2024

This PR changes the compaction algorithm to account for the following
cases that were not previously handled:

  • Many generations with a group size over 2 GB
  • A single generation with many files and a group size under 2 GB

Here, group size is the total size of the TSM files in the shard directory.
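The two edge cases can be sketched as a classification over a shard's TSM files. The types, function names, and thresholds below are illustrative assumptions for this sketch, not the planner's actual code (which lives in tsdb/engine/tsm1/compact.go):

```go
package main

import "fmt"

// 2 GB, the maximum TSM file size.
const maxTSMFileSize = 2 * 1024 * 1024 * 1024

// tsmFile is a hypothetical stand-in for a TSM file's planning metadata.
type tsmFile struct {
	gen  int    // generation the file belongs to
	size uint64 // size on disk in bytes
}

// needsOptimize reports whether a shard's files match either edge case:
// many generations whose combined size exceeds 2 GB, or a single
// generation fragmented across many files totaling under 2 GB.
func needsOptimize(files []tsmFile) (bool, string) {
	gens := map[int]struct{}{}
	var groupSize uint64
	for _, f := range files {
		gens[f.gen] = struct{}{}
		groupSize += f.size
	}
	switch {
	case len(gens) > 1 && groupSize > maxTSMFileSize:
		return true, "many generations over 2 GB"
	case len(gens) == 1 && len(files) > 1 && groupSize < maxTSMFileSize:
		return true, "single fragmented generation under 2 GB"
	}
	return false, ""
}

func main() {
	// One generation split across two small files: the second edge case.
	files := []tsmFile{{gen: 1, size: 512 << 20}, {gen: 1, size: 256 << 20}}
	ok, why := needsOptimize(files)
	fmt.Println(ok, why) // true single fragmented generation under 2 GB
}
```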

@devanbenz devanbenz force-pushed the db/4201/compaction-bugs branch from 6e9db1b to cab638c Compare December 13, 2024 17:20
@devanbenz devanbenz force-pushed the db/4201/compaction-bugs branch from 9f5098b to 8c9d7e7 Compare December 16, 2024 18:49
@devanbenz devanbenz changed the title feat(wip): WIP modifying compaction tests feat: Modify optimized compaction to cover edge cases Dec 16, 2024
@devanbenz devanbenz marked this pull request as ready for review December 16, 2024 19:32
@gwossum (Member) left a comment:

First pass, mostly about comments.


closes #25666
@devanbenz devanbenz force-pushed the db/4201/compaction-bugs branch from 7de2cf2 to d631314 Compare December 16, 2024 23:32
@devanbenz devanbenz self-assigned this Dec 16, 2024
@davidby-influx (Contributor) left a comment:

I still need to review the tests more closely, as well.

…for shards that may have over a 2 GB group size but many fragmented files (under 2 GB and under the aggressive points-per-block count)
@davidby-influx (Contributor) left a comment:

Lots of good changes and more tests! Thanks for the effort.
I still have a few things that you may want to change. Happy to discuss things in a teleconference.

@davidby-influx (Contributor) left a comment:

Great changes, really improving this code. A few more in the endless cycle...

@@ -77,6 +81,9 @@ const (
// partition snapshot compactions that can run at one time.
// A value of 0 results in runtime.GOMAXPROCS(0).
DefaultSeriesFileMaxConcurrentSnapshotCompactions = 0

// MaxTSMFileSize is the maximum size of TSM files.
Contributor comment: Nice!

@devanbenz (Author) commented:
@davidby-influx I've added checks to verify that I'm not planning PlanOptimize() or FullyCompacted() specific TSM layouts 👍

Comment on lines +98 to +103
// PlanOptimize will return the groups for compaction, the compaction group length,
// and the amount of generations within the compaction group.
// generationCount needs to be set to decide how many points per block during compaction.
// This value is mostly ignored in normal compaction code paths, but,
// for the edge case where there is a single generation with many
// files under 2 GB this value is an important indicator.
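The role of generationCount described in the quoted comment can be sketched as a block-size decision. The constants and function below are illustrative assumptions (InfluxDB's default is 1000 points per block; the aggressive value is a placeholder), not the PR's actual code:

```go
package main

import "fmt"

// Placeholder values; the real constants live in tsdb/config.go.
const (
	defaultMaxPointsPerBlock    = 1000
	aggressiveMaxPointsPerBlock = 100000
)

// pointsPerBlock sketches the idea in the quoted comment: a single
// generation with many small files keeps the default points-per-block,
// while a multi-generation optimize pass can pack blocks aggressively.
func pointsPerBlock(generationCount int64) int {
	if generationCount == 1 {
		return defaultMaxPointsPerBlock
	}
	return aggressiveMaxPointsPerBlock
}

func main() {
	fmt.Println(pointsPerBlock(1)) // single generation: default block size
	fmt.Println(pointsPerBlock(4)) // many generations: aggressive block size
}
```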
Member comment: Love this comment!

var called SingleGenerationReasonText
Many tsm generations over level 4 compaction, single tsm generation under level 4 compaction, all in the same shard. Group size is over 2 GB for each generation.
Add a check to ensure that "orphaned" levels are compacted further with the rest of the shard.
@devanbenz devanbenz requested a review from gwossum December 23, 2024 15:53
@davidby-influx (Contributor) left a comment:

Please change all require.Equal(t, 0, ...) calls to require.Zero(t, ...) calls.

I think some of the conditionals in the planner need to change. The scenario I am imagining is a shard that was compacted with the aggressive block count, then took new writes and needs to be recompacted. We need to ignore files with at least the aggressive block count, not just those at exactly the default. This may also require more tests to verify correctness.

I will go over the tests in more detail later this week.

@@ -397,7 +397,7 @@ func (c *DefaultPlanner) PlanOptimize() (compactGroup []CompactionGroup, compact
}
}

if len(currentGen) == 0 || currentGen.level() == cur.level() {
if len(currentGen) == 0 || currentGen.level() >= cur.level() {
Contributor comment: Is this the halting issue for the recent customer situation?
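A toy model of the grouping loop shows why the change from `==` to `>=` matters. This is a hypothetical helper, not the planner's real code: with `==`, a lower-level generation following level-4 generations starts a new group and can be orphaned; with `>=`, it joins the preceding group and is compacted with the rest of the shard.

```go
package main

import "fmt"

// groupByLevel groups consecutive generation levels. When sameOrLower is
// false it mimics `currentGen.level() == cur.level()` (split on any level
// change); when true it mimics `currentGen.level() >= cur.level()` (keep
// appending while the level does not rise).
func groupByLevel(levels []int, sameOrLower bool) [][]int {
	var groups [][]int
	var cur []int
	for _, lvl := range levels {
		join := len(cur) == 0
		if !join {
			last := cur[len(cur)-1]
			if sameOrLower {
				join = last >= lvl
			} else {
				join = last == lvl
			}
		}
		if join {
			cur = append(cur, lvl)
		} else {
			groups = append(groups, cur)
			cur = []int{lvl}
		}
	}
	if len(cur) > 0 {
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	fmt.Println(groupByLevel([]int{4, 4, 3}, false)) // == : [[4 4] [3]] — level 3 orphaned
	fmt.Println(groupByLevel([]int{4, 4, 3}, true))  // >= : [[4 4 3]] — compacted together
}
```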

t.Fatalf("tsm file length mismatch: got %v, exp %v", got, exp)
}
_, cgLen := cp.PlanLevel(1)
require.Equal(t, int64(0), cgLen, "compaction group length; PlanLevel(1)")
Contributor comment: Does require.Zero work here?

@@ -449,7 +469,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
var skip bool

// Skip the file if it's over the max size and contains a full block and it does not have any tombstones
if len(generations) > 2 && group.size() > uint64(maxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
if len(generations) > 2 && group.size() > uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
Contributor comment:
I think the c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock has to become a >= because you may produce files with the aggressive block count in an earlier compaction that you don't want to compact again.
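The suggested relaxation can be sketched as follows. The constants are placeholders (the real values live in tsdb/config.go), and the helper name is invented for illustration:

```go
package main

import "fmt"

// Placeholder values; the real constants live in tsdb/config.go.
const (
	defaultMaxPointsPerBlock    = 1000
	aggressiveMaxPointsPerBlock = 100000
)

// isFullyPacked mirrors the reviewer's suggestion: with `==`, a file written
// at the aggressive block count in an earlier optimize pass looks "not full"
// and gets recompacted; with `>=`, it is correctly skipped.
func isFullyPacked(blockCount int) bool {
	return blockCount >= defaultMaxPointsPerBlock
}

func main() {
	fmt.Println(isFullyPacked(defaultMaxPointsPerBlock))    // true
	fmt.Println(isFullyPacked(aggressiveMaxPointsPerBlock)) // true: not recompacted
	fmt.Println(isFullyPacked(500))                         // false: still eligible
}
```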

Contributor comment: And a test for this?

@@ -525,7 +545,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
// Skip the file if it's over the max size and contains a full block or the generation is split
// over multiple files. In the latter case, that would mean the data in the file spilled over
// the 2GB limit.
if g.size() > uint64(maxTSMFileSize) && c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock {
if g.size() > uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock {
Contributor comment: Same as above: c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock might need to be >=.

Contributor comment: Also, may need a test for this.

@@ -569,7 +589,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
}

// Skip the file if it's over the max size and it contains a full block
if gen.size() >= uint64(maxTSMFileSize) && c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
if gen.size() >= uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
Contributor comment: c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock might need to be >=.

Contributor comment: Also, may need a test for this.

}
}
tsmP, pLenP := cp.Plan(time.Now().Add(-time.Second))
require.Equal(t, 0, len(tsmP), "compaction group; Plan()")
Contributor comment: require.Zero?

t.Fatalf("tsm file length mismatch: got %v, exp %v", got, exp)
}
_, cgLen := cp.PlanLevel(1)
require.Equal(t, int64(0), cgLen, "compaction group length; PlanLevel(1)")
Contributor comment: require.Zero
