feat: Modify optimized compaction to cover edge cases #25594
base: master-1.x
Conversation
6e9db1b to cab638c (Compare)
9f5098b to 8c9d7e7 (Compare)
First pass, mostly about comments.
This PR changes the compaction algorithm to cover the following cases that were not previously handled:
- Many generations with a group size over 2 GB
- A single generation with many files and a group size under 2 GB

Where group size is the total size of the TSM files in said shard directory.

Closes #25666
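As a rough sketch only (hypothetical names and thresholds, not this PR's actual code), the two cases amount to something like:

package compactsketch

const maxGroupSize = 2 * 1024 * 1024 * 1024 // 2 GB

type generation struct {
	files []string // TSM files belonging to this generation
	size  uint64   // total bytes of the generation's TSM files
}

// needsOptimize reports whether a shard matches one of the edge cases:
// many generations whose combined size exceeds 2 GB, or a single
// generation fragmented across many files while staying under 2 GB.
func needsOptimize(gens []generation) bool {
	var total uint64
	for _, g := range gens {
		total += g.size
	}
	switch {
	case len(gens) > 1 && total > maxGroupSize:
		return true
	case len(gens) == 1 && len(gens[0].files) > 1 && total < maxGroupSize:
		return true
	default:
		return false
	}
}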
7de2cf2 to d631314 (Compare)
I still need to review the tests more closely, as well.
For shards that may have over a 2 GB group size but many fragmented files (under 2 GB and under the aggressive points-per-block count).
Lots of good changes and more tests! Thanks for the effort.
I still have a few things that you may want to change. Happy to discuss things in a teleconference.
Great changes, really improving this code. A few more in the endless cycle...
@@ -77,6 +81,9 @@ const (
	// partition snapshot compactions that can run at one time.
	// A value of 0 results in runtime.GOMAXPROCS(0).
	DefaultSeriesFileMaxConcurrentSnapshotCompactions = 0

	// MaxTSMFileSize is the maximum size of TSM files.
Nice!
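For reference, a sketch of what the exported constant introduced here might look like, assuming it keeps the long-standing 2 GB limit on a single TSM file (the exact value and placement are assumptions, not confirmed by this diff):

package tsdbsketch

const (
	// MaxTSMFileSize is the maximum size of TSM files.
	// Assumed value: the existing 2 GB per-file limit.
	MaxTSMFileSize = uint32(2048 * 1024 * 1024)
)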
@davidby-influx I've added checks to verify that PlanOptimize() and FullyCompacted() are not planning specific TSM layouts 👍
// PlanOptimize will return the groups for compaction, the compaction group length,
// and the number of generations within the compaction group.
// generationCount needs to be set to decide how many points per block to use during compaction.
// This value is mostly ignored in normal compaction code paths, but,
// for the edge case where there is a single generation with many
// files under 2 GB, this value is an important indicator.
Love this comment!
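A minimal sketch of how a caller might act on generationCount; the constant names and values below are illustrative stand-ins, only tsdb.DefaultMaxPointsPerBlock is referenced elsewhere in this diff:

package compactsketch

// Illustrative values only; not taken from the codebase.
const (
	defaultMaxPointsPerBlock    = 1000  // stand-in for tsdb.DefaultMaxPointsPerBlock
	aggressiveMaxPointsPerBlock = 10000 // hypothetical larger block size
)

// pointsPerBlockFor sketches the idea in the comment above: a single
// generation fragmented into many small files gets a larger block size so
// the rewrite consolidates blocks instead of copying them unchanged.
func pointsPerBlockFor(generationCount int64) int {
	if generationCount == 1 {
		return aggressiveMaxPointsPerBlock
	}
	return defaultMaxPointsPerBlock
}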
var called SingleGenerationReasonText
Many TSM generations over level 4 compaction and a single TSM generation under level 4 compaction, all in the same shard. Group size is over 2 GB for each generation.
Add a check to ensure that "orphaned" levels are compacted further with the rest of the shard.
Please change all require.Equal(t, 0, ...) calls to require.Zero(t, ...) calls.
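For example, using the test lines quoted below in this review:

// Before
require.Equal(t, 0, len(tsmP), "compaction group; Plan()")
// After
require.Zero(t, len(tsmP), "compaction group; Plan()")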
I think some of the conditionals need to change in the planner. The scenario I am imagining is a shard that was compacted with the aggressive block count, is taking new writes, and needs to be recompacted. We need to ignore files with the aggressive block count, not just those which are exactly at the default. This may also require more tests to verify correctness.
I will go over the tests in more detail later this week.
@@ -397,7 +397,7 @@ func (c *DefaultPlanner) PlanOptimize() (compactGroup []CompactionGroup, compact
		}
	}

-	if len(currentGen) == 0 || currentGen.level() == cur.level() {
+	if len(currentGen) == 0 || currentGen.level() >= cur.level() {
Is this the halting issue for the recent customer situation?
		t.Fatalf("tsm file length mismatch: got %v, exp %v", got, exp)
	}
	_, cgLen := cp.PlanLevel(1)
	require.Equal(t, int64(0), cgLen, "compaction group length; PlanLevel(1)")
Does require.Zero work here?
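For reference, testify's Zero asserts that the value equals the zero value of its type, so it should accept the int64 count directly:

_, cgLen := cp.PlanLevel(1)
require.Zero(t, cgLen, "compaction group length; PlanLevel(1)")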
@@ -449,7 +469,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
		var skip bool

		// Skip the file if it's over the max size and contains a full block and it does not have any tombstones
-		if len(generations) > 2 && group.size() > uint64(maxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
+		if len(generations) > 2 && group.size() > uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
I think the c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock has to become a >=, because you may produce files with the aggressive block count in an earlier compaction that you don't want to compact again.
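Concretely, the suggestion amounts to something like this (sketch; the rest of the expression stays the same):

// Before: skips only files written with exactly the default block count.
c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock
// After: also skips files already written with a larger, aggressive block count.
c.FileStore.BlockCount(group.files[0].Path, 1) >= tsdb.DefaultMaxPointsPerBlock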
And a test for this?
@@ -525,7 +545,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
		// Skip the file if it's over the max size and contains a full block or the generation is split
		// over multiple files. In the latter case, that would mean the data in the file spilled over
		// the 2GB limit.
-		if g.size() > uint64(maxTSMFileSize) && c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock {
+		if g.size() > uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock {
Same as above: c.FileStore.BlockCount(g.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock might need to be >=.
Also, may need a test for this.
@@ -569,7 +589,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
	}

	// Skip the file if it's over the max size and it contains a full block
-	if gen.size() >= uint64(maxTSMFileSize) && c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
+	if gen.size() >= uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
c.FileStore.BlockCount(gen.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock might need to be >=.
Also, may need a test for this.
		}
	}
	tsmP, pLenP := cp.Plan(time.Now().Add(-time.Second))
	require.Equal(t, 0, len(tsmP), "compaction group; Plan()")
require.Zero?
		t.Fatalf("tsm file length mismatch: got %v, exp %v", got, exp)
	}
	_, cgLen := cp.PlanLevel(1)
	require.Equal(t, int64(0), cgLen, "compaction group length; PlanLevel(1)")
require.Zero