Batch sstables of similar size #3979
Comments
Is this part of 3.3.2?
The change is going to affect the meaning of …
This is a temporary implementation used for integrating workload indexing with the rest of the code. It will be improved as part of #3979.
This commit allows setting --batch-size=0. When this happens, batches are created so that each contains about 5% of the expected node workload during restore. This allows for creating big, yet evenly distributed batches without the need to tune the --batch-size flag. It should also work better when the backed-up cluster had a different number of nodes than the restore destination cluster. Fixes #3979
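For illustration, here is a minimal Go sketch of how such 5%-of-node-workload batching could work. The `SSTable` type, `batchByNodeShare`, and the greedy fill strategy are assumptions made for this example, not the actual scylla-manager implementation:

```go
package main

import "fmt"

// SSTable is an illustrative stand-in for an indexed sstable.
type SSTable struct {
	ID   string
	Size int64
}

// batchByNodeShare greedily fills batches up to ~5% of the expected
// per-node workload (total size / node count / 20), mimicking the
// --batch-size=0 behavior described above. Hypothetical sketch only.
func batchByNodeShare(ssts []SSTable, nodeCount int) [][]SSTable {
	var total int64
	for _, s := range ssts {
		total += s.Size
	}
	target := total / int64(nodeCount) / 20 // ~5% of a node's share
	if target <= 0 {
		target = 1
	}

	var batches [][]SSTable
	var cur []SSTable
	var curSize int64
	for _, s := range ssts {
		cur = append(cur, s)
		curSize += s.Size
		if curSize >= target {
			batches = append(batches, cur)
			cur, curSize = nil, 0
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	ssts := []SSTable{{"a", 100}, {"b", 40}, {"c", 60}, {"d", 10}}
	// With 2 destination nodes, each batch targets ~5% of a node's
	// expected 105-byte share of the workload.
	fmt.Println(batchByNodeShare(ssts, 2))
}
```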
* feat(schema): drop sequential restore run tracking

  Right now SM restores location by location, manifest by manifest, table by table. That's why it tracks restore progress by keeping location/manifest/table in the DB. We are moving away from the sequential restore approach in favor of restoring from all locations/manifests/tables at the same time.

* feat(restore): adjust model to dropped sequential restore run tracking

* refactor(backupspec): include the newest file version in ListVersionedFiles

  There is no need to iterate versioned files (ListVersionedFiles) and non-versioned files (buildFilesSizesCache) separately. Doing it in a single iteration is faster, and it allows storing all size information in a single place.

* feat(restore): add workload indexing

  This commit introduces the structure of the restore workload. The workload is divided per location->table->remote sstable directory (a rough sketch of this hierarchy is shown after this list). This changes the hierarchy established by manifests (location->node->table->remote sstable dir). It also aggregates files into actual sstables, extracts their IDs, aggregates their sizes, and keeps track of sstable versioning.

* feat(restore): index, support resume

  The indexed workload won't contain sstables that were already restored during a previous restore run.

* feat(restore): index, support metrics init

* feat(restore): add primitive batching using indexed workload

  This is a temporary implementation used for integrating workload indexing with the rest of the code. It will be improved as part of #3979.

* feat(restore): integrate new indexing and batching with codebase

  This commit applies the new indexing and batching approaches in the restore tables codebase.

* fix(restore): handle download of fully versioned batch

  Recent commits changed versioned batch download so that if any sstable component is versioned, then all sstable components are downloaded as versioned files. It was done that way to allow easier versioned progress calculation (we don't store per-file size, only the whole sstable size). This brought to light a bug (that existed before, but was more difficult to hit) in which restoring a batch failed when the whole batch was versioned, as calling RcloneSyncCopyPaths on an empty paths parameter resulted in a broken download. We could just skip the RcloneSyncCopyPaths call when the whole batch is versioned, but this would leave us without the agentJobID, which is part of the sort key in RestoreRunProgress. Without it, we could potentially overwrite one restore run progress with another - if both of them happened on the same RemoteSSTableDir, by the same Host, and were fully versioned. It would also introduce a different path for restoring a regular batch and a fully versioned batch, which is not desirable. That's why I decided to modify the rclone server to allow an empty path parameter, so that it still generates an agentJobID but doesn't do anything beyond that.

* feat(restore): index, log workload info

  The logged workload info contains, per location/table/remote sstable dir: sstable count, total size, and max and average sstable size.
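As referenced in the workload indexing item above, here is a minimal Go sketch of what the indexed location->table->remote sstable dir hierarchy could look like. All type and field names are illustrative assumptions, not the actual scylla-manager types:

```go
package main

import "fmt"

// SSTable groups all component files (Data.db, Index.db, ...) under a
// single ID, with an aggregated size and a versioning flag.
type SSTable struct {
	ID        string
	Files     []string
	Size      int64
	Versioned bool
}

// RemoteDirWorkload holds the sstables of one remote sstable directory.
type RemoteDirWorkload struct {
	RemoteSSTableDir string
	Size             int64
	SSTables         []SSTable
}

// TableWorkload aggregates remote directories for one table.
type TableWorkload struct {
	Keyspace, Table string
	Size            int64
	RemoteDirs      []RemoteDirWorkload
}

// LocationWorkload is the top of the location->table->remote dir hierarchy.
type LocationWorkload struct {
	Location string
	Size     int64
	Tables   []TableWorkload
}

func main() {
	w := LocationWorkload{
		Location: "s3:backup-bucket", // illustrative location
		Size:     42,
		Tables: []TableWorkload{{
			Keyspace: "ks", Table: "t", Size: 42,
			RemoteDirs: []RemoteDirWorkload{{
				RemoteSSTableDir: "backup/.../ks/t", // elided illustrative path
				Size:             42,
				SSTables: []SSTable{{
					ID:    "1",
					Files: []string{"me-1-big-Data.db"},
					Size:  42,
				}},
			}},
		}},
	}
	fmt.Printf("%+v\n", w)
}
```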
@karol-kokoszka do we want to make --batch-size=0 the default in SM 3.4?
@Michal-Leszczynski please leave the current default.
@karol-kokoszka so we can close #3809 then?
This results in creating batches of sstables of more similar size. Fixes #3979
Let's bring it to planning. IMHO we should keep the default, but let's discuss it during planning.
Assume the following scenario: a node with 8 shards, and a batch with 8 sstables, where 1 is really big and 7 are small.
Since only a single shard works on load&stream of a given sstable, we would end up in a situation where the 7 shards handling the small sstables finish their work quickly and then wait for the single shard working on the big sstable.
Creating batches of similarly sized sstables would improve load&stream performance.
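A toy Go sketch of this idea, assuming a simple sort-then-chunk strategy (the actual batching heuristic chosen for the fix may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// batchesOfSimilarSize sorts sstable sizes in descending order and
// chunks them, so sstables within a batch are of similar size and a
// single huge sstable no longer stalls a batch full of small ones.
func batchesOfSimilarSize(sizes []int64, batchSize int) [][]int64 {
	sort.Slice(sizes, func(i, j int) bool { return sizes[i] > sizes[j] })
	var batches [][]int64
	for len(sizes) > 0 {
		n := batchSize
		if n > len(sizes) {
			n = len(sizes)
		}
		batches = append(batches, sizes[:n])
		sizes = sizes[n:]
	}
	return batches
}

func main() {
	// The scenario above: 1 really big sstable and 7 small ones.
	sizes := []int64{10, 1000, 12, 9, 11, 8, 10, 9}
	// With batches of 2, at most one small sstable shares a batch with
	// the big one; the remaining batches contain evenly sized work.
	fmt.Println(batchesOfSimilarSize(sizes, 2))
}
```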