Batch sstables of similar size #3979

Open
Tracked by #3706
Michal-Leszczynski opened this issue Aug 19, 2024 · 7 comments · May be fixed by #4055
Labels: enhancement (New feature or request), restore

Comments

@Michal-Leszczynski
Collaborator

Assume the following scenario: a node with 8 shards and a batch of 8 sstables, where 1 is really big and 7 are small.
Since only a single shard works on load&stream of a given sstable, the 7 shards handling the small sstables would finish their work quickly and then wait for the single shard working on the big sstable.
Creating batches of similarly sized sstables would improve load&stream performance.
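For illustration only, here is a minimal Go sketch of that idea (the `sstable` type and `batchSimilarSize` helper are hypothetical names, not SM's actual code): sorting sstables by size and grouping neighbours means each batch contains similarly sized sstables, so a node's shards finish load&stream at roughly the same time.

```go
package main

import (
	"fmt"
	"sort"
)

// sstable is a hypothetical stand-in for SM's internal representation.
type sstable struct {
	ID   string
	Size int64
}

// batchSimilarSize sorts sstables by size (descending) and chunks neighbours
// into batches, so each batch holds sstables of comparable size.
func batchSimilarSize(ssts []sstable, batchSize int) [][]sstable {
	sort.Slice(ssts, func(i, j int) bool { return ssts[i].Size > ssts[j].Size })
	var batches [][]sstable
	for len(ssts) > 0 {
		n := batchSize
		if n > len(ssts) {
			n = len(ssts)
		}
		batches = append(batches, ssts[:n])
		ssts = ssts[n:]
	}
	return batches
}

func main() {
	ssts := []sstable{
		{"s1", 1000}, {"s2", 12}, {"s3", 900}, {"s4", 10},
		{"s5", 950}, {"s6", 11}, {"s7", 980}, {"s8", 9},
	}
	// With batches of 4, the big sstables end up together and the small ones
	// end up together, instead of 7 shards idling behind 1 big sstable.
	for i, b := range batchSimilarSize(ssts, 4) {
		fmt.Println("batch", i, b)
	}
}
```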

@mykaul
Contributor

mykaul commented Sep 3, 2024

Is this part of 3.3.2?

@Michal-Leszczynski
Collaborator Author

No, 3.3.2 will mostly contain a fix for #3989 and perhaps #4007 and #3995.
I would say that the restore-related improvements are big enough for 3.4, but if there is a need to keep this as a patch release, it could be 3.3.3.

karol-kokoszka reopened this Sep 9, 2024
@karol-kokoszka
Collaborator

The change is going to affect the meaning of the --batch-size flag when the value is set to 0.
0 is going to be a wildcard meaning that the batch size is expected to be 5% of all the data that would be sent to the node (based on the following calculation: (amount_of_restore_data * number_of_node_shards) / number_of_all_shards).
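A rough worked example of that calculation (the `expectedNodeWorkload` helper and all numbers are illustrative assumptions, not SM's code), assuming 1 TiB of data to restore, a destination node with 16 shards, and 96 shards in the whole destination cluster:

```go
package main

import "fmt"

// expectedNodeWorkload mirrors the calculation above: the node's share of the
// restored data is proportional to its share of the cluster's shards.
func expectedNodeWorkload(restoreDataSize int64, nodeShards, allShards int) int64 {
	return restoreDataSize * int64(nodeShards) / int64(allShards)
}

func main() {
	const gib = int64(1) << 30
	restoreDataSize := 1024 * gib // 1 TiB to restore in total

	workload := expectedNodeWorkload(restoreDataSize, 16, 96)
	batchTarget := workload * 5 / 100 // --batch-size=0 -> ~5% of node workload

	fmt.Printf("expected node workload: ~%d GiB, batch target: ~%d GiB\n",
		workload/gib, batchTarget/gib)
}
```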

karol-kokoszka changed the title from "Batch sstables of similar size" to "Batch sstables basing on percentage of the node's data" Sep 9, 2024
Michal-Leszczynski added a commit that referenced this issue Sep 25, 2024
This is a temporary implementation used for integrating workload
indexing with the rest of the code. It will be improved as part
of #3979.
Michal-Leszczynski added a commit that referenced this issue Oct 2, 2024
This commit allows setting --batch-size=0.
When this happens, batches will be created so that they contain
about 5% of the expected node workload during restore.
This allows for creating big, yet evenly distributed batches
without the need to play with the --batch-size flag.
It should also work better when the backed-up cluster
had a different number of nodes than the restore destination
cluster.

Fixes #3979
Michal-Leszczynski linked a pull request Oct 2, 2024 that will close this issue
Michal-Leszczynski added a commit that referenced this issue Oct 3, 2024
* feat(schema): drop sequential restore run tracking

Right now SM restores location by location, manifest by manifest, table by table.
That's why it tracks restore progress by keeping the location/manifest/table in the DB.
We are moving away from the sequential restore approach in favor of
restoring from all locations/manifests/tables at the same time.

* feat(restore): adjust model to dropped sequential restore run tracking

* refactor(backupspec): include the newest file version in ListVersionedFiles

There is no need to iterate versioned files (ListVersionedFiles) and non-versioned files (buildFilesSizesCache) separately.
Doing it in a single iteration is faster, and it allows storing all size information in a single place.

* feat(restore): add workload indexing

This commit introduces the structure of the restore workload.
The workload is divided per location->table->remote sstable directory.
This changes the hierarchy established by manifests (location->node->table->remote sstable dir).
It also aggregates files into actual sstables, extracts their IDs, aggregates their sizes,
and keeps track of sstable versioning (a rough sketch of this hierarchy follows this commit message).

* feat(restore): index, support resume

The indexed workload won't contain sstables that were already
restored during a previous restore run.

* feat(restore): index, support metrics init

* feat(restore): add primitive batching using indexed workload

This is a temporary implementation used for integrating workload
indexing with the rest of the code. It will be improved as part
of #3979.

* feat(restore): integrate new indexing and batching with codebase

This commit integrates the new indexing and batching
approaches into the restore tables codebase.

* fix(restore): handle download of fully versioned batch

Recent commits changed the versioned batch download so that
if any sstable component is versioned, then all sstable components
are downloaded as versioned files.
It was done this way to allow easier versioned progress calculation
(we don't store per-file sizes, only the whole sstable size).

This brought to light a bug (which existed before, but was more difficult to hit),
in which restoring a batch failed when the whole batch was versioned,
as calling RcloneSyncCopyPaths with an empty paths parameter resulted in a broken download.

We could just skip the RcloneSyncCopyPaths call when the whole batch is versioned,
but this would leave us without the agentJobID, which is a part of the sort key
in RestoreRunProgress. Without it, we could potentially overwrite one
restore run progress with another - if both of them happened on the same
RemoteSSTableDir, by the same Host, and were fully versioned.
It would also introduce different paths for restoring a regular batch and a fully versioned batch,
which is not desirable.

That's why I decided to modify the rclone server to allow an empty paths parameter,
so that it still generates the agentJobID, but doesn't do anything beyond that.

* feat(restore): index, log workload info

The workload info contains the per-location/table/remote-sstable-dir sstable count,
total size, and max and average sstable size.
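For reference, a rough sketch of the workload hierarchy described in the "add workload indexing" item above. All type, field, and package names here are illustrative assumptions, not SM's actual definitions:

```go
// Package name is illustrative; these types only sketch the
// location -> table -> remote sstable dir hierarchy.
package restore

type Workload struct {
	Locations []LocationWorkload
}

type LocationWorkload struct {
	Location string // backup location
	Size     int64  // aggregated size of all sstables in this location
	Tables   []TableWorkload
}

type TableWorkload struct {
	Keyspace, Table string
	Size            int64
	RemoteDirs      []RemoteDirWorkload
}

type RemoteDirWorkload struct {
	RemoteSSTableDir string
	Size             int64
	SSTables         []SSTable
}

// SSTable groups all component files of a single sstable under its ID.
type SSTable struct {
	ID        string
	Files     []string
	Size      int64
	Versioned bool // whether components need to be fetched as versioned files
}
```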
@Michal-Leszczynski
Collaborator Author

@karol-kokoszka do we want to make --batch-size=0 the default in SM 3.4?
If I were to choose the default values from scratch, I would choose them so that restore is as fast as possible, as this is the most common case. On the other hand, changing the flag's default value might break compatibility in other projects.

@karol-kokoszka
Collaborator

@Michal-Leszczynski please leave the current default.

@Michal-Leszczynski
Collaborator Author

@karol-kokoszka so we can close #3809 then?

Michal-Leszczynski changed the title from "Batch sstables basing on percentage of the node's data" back to "Batch sstables of similar size" Oct 3, 2024
Michal-Leszczynski added a commit that referenced this issue Oct 3, 2024
This results in creating batches of sstables of
more similar size.

Fixes #3979
@karol-kokoszka
Collaborator

Let's bring it to planning. IMHO we should keep the default, but let's discuss it during planning.
