-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store bandwidth characteristic of Manager restore process #4042
Comments
The suggestion is to include bandwidth of the restore into the To achieve the goal of this task, we need to store an information about the duration of The restore_progress that is saved to DB is of the following form
It looks that we already store the needed data in DB. It's per batch of the sstables (SSTableID column), and we store an information of when the download has started/ended (DownloadStartedAt/DownloadCompletedAt), when the l&s started/ended (RestoreStartedAt, RestoreCompletedAt) and what was the host owning this batch (Host). Besides, we save info of how much of the bytes was downloaded/restored in
We can show the idle time in the summary as well by outputing (per host)
The output can be:
As the output may be quite big for x nodes cluster, we can include it into the |
Possibly it's worth to include it into 3.4 release. |
Nice! |
Isn't it why God created Metrics and we implemented Monitoring? |
I will add it in both metrics and progress display. |
It's useful for checking/tracking restore performance. Ref #4042
It's useful for checking/tracking restore performance. Ref #4042
It's useful for checking/tracking restore performance. Ref #4042
It's useful for checking/tracking restore performance. Ref #4042
* refactor(restore): separate methods for updating metrics/progress This should make it easier to see what is updated where and when. * feat(metrics): restore, add bandwidth metrics They are really useful for evaluating restore performance. * feat(restore): set download/stream bytes/duration metrics It's useful for checking/tracking restore performance. Ref #4042 * fix(restore): don't initialize metrics twice This was a left-over from the PR introducing indexing (14aef7b). It also initialized metrics as a part of the indexing procedure, but it forgot to remove the previous metrics initialization from the code. * fix(restore): use backup bluster ID in remaining_bytes metric There was a confusion about which cluster ID should be used for labeling remaining_bytes metric. When setting remaining_bytes, we used backup cluster ID, but when decreasing, we used restore cluster ID. Backup cluster ID should be used in both places as this metrics describes how many bytes from which place are yet to be restored. Since we use backup cluster DC, node ID, etc., we should also use backup cluster ID.
* refactor(restore): separate methods for updating metrics/progress This should make it easier to see what is updated where and when. * feat(metrics): restore, add bandwidth metrics They are really useful for evaluating restore performance. * feat(restore): set download/stream bytes/duration metrics It's useful for checking/tracking restore performance. Ref #4042 * fix(restore): don't initialize metrics twice This was a left-over from the PR introducing indexing (14aef7b). It also initialized metrics as a part of the indexing procedure, but it forgot to remove the previous metrics initialization from the code. * fix(restore): use backup bluster ID in remaining_bytes metric There was a confusion about which cluster ID should be used for labeling remaining_bytes metric. When setting remaining_bytes, we used backup cluster ID, but when decreasing, we used restore cluster ID. Backup cluster ID should be used in both places as this metrics describes how many bytes from which place are yet to be restored. Since we use backup cluster DC, node ID, etc., we should also use backup cluster ID.
Restore: add and fill host info in restore progress * chore(go.mod): remove replace directive to SM submodules It was a left-over after feature development:/ * chore(go.mod): bump SM submodules deps * feat(schema): add shard cnt to restore_run_progress It's going to be needed for calculating per shard download/stream bandwidth in progress command. * feat(restore): add and fill shard cnt in restore run progress This commit also moves host shard info to the tablesWorker, as it is commonly reused during restore procedure. * feat(restore): add and fill host info in progress This allows to calculate download/stream per shard bandwidth in 'sctool progress' display. * feat(managerclient): display bandwidth in sctool progress Fixes #4042 * feat(managerclient): include B or iB in SizeSuffix display It is nicer to see: "Size: 10B" instead of "Size: 10" or "Size: 20KiB" instead of "Size: 20k".
Since evaluating the restore process performance by bandwidth seems like the most reasonable approach - especially when comparing across different clusters and datasets - it makes sense to store and display this data in Manager.
It would be a great improvement for restore benchmarking tests.
The text was updated successfully, but these errors were encountered: