Give possibility for restoring DC using mapping sourceDC -> destinationDC #3829

Open
karol-kokoszka opened this issue Apr 29, 2024 · 10 comments · May be fixed by #4213

@karol-kokoszka
Collaborator

#3871

Right now, there is no option in the Scylla Manager restore task to restore just a single data center (DC) from the backup location. This could lead to problematic situations, particularly when:

Encryption at Rest (EaR) is enabled,
Two DCs use different encryption keys,
Encryption keys are stored in different cloud regions, and
There is only one backup location available.
To address this, we would need to make the encryption keys multi-regional to facilitate the restoration process in such scenarios.

The location flag may not be very intuitive, as its [dc] part defines the destination DC, not the source DC of the data. We need to discuss this during manager planning to determine whether a new flag specifying the source is necessary.
If we can restore just a single DC, then we can restore DC by DC, avoiding the need to create multi-regional keys.

(cc: @tzach)

@tzach
Collaborator

tzach commented Apr 30, 2024

To address this, we would need to make the encryption keys multi-regional to facilitate the restoration process in such scenarios.

Agree, but how is this a Scylla Manager issue to fix?

@karol-kokoszka
Collaborator Author

We could potentially address the problem by allowing the restore of just a single DC from the location bucket.
It's something we don't support at the moment (possibly by mistake).

@rayakurl

rayakurl commented May 7, 2024

@tzach - we need a resolution. For now almost all SCT tests are failing, since they are multi-DC. We will add a couple of pipelines for a single DC + encryption, but we are disabling the multi-DC jobs as they are constantly failing. @mikliapko, as discussed, please create a task for the new pipelines and disable the multi-DC ones for now. Thanks

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this issue May 14, 2024
Since there is an issue with multiDC cluster restore when EaR is
turned on (scylladb/scylla-manager#3829),
it was decided to temporarily switch most jobs to run on a
singleDC cluster. Only one multiDC cluster job is left, for enterprise
version 2022, where EaR is not implemented.
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this issue May 15, 2024
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this issue May 19, 2024

(cherry picked from commit 4da831d)
@karol-kokoszka
Collaborator Author

grooming notes

The initial idea is to add a new flag to the restore CLI, so that it's possible to define the origin DC from the backup location.
Data from this DC would then be restored to the specified destination.

@mikliapko SCT will have to be updated to test the scenario of restoring a single DC.
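
For illustration, a hypothetical invocation of this initial idea (the --source-dc flag is an assumption used only for this sketch and was never implemented; the thread later converges on --dc-mapping instead):

sctool restore -c prod-cluster \
    --location s3:backup-bucket \
    --snapshot-tag sm_20240429123456UTC \
    --source-dc dc1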

@Michal-Leszczynski Michal-Leszczynski added this to the 3.4 milestone Sep 24, 2024
@Michal-Leszczynski Michal-Leszczynski self-assigned this Sep 24, 2024
Michal-Leszczynski added a commit that referenced this issue Oct 8, 2024
Michal-Leszczynski added a commit that referenced this issue Oct 8, 2024
@Michal-Leszczynski
Collaborator

The initial idea is to add a new flag to the restore CLI, so that it's possible to define the origin DC from the backup location.
Data from this DC would then be restored to the specified destination.

After giving it some more thought, I wouldn't recommend adding it in such a way.
The need for this feature arose from #3871, where it could be used to restore DC by DC.
This is problematic, as the restore task does not only download and load&stream the data; it also:

  • disables and enables tombstone_gc
  • drops and creates views
  • runs repair

So running many restore tasks, one by one, DC by DC, would result in lots of redundant work.
It could also theoretically (I'm not sure about that) lead to data resurrection, as tombstone_gc would be re-enabled in between DC restorations.
Not to mention that it would be the user's responsibility to remember all DCs from the backup that need to be restored.

A better idea could be to extend restore with a flag like --dc-mapping (string -> list of strings).
This would allow the user to specify which DC from the backup should be restored by which DCs in the restored cluster.
It has a few benefits:

@karol-kokoszka karol-kokoszka changed the title Give possibility for restoring just a single DC Give possibility for restoring DC using mapping sourceDC -> destinationDC Oct 10, 2024
@mikliapko

A better idea could be to extend restore with a flag like --dc-mapping (string -> list of strings).

@Michal-Leszczynski
When it is ready, could you please provide an example of input for this flag?
I will switch some of our SCT tests back to running on a multiDC cluster.

@karol-kokoszka karol-kokoszka removed this from the 3.4 milestone Oct 21, 2024
@karol-kokoszka karol-kokoszka added this to the 3.5 milestone Oct 21, 2024
mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this issue Jan 2, 2025
disrupt_mgmt_restore doesn't support multiDC cluster configuration due
to the issues:
- scylladb/scylla-manager#3829
- scylladb/scylla-manager#4049

Thus, it should be skipped while both issues are open and the cluster
configuration is multiDC.
@VAveryanov8
Collaborator

I've put some additional thought into this issue to see what syntax we may want to support in the restore command to cover the following use cases, considering Michał's findings that restoring DCs one by one could be problematic:

  1. Allow sctool user to restore from cluster with any number of DCs to a cluster with any number of DCs
  2. DC names from source cluster may be different from DC names in target cluster
  3. Backup location can be one per DC or one for all DCs

Here are the possible combinations of DCs in the source and target clusters:

| # | Source Cluster DCs | Target Cluster DCs | Description |
|---|--------------------|--------------------|-------------|
| 0 | dc1, dc2 | dc1, dc2 | Everything is exactly the same |
| 1 | dc1 | dc2 | Names mismatch |
| 2 | dc1, dc2 | dc1, dc3 | One dc name mismatch |
| 3 | dc1, dc2 | dc3, dc4 | Names mismatch |
| 4 | dc1, dc2 | dc1 | Target has fewer dcs than Source |
| 5 | dc1, dc2 | dc3 | Target has fewer dcs than Source; Names mismatch |
| 6 | dc1 | dc1, dc2 | Source has fewer dcs than Target |
| 7 | dc1 | dc2, dc3 | Source has fewer dcs than Target; Names mismatch |

Considering the above options, the following syntax can be used:

--dc-mapping source_dc1=>target_dc2;source_dc3=>target_dc4
where 
   '=>' is used to separate source and target
   ',' is used to separate list of dcs
   ';' is used to separate mappings

Alternatively, to improve readability, a JSON file with mappings can be used, e.g.

--dc-mapping ./mappings.json

where mappings.json contains:

{
  "mappings": [
    {"source": ["dc1"], "target": ["dc2"]},
    {"source": ["dc1", "dc2"], "target": ["dc3"]}
  ]
}

This syntax can also be used to restore the schema (#4049); then we can modify WITH replication = {...} without requiring a manual change from the user.
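
For illustration, a minimal Go sketch of decoding such a mappings file (the type names and file handling here are assumptions for the example, not actual Scylla Manager code):

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// DCMapping pairs source DCs from the backup with the target DCs
// that should restore their data (hypothetical type for illustration).
type DCMapping struct {
    Source []string `json:"source"`
    Target []string `json:"target"`
}

type DCMappings struct {
    Mappings []DCMapping `json:"mappings"`
}

func main() {
    raw, err := os.ReadFile("mappings.json")
    if err != nil {
        panic(err)
    }
    var m DCMappings
    if err := json.Unmarshal(raw, &m); err != nil {
        panic(err)
    }
    fmt.Printf("parsed %d mappings: %+v\n", len(m.Mappings), m)
}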

@karol-kokoszka @Michal-Leszczynski What do you think about this?

@Michal-Leszczynski
Collaborator

Nice examples!
It's also a nice idea to allow using a JSON file instead of typing everything on the command line.

I'm just wondering if it's useful to go for the []string->[]string instead of the string->[]string mapping.
They both can describe the same situations (a slice of keys can simply be split into separate entries).
The only argument for []string->[]string is that it might be more convenient for the user in some scenarios, but it makes the syntax more complicated, which is probably fine.

We also need to think about the validation.
I would say that the mapping pasted above should be considered invalid, as it specifies dc1 in 2 separate entries (perhaps by mistake).

What happens when the user specifies only a subset of the backup DCs? In some cases it might make sense to restore just a subset of DCs, but it should be clearly specified by the user so that it does not happen by accident. We could make it so that all DCs need to be specified, but some of them might be mapped to an empty set of DCs, meaning that they wouldn't be restored at all.
Or perhaps we should introduce some wildcard characters for improved UX (although I'm not a big fan of that).
E.g. --dc-mapping dc1=>* meaning that dc1 should be restored to all DCs.

Finally, what should be the default value? Just the identity mapping?

In terms of syntax and parsing, we should re-use the pflag package where possible, but I don't think it already supports the syntax needed for this flag.
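
For context, a minimal sketch of what a custom flag could look like on top of pflag (pflag.Value and pflag.Var are the real extension points; the mapping type and exact syntax handling are assumptions for illustration):

package main

import (
    "fmt"
    "strings"

    "github.com/spf13/pflag"
)

// dcMappingFlag is a hypothetical pflag.Value implementation for
// "source=>target;source=>target" style mappings.
type dcMappingFlag map[string]string

func (f dcMappingFlag) String() string {
    parts := make([]string, 0, len(f))
    for src, dst := range f {
        parts = append(parts, src+"=>"+dst)
    }
    return strings.Join(parts, ";")
}

func (f dcMappingFlag) Set(v string) error {
    for _, pair := range strings.Split(v, ";") {
        src, dst, ok := strings.Cut(pair, "=>")
        if !ok {
            return fmt.Errorf("invalid mapping %q, expected source=>target", pair)
        }
        // Reject a source DC appearing in two entries, per the
        // validation concern above.
        if _, dup := f[src]; dup {
            return fmt.Errorf("source DC %q mapped twice", src)
        }
        f[src] = dst
    }
    return nil
}

func (f dcMappingFlag) Type() string { return "dc-mapping" }

func main() {
    mapping := dcMappingFlag{}
    pflag.Var(mapping, "dc-mapping", "map source DCs to target DCs, e.g. dc1=>dc3;dc2=>dc4")
    pflag.Parse()
    fmt.Println(mapping.String())
}

With something like this, --dc-mapping "dc1=>dc3;dc2=>dc4" would parse into a map and duplicate source DCs would be rejected at flag-parsing time.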

@VAveryanov8
Collaborator

I'm just wondering if it's useful to go for the []string->[]string instead of the string->[]string mapping.
They both can describe the same situations (a slice of keys can simply be split into separate entries).
The only argument for []string->[]string is that it might be more convenient for the user in some scenarios, but it makes the syntax more complicated, which is probably fine.

Yes, the syntax for the command line looks quite complicated (and it's only getting more complicated, see the answers below 😄), so maybe it's better to support only JSON input for dc-mapping.

We also need to think about the validation.
I would say that the mapping pasted above should be considered invalid, as it specifies dc1 in 2 separate entries (perhaps by mistake).

Yes, it's just a mistake :)

What happens when the user specifies only a subset of the backup DCs? In some cases it might make sense to restore just a subset of DCs, but it should be clearly specified by the user so that it does not happen by accident. We could make it so that all DCs need to be specified, but some of them might be mapped to an empty set of DCs, meaning that they wouldn't be restored at all.
Or perhaps we should introduce some wildcard characters for improved UX (although I'm not a big fan of that).
E.g. --dc-mapping dc1=>* meaning that dc1 should be restored to all DCs.

Very good question! Maybe we can use a --dc option with simplified semantics of the --dc flag from the backup command? By default, if only --dc-mapping is provided, we expect all DCs to be specified; but if the user wants to restore only a subset of DCs, that should be explicitly stated using the --dc flag. Here is an example of how I see that:

Having:
  cluster1 with dc1, dc2 
  cluster2 with dc3, dc4 
Then: 
  if --dc-mapping dc1=>dc3;dc2=>dc4 - ok
  if --dc-mapping dc1=>dc3 - error: source dc2 is not mapped to any target dc
  if --dc dc1 --dc-mapping dc1=>dc3 - ok (only dc1 will be restored) 

But it looks a little bit complicated :)

Finally, what should be the default value? Just the identity mapping?

Good question. If --dc-mapping is not provided, we can preserve the current behavior. But if we think the current behavior isn't worth preserving, then yes, we can try to map DCs by name, and their names should match; otherwise, error?

@Michal-Leszczynski
Collaborator

Michal-Leszczynski commented Jan 7, 2025

Yes, the syntax for the command line looks quite complicated (and it's only getting more complicated, see the answers below 😄), so maybe it's better to support only JSON input for dc-mapping.

That could also work, but the flag should accept the JSON as regular text instead of a file name (in some cases it might be annoying to create a file just to run the restore command, but you can always include a file's contents in the command line by using the cat command).

Very good question! Maybe we can use a --dc option with simplified semantics of the --dc flag from the backup command? By default, if only --dc-mapping is provided, we expect all DCs to be specified; but if the user wants to restore only a subset of DCs, that should be explicitly stated using the --dc flag. Here is an example of how I see that:

Yeah, creating 2 flags to control the DC mapping seems like overkill. I think that allowing empty values in the mapping, or adding a special character (invalid in DC names) to mark that a given DC shouldn't be restored at all, makes more sense.

Good question. If --dc-mapping is not provided, we can preserve the current behavior. But if we think the current behavior isn't worth preserving, then yes, we can try to map DCs by name, and their names should match; otherwise, error?

The current behavior is not worth preserving, so I would say that an identity mapping which fails when DC names do not match seems best to me. By "do not match", I mean that the backup and restore DC sets should be identical. This means that if backup={dc1,dc2} and restore={dc1,dc2,dc3}, then we should fail, as we need to make sure the user knows that dc3 would not be restoring any data.
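
A minimal sketch of that default, assuming an identity mapping with strict set equality between backup and restore DCs (the helper and its signature are hypothetical):

package restore

import "fmt"

// validateIdentityMapping fails unless the backup and restore clusters
// have exactly the same set of DC names, so that no DC is silently
// left without data (e.g. backup={dc1,dc2}, restore={dc1,dc2,dc3} fails).
func validateIdentityMapping(backupDCs, restoreDCs []string) error {
    backup := toSet(backupDCs)
    target := toSet(restoreDCs)
    if len(backup) != len(target) {
        return fmt.Errorf("backup DCs %v and restore DCs %v do not match", backupDCs, restoreDCs)
    }
    for dc := range target {
        if _, ok := backup[dc]; !ok {
            return fmt.Errorf("restore DC %q has no counterpart in the backup", dc)
        }
    }
    return nil
}

func toSet(dcs []string) map[string]struct{} {
    s := make(map[string]struct{}, len(dcs))
    for _, dc := range dcs {
        s[dc] = struct{}{}
    }
    return s
}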

mikliapko added a commit to scylladb/scylla-cluster-tests that referenced this issue Jan 7, 2025
disrupt_mgmt_restore doesn't support multiDC cluster configuration due
to the issues:
- scylladb/scylla-manager#3829
- scylladb/scylla-manager#4049

Thus, it should be skipped while both issues are open and the cluster
configuration is multiDC.
mergify bot pushed a commit to scylladb/scylla-cluster-tests that referenced this issue Jan 7, 2025
mikliapko added a commit to scylladb/scylla-cluster-tests that referenced this issue Jan 7, 2025
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this issue Jan 8, 2025

(cherry picked from commit 53fbf07)
VAveryanov8 added a commit that referenced this issue Jan 16, 2025
This adds support for the --dc-mapping flag to the restore command.
It specifies the mapping between DCs from the backup and DCs in the restored (target) cluster.
All DCs from the source cluster should be explicitly mapped to all DCs in the target cluster. The only exception is when the
source and target clusters match exactly: source dcs == target dcs.
Only works with tables restoration (--restore-tables=true).
Syntax:
    "source_dc1,source_dc2=>target_dc1,target_dc2"
Multiple mappings are separated by semicolons (;). An exclamation mark (!) before a DC indicates that it should be ignored during restore.
Examples:
    "dc1,dc2=>dc3"      - data from the dc1 and dc2 DCs should be restored to the dc3 DC.
    "dc1,dc2=>dc3,!dc4" - data from the dc1 and dc2 DCs should be restored to the dc3 DC, ignoring the dc4 DC in the target cluster.
    "dc1,!dc2=>dc2"     - data from dc1 should be restored to the dc2 DC, ignoring dc2 from the source cluster.

Fixes: #3829
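
For illustration, a rough Go sketch of parsing the syntax described in this commit message (a simplified reading, not the actual implementation):

package restore

import (
    "fmt"
    "strings"
)

// DCMapping is one "sources=>targets" entry of the --dc-mapping flag;
// DCs prefixed with '!' are recorded as ignored rather than mapped.
type DCMapping struct {
    Sources       []string
    Targets       []string
    IgnoreSources []string
    IgnoreTargets []string
}

// splitDCs separates plain DC names from '!'-prefixed (ignored) ones.
func splitDCs(s string) (dcs, ignored []string) {
    for _, dc := range strings.Split(s, ",") {
        if rest, ok := strings.CutPrefix(dc, "!"); ok {
            ignored = append(ignored, rest)
        } else {
            dcs = append(dcs, dc)
        }
    }
    return dcs, ignored
}

// ParseDCMappings parses e.g. "dc1,!dc2=>dc3;dc4=>dc5" into entries.
func ParseDCMappings(flag string) ([]DCMapping, error) {
    var out []DCMapping
    for _, entry := range strings.Split(flag, ";") {
        src, dst, ok := strings.Cut(entry, "=>")
        if !ok {
            return nil, fmt.Errorf("invalid mapping %q, expected sources=>targets", entry)
        }
        var m DCMapping
        m.Sources, m.IgnoreSources = splitDCs(src)
        m.Targets, m.IgnoreTargets = splitDCs(dst)
        out = append(out, m)
    }
    return out, nil
}

Under this reading, "dc1,!dc2=>dc2" yields Sources=[dc1], IgnoreSources=[dc2], Targets=[dc2].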
VAveryanov8 added a commit that referenced this issue Jan 16, 2025
This introduces the use of DC mappings when restoring tables.
Now each DC downloads only data from the corresponding DC(s)
according to the user-provided mapping.
Some DCs can also be explicitly ignored.

Fixes: #3829
VAveryanov8 added a commit that referenced this issue Jan 16, 2025
This adds another cluster to the docker setup, so we can have integration
tests for dc-mappings.

Fixes: #3829
@karol-kokoszka karol-kokoszka modified the milestones: 3.5, 1-1 Restore Jan 22, 2025