
fix(kms): set proper node regions in multi-dc setups #9026

Merged (1 commit, Oct 23, 2024)

Conversation

@vponomaryov (Contributor) commented Oct 22, 2024

The AWS-KMS code in the DB node Python class uses a shared dictionary
from the DB cluster class when updating the KMS endpoint region.

This was not a problem while the DB node setup was serial:
each node changed the shared object in turn, so it held the proper value during the time frame that node needed it.

After the parallel DB node setup was implemented (#7383),
a single state of that shared object started being applied to all nodes.

In single-DC setups everything stayed correct simply because all DB nodes share the same region name.
In multi-DC setups, however, nodes started applying each other's region values.

So, fix it by deep-copying that shared dictionary, so each node updates its own private copy instead of the shared object.

Closes: #9025
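The race and the fix can be sketched with a minimal, self-contained example; the class and attribute names below are hypothetical stand-ins, not the actual SCT classes:

```python
# Sketch of the bug and the deepcopy fix (hypothetical names, not real SCT code).
from copy import deepcopy
from concurrent.futures import ThreadPoolExecutor


class Cluster:
    def __init__(self):
        # One dict shared by every node, analogous to the cluster-level
        # config template that holds the KMS endpoint region.
        self.kms_config = {"kms_hosts": {"auto": {"aws_region": None}}}


class Node:
    def __init__(self, cluster, region):
        self.cluster = cluster
        self.region = region

    def render_config_buggy(self):
        cfg = self.cluster.kms_config  # same object for every node
        cfg["kms_hosts"]["auto"]["aws_region"] = self.region
        return cfg  # all nodes end up pointing at one shared state

    def render_config_fixed(self):
        cfg = deepcopy(self.cluster.kms_config)  # private copy per node
        cfg["kms_hosts"]["auto"]["aws_region"] = self.region
        return cfg


cluster = Cluster()
nodes = [Node(cluster, r) for r in ("eu-west-1", "us-east-1")]

# Buggy path: every node hands back the very same dict, so the last
# writer's region wins for all of them.
buggy = [n.render_config_buggy() for n in nodes]

# Fixed path: render per-node configs in parallel, as the parallel
# node setup does; each node mutates only its own deep copy.
with ThreadPoolExecutor() as pool:
    fixed = list(pool.map(Node.render_config_fixed, nodes))

regions = [c["kms_hosts"]["auto"]["aws_region"] for c in fixed]
```

With the buggy path, `buggy[0] is buggy[1]` holds and both show the last node's region; with the fix, `regions` keeps one distinct region per node.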

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@vponomaryov (Contributor, Author) commented Oct 22, 2024

Example of the failure: enterprise-2024.2/tier1/longevity-multidc-schema-topology-changes-12h-test#7
Fix tested here: https://argus.scylladb.com/tests/scylla-cluster-tests/42b2d981-559b-4b64-b931-289e9f4ab928

This must be backported to branch-2024.2.
It may also be backported to branch-2024.1, but that is not needed, for the reasons described in the commit/PR description.

@mikliapko (Contributor) left a comment:

LGTM

@soyacz (Contributor) left a comment:

LGTM

@scylladbbot scylladbbot added backport/6.1-done Commit backported to 6.1 and removed backport/6.1 Need backport to 6.1 labels Oct 23, 2024
@vponomaryov (Contributor, Author) commented:

Just for history:

The description of this PR is incomplete.
The bug appeared not only after the parallelization of the DB node setup [1], but also after the append_scylla_yaml SCT config option was converted from a string to a dict [2].

[1] #7383
[2] #7554
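The second ingredient comes down to Python's mutability rules: while the option was a string, every "update" rebound a name to a new immutable object and left the shared value untouched; once it became a dict, the same update pattern mutated the one object all nodes share. A minimal illustration (the variable names are hypothetical, not the actual SCT option):

```python
# While the shared value was a string, "updating" it produced a new
# object and had no side effect on the shared one (strings are immutable).
shared_str = "aws_region: none"
node_str = shared_str.replace("none", "eu-west-1")  # new object

# Once the value became a dict, the same update pattern mutates the
# shared object in place, which every other node then observes.
shared_dict = {"aws_region": "none"}
node_dict = shared_dict
node_dict["aws_region"] = "eu-west-1"  # shared_dict is changed too
```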

Successfully merging this pull request may close these issues.

SCT defines the same aws_region for all nodes of multiDC cluster in scylla.yaml (kms_hosts:auto section)