
fix(kms): set proper node regions in multi-dc setups #9026

Merged (1 commit, Oct 23, 2024)

Conversation

@vponomaryov (Contributor) commented Oct 22, 2024

The AWS-KMS code in the DB node Python class uses a shared dictionary
from the DB cluster class when updating the KMS endpoint region.

This was not a problem while the DB node setup was serial:
each node changed the shared object in turn, so it held the proper value during the time frame that node needed it.

After the parallel DB node setup was implemented (#7383),
a single state of that shared object started being applied to all nodes.

In single-DC setups everything stayed correct simply because all DB nodes share the same region name.
In multi-DC setups, however, nodes started applying each other's region values.

So, fix it by deep-copying that shared dictionary, so each node updates its own private copy instead of the shared object.

Closes: #9025
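The race and the fix can be sketched with a minimal, self-contained example; the class and attribute names below are hypothetical stand-ins, not the actual SCT classes:

```python
# Sketch of the bug and the deepcopy fix (hypothetical names, not real SCT code).
from copy import deepcopy
from concurrent.futures import ThreadPoolExecutor


class Cluster:
    def __init__(self):
        # One dict shared by every node, analogous to the cluster-level
        # config template that holds the KMS endpoint region.
        self.kms_config = {"kms_hosts": {"auto": {"aws_region": None}}}


class Node:
    def __init__(self, cluster, region):
        self.cluster = cluster
        self.region = region

    def render_config_buggy(self):
        cfg = self.cluster.kms_config  # same object for every node
        cfg["kms_hosts"]["auto"]["aws_region"] = self.region
        return cfg  # all nodes end up pointing at one shared state

    def render_config_fixed(self):
        cfg = deepcopy(self.cluster.kms_config)  # private copy per node
        cfg["kms_hosts"]["auto"]["aws_region"] = self.region
        return cfg


cluster = Cluster()
nodes = [Node(cluster, r) for r in ("eu-west-1", "us-east-1")]

# Buggy path: every node hands back the very same dict, so the last
# writer's region wins for all of them.
buggy = [n.render_config_buggy() for n in nodes]

# Fixed path: render per-node configs in parallel, as the parallel
# node setup does; each node mutates only its own deep copy.
with ThreadPoolExecutor() as pool:
    fixed = list(pool.map(Node.render_config_fixed, nodes))

regions = [c["kms_hosts"]["auto"]["aws_region"] for c in fixed]
```

With the buggy path, `buggy[0] is buggy[1]` holds and both show the last node's region; with the fix, `regions` keeps one distinct region per node.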

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@vponomaryov (Contributor, Author) commented Oct 22, 2024

Example of the failure: enterprise-2024.2/tier1/longevity-multidc-schema-topology-changes-12h-test#7
Fix tested here: https://argus.scylladb.com/tests/scylla-cluster-tests/42b2d981-559b-4b64-b931-289e9f4ab928

This must be backported to branch-2024.2.
It may also be backported to branch-2024.1, but that is not needed, for the reasons described in the commit/PR description.

@mikliapko (Contributor) left a comment:

LGTM

@soyacz (Contributor) left a comment:

LGTM

@scylladbbot scylladbbot added backport/6.1-done Commit backported to 6.1 and removed backport/6.1 Need backport to 6.1 labels Oct 23, 2024
@vponomaryov (Contributor, Author) commented:

Just for history:

The description of this PR is incomplete.
The bug appeared not only after the parallelization of the DB node setup [1], but also after the append_scylla_yaml SCT config option was converted from a string to a dict [2].

[1] #7383
[2] #7554
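The second ingredient comes down to Python's mutability rules: while the option was a string, every "update" rebound a name to a new immutable object and left the shared value untouched; once it became a dict, the same update pattern mutated the one object all nodes share. A minimal illustration (the variable names are hypothetical, not the actual SCT option):

```python
# While the shared value was a string, "updating" it produced a new
# object and had no side effect on the shared one (strings are immutable).
shared_str = "aws_region: none"
node_str = shared_str.replace("none", "eu-west-1")  # new object

# Once the value became a dict, the same update pattern mutates the
# shared object in place, which every other node then observes.
shared_dict = {"aws_region": "none"}
node_dict = shared_dict
node_dict["aws_region"] = "eu-west-1"  # shared_dict is changed too
```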

Successfully merging this pull request may close these issues.

SCT defines the same aws_region for all nodes of multiDC cluster in scylla.yaml (kms_hosts:auto section)