Upgrade db packages stops all nodes when growing cluster by 3 in parallel (custom db packages) #8551

Closed
soyacz opened this issue Sep 4, 2024 · 2 comments · Fixed by #8857

soyacz commented Sep 4, 2024

Observed in a test with custom Scylla DB packages (the `update_db_packages` param is set).
When growing the cluster by 3 nodes in parallel, SCT stops all the nodes instead of only the added ones.
Culprit line:

if len(node_list) == 1:
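
To illustrate how a size check like this can go wrong, here is a minimal sketch. It is not the actual SCT code: `FakeNode`, `pick_nodes_to_stop_buggy`, and the assumption that the `else` branch falls back to the whole cluster are all hypothetical, inferred only from the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a cluster node (not an SCT class)."""
    name: str
    running: bool = True

    def stop(self) -> None:
        self.running = False


def pick_nodes_to_stop_buggy(node_list, all_nodes):
    """Assumed shape of the bug: only a single-node request is honored."""
    if len(node_list) == 1:
        return list(node_list)
    # Assumption: any multi-node request falls through to the whole cluster.
    return list(all_nodes)


cluster = [FakeNode(f"node-{i}") for i in range(1, 7)]
added = cluster[3:]  # the 3 nodes added in parallel

for node in pick_nodes_to_stop_buggy(added, cluster):
    node.stop()

stopped = [n.name for n in cluster if not n.running]
print(stopped)  # all six nodes, not just the three that were added
```

With a single added node the check behaves as intended, which would explain why the problem only shows up when growing in parallel.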

Impact

The test fails due to c-s errors when all the nodes are stopped.

How frequently does it reproduce?

Always when growing in parallel and using custom db packages.

Installation details

Cluster size: 3 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • elasticity-test-ubuntu-db-node-bc75f3a1-6 (18.202.56.4 | 10.4.1.158) (shards: 7)
  • elasticity-test-ubuntu-db-node-bc75f3a1-5 (34.243.57.142 | 10.4.0.46) (shards: 7)
  • elasticity-test-ubuntu-db-node-bc75f3a1-4 (34.240.37.34 | 10.4.2.211) (shards: 7)
  • elasticity-test-ubuntu-db-node-bc75f3a1-3 (34.245.179.48 | 10.4.2.65) (shards: 7)
  • elasticity-test-ubuntu-db-node-bc75f3a1-2 (3.254.86.25 | 10.4.0.13) (shards: 7)
  • elasticity-test-ubuntu-db-node-bc75f3a1-1 (34.244.12.247 | 10.4.0.137) (shards: 7)

OS / Image: ami-0415b87a177bf40a6 (aws: undefined_region)

Test: scylla-enterprise-perf-regression-latency-650gb-elasticity
Test id: bc75f3a1-389f-4c3e-a84f-ef388d9bd03c
Test name: scylla-staging/lukasz/scylla-enterprise-perf-regression-latency-650gb-elasticity
Test method: performance_regression_test.PerformanceRegressionTest.test_latency_mixed_with_nemesis
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor bc75f3a1-389f-4c3e-a84f-ef388d9bd03c
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs bc75f3a1-389f-4c3e-a84f-ef388d9bd03c

Logs:

Jenkins job URL
Argus

fruch commented Sep 8, 2024

@soyacz this logic can go. We don't care about the order in which nodes are started anymore, so we can remove that `if` and the whole `else` branch.

We should just stop/start the nodes that are being asked for; we shouldn't touch any other nodes at that point. That's a mistake.
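
A rough sketch of the direction suggested above, under stated assumptions: `FakeNode` and `update_packages_on` are hypothetical names for illustration, not SCT APIs, and the real per-node steps may differ. The point is only that the function acts on the nodes it was given and nothing else, with no branch on the list size.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a cluster node (not an SCT class)."""
    name: str
    events: list = field(default_factory=list)

    def stop(self) -> None:
        self.events.append("stop")

    def update_packages(self) -> None:
        self.events.append("update")

    def start(self) -> None:
        self.events.append("start")


def update_packages_on(node_list):
    """Stop, update, and start only the requested nodes."""
    for node in node_list:
        node.stop()
        node.update_packages()
        node.start()


cluster = [FakeNode(f"node-{i}") for i in range(1, 7)]
added = cluster[3:]  # the 3 nodes added in parallel

update_packages_on(added)

untouched = [n.name for n in cluster if not n.events]
print(untouched)  # ['node-1', 'node-2', 'node-3']
```

The sketch runs sequentially for simplicity; whether the real code handles the requested nodes in parallel is not something this thread specifies.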

soyacz commented Sep 9, 2024

> @soyacz this logic can go. We don't care about the order in which nodes are started anymore, so we can remove that `if` and the whole `else` branch.
>
> We should just stop/start the nodes that are being asked for; we shouldn't touch any other nodes at that point. That's a mistake.

Yes, shouldn't be hard to fix, let's plan it for this sprint.

soyacz added a commit to soyacz/scylla-cluster-tests that referenced this issue Sep 27, 2024
When updating multiple nodes db packages, SCT stops all the nodes.
This is wrong and causes tests to fail when providing
`update_db_packages` param.

Fix by removing broken logic and dropping code for nodes stop ordering.

fixes: scylladb#8551
@fruch fruch closed this as completed in c140e7b Sep 29, 2024
mergify bot pushed a commit that referenced this issue Sep 29, 2024
When updating multiple nodes db packages, SCT stops all the nodes.
This is wrong and causes tests to fail when providing
`update_db_packages` param.

Fix by removing broken logic and dropping code for nodes stop ordering.

fixes: #8551
(cherry picked from commit c140e7b)
fruch pushed a commit that referenced this issue Sep 29, 2024
When updating multiple nodes db packages, SCT stops all the nodes.
This is wrong and causes tests to fail when providing
`update_db_packages` param.

Fix by removing broken logic and dropping code for nodes stop ordering.

fixes: #8551
(cherry picked from commit c140e7b)