_get_keyspaces_to_decrease_rf isn't safe enough #8694

Closed
fruch opened this issue Sep 15, 2024 · 1 comment · Fixed by #8695

@fruch
Contributor

fruch commented Sep 15, 2024

_get_keyspaces_to_decrease_rf isn't safe enough; it fails as follows:

2024-09-14 08:01:11.191: (DisruptionEvent Severity.ERROR) period_type=end event_id=970146e2-c94f-41a7-9b67-67f465c08e4a duration=1h36m2s: nemesis_name=AddRemoveDc target_node=Node longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-1 [None | 10.0.0.5] errors=int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4658, in disrupt_add_remove_dc
    self.cluster.decommission(new_node)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 5019, in decommission
    dc_topology_rf_change.decrease_keyspaces_rf()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/replication_strategy_utils.py", line 225, in decrease_keyspaces_rf
    if decreased_rf_keyspaces := self._get_keyspaces_to_decrease_rf(session=session):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/replication_strategy_utils.py", line 183, in _get_keyspaces_to_decrease_rf
    rf = int(replication.get(self.datacenter))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5220, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4624, in disrupt_add_remove_dc
    with ExitStack() as context_manager:
  File "/usr/local/lib/python3.10/contextlib.py", line 576, in __exit__
    raise exc_details[1]
  File "/usr/local/lib/python3.10/contextlib.py", line 561, in __exit__
    if cb(*exc_details):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4631, in finalizer
    self.cluster.decommission(new_node)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 5019, in decommission
    dc_topology_rf_change.decrease_keyspaces_rf()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/replication_strategy_utils.py", line 225, in decrease_keyspaces_rf
    if decreased_rf_keyspaces := self._get_keyspaces_to_decrease_rf(session=session):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/replication_strategy_utils.py", line 183, in _get_keyspaces_to_decrease_rf
    rf = int(replication.get(self.datacenter))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Packages

Scylla version: 6.2.0~dev-20240912.612a1416604e with build-id 9d734ec7ebdb4b968a379d5efb6fe11e8cc541aa

Kernel Version: 6.8.0-1014-azure

Installation details

Cluster size: 4 nodes (Standard_L16s_v3)

Scylla Nodes used in this run:

  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-7 (null | 10.0.0.5) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-6 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-5 (null | 10.0.0.14) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-4 (null | 10.0.0.8) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-3 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-2 (null | 10.0.0.6) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-1 (null | 10.0.0.5) (shards: 14)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-6.2.0-dev-x86_64-2024-09-13T02-56-40 (azure: undefined_region)

Test: longevity-1tb-5days-azure-test
Test id: ce64f53c-084b-4445-8b62-784fa80adf1c
Test name: scylla-master/tier1/longevity-1tb-5days-azure-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor ce64f53c-084b-4445-8b62-784fa80adf1c
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs ce64f53c-084b-4445-8b62-784fa80adf1c

Logs:

Jenkins job URL
Argus

@yarongilor
Contributor

The issue comes from the output of the following query:

< t:2024-09-14 08:01:10,028 f:common.py       l:1320 c:utils                p:DEBUG > Executing CQL 'SELECT keyspace_name, replication FROM system_schema.keyspaces' ...

One keyspace in the output had:

'NetworkTopologyStrategy' in replication['class']

but replication.get(self.datacenter) returned None for some reason.
I'll try to reproduce it to check whether it is also a Scylla issue.
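A minimal, driver-free reproduction of the failing line, under the assumption (not confirmed in this thread) that the keyspace's replication map simply has no entry for the datacenter in question. The replication map and the DC names dc1/dc2 are made up for illustration:

```python
# Hypothetical replication map, shaped like the "replication" column that
# SELECT keyspace_name, replication FROM system_schema.keyspaces returns for
# a NetworkTopologyStrategy keyspace. Here it is only replicated to "dc1".
replication = {
    "class": "org.apache.cassandra.locator.NetworkTopologyStrategy",
    "dc1": "3",
}

datacenter = "dc2"  # hypothetical DC with no entry in the map

value = replication.get(datacenter)
print(value)  # dict.get() returns None for a missing key

try:
    rf = int(value)
except TypeError as exc:
    # Same exception type as the last frame of the traceback above
    print(f"TypeError: {exc}")
```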

yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this issue Sep 15, 2024
…RF value of DC

In case no keyspace replication-factor value is retrieved in a DC,
A warning is logged and the keyspace is ignored (skipped).
Fixes: scylladb#8694
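A minimal sketch of the guard this commit describes (warn and skip instead of calling int() on None), assuming the rows come from the system_schema.keyspaces query quoted above. The function name keyspaces_to_decrease_rf, its signature and its return shape are hypothetical; this is not the actual code of _get_keyspaces_to_decrease_rf, whose exact selection logic isn't shown in this thread:

```python
import logging
from typing import Dict, Iterable, Mapping, Tuple

LOGGER = logging.getLogger(__name__)


def keyspaces_to_decrease_rf(
    rows: Iterable[Tuple[str, Mapping[str, str]]],
    datacenter: str,
) -> Dict[str, int]:
    """Hypothetical stand-in for _get_keyspaces_to_decrease_rf.

    rows: (keyspace_name, replication) pairs, as returned by
    SELECT keyspace_name, replication FROM system_schema.keyspaces.
    Returns {keyspace_name: rf_in_datacenter} for NetworkTopologyStrategy
    keyspaces that actually define an RF for `datacenter`.
    """
    result: Dict[str, int] = {}
    for keyspace_name, replication in rows:
        if "NetworkTopologyStrategy" not in replication.get("class", ""):
            continue  # only NTS keyspaces carry per-DC replication factors
        rf_value = replication.get(datacenter)
        if rf_value is None:
            # The guard from the commit message: log a warning and skip
            LOGGER.warning("Keyspace %s has no RF defined for DC %s, skipping",
                           keyspace_name, datacenter)
            continue
        result[keyspace_name] = int(rf_value)
    return result


# Hypothetical sample rows
NTS = "org.apache.cassandra.locator.NetworkTopologyStrategy"
rows = [
    ("ks_both", {"class": NTS, "dc1": "3", "dc2": "1"}),
    ("ks_dc1_only", {"class": NTS, "dc1": "3"}),
    ("system", {"class": "org.apache.cassandra.locator.LocalStrategy"}),
]
print(keyspaces_to_decrease_rf(rows, "dc2"))  # {'ks_both': 1}
```

Here ks_dc1_only is skipped with a warning instead of raising TypeError, and the LocalStrategy keyspace is ignored because only NetworkTopologyStrategy maps hold per-DC factors.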
fruch closed this as completed in e57d75b Sep 15, 2024
mergify bot pushed a commit that referenced this issue Sep 15, 2024
…RF value of DC

In case no keyspace replication-factor value is retrieved in a DC,
A warning is logged and the keyspace is ignored (skipped).
Fixes: #8694

(cherry picked from commit e57d75b)

# Conflicts:
#	sdcm/utils/replication_strategy_utils.py
mergify bot pushed a commit that referenced this issue Sep 15, 2024
…RF value of DC

In case no keyspace replication-factor value is retrieved in a DC,
A warning is logged and the keyspace is ignored (skipped).
Fixes: #8694

(cherry picked from commit e57d75b)

# Conflicts:
#	sdcm/utils/replication_strategy_utils.py
mergify bot pushed a commit that referenced this issue Sep 15, 2024
…RF value of DC

In case no keyspace replication-factor value is retrieved in a DC,
A warning is logged and the keyspace is ignored (skipped).
Fixes: #8694

(cherry picked from commit e57d75b)
fruch pushed a commit that referenced this issue Sep 15, 2024
…RF value of DC

In case no keyspace replication-factor value is retrieved in a DC,
A warning is logged and the keyspace is ignored (skipped).
Fixes: #8694

(cherry picked from commit e57d75b)