Fix Fullscanoperation thread to choose only alive node #9284

Open
2 tasks
aleksbykov opened this issue Nov 19, 2024 · 2 comments · May be fixed by #9600 or #9370
@aleksbykov
Contributor

Packages

Scylla version: 2024.2.0-20241118.614d56348f46 with build-id e67376d9ddfea081a3bab398f4581ecdde59911d

Kernel Version: 5.15.0-1072-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

The full scan operation chose a node that was then targeted by the rolling restart nemesis, which caused the following error:

2024-11-18 22:32:44.845: (FullScanAggregateEvent Severity.ERROR) period_type=end event_id=e79676a5-c38c-4b33-b7cf-b3f9d96610a8 during_nemesis=RollingRestartCluster duration=13s node=longevity-tls-50gb-3d-2024-2-db-node-c5d16022-6 select_from=keyspace1.standard1 message=FullScanAggregatesOperation operation failed, ReadTimeout error: ReadTimeout('Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation failed for keyspace1.standard1 - received 0 responses and 1 failures from 1 CL=ONE." info={\'consistency\': \'ONE\', \'required_responses\': 1, \'received_responses\': 0}')

The FullScan thread needs to be fixed to choose only alive nodes
-or-
the rolling restart nemesis needs to be fixed to mark the restarting node as busy for other operations.
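
A minimal sketch of the first option, assuming a hypothetical `Node` object with an `is_alive` flag and a `running_nemesis` attribute. The helper name `pick_scan_target` is illustrative only and is not taken from the test framework:

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node object.
    name: str
    is_alive: bool = True                  # assumed health flag
    running_nemesis: Optional[str] = None  # nemesis currently using the node


def pick_scan_target(nodes, rng=random):
    """Return a random node that is alive and not claimed by a nemesis."""
    candidates = [n for n in nodes if n.is_alive and n.running_nemesis is None]
    if not candidates:
        # Nothing suitable: the caller can skip this scan cycle
        # instead of raising an ERROR event.
        return None
    return rng.choice(candidates)
```

Returning `None` when no node qualifies lets the scan thread wait for the next cycle rather than run against a node that is about to restart.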

Impact

The reported ERROR event marks the job as failed.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 6 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-9 (52.4.92.28 | 10.12.35.142) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-8 (98.85.39.206 | 10.12.35.198) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-7 (52.72.119.13 | 10.12.34.73) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-6 (44.214.249.197 | 10.12.34.72) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-5 (35.172.65.4 | 10.12.32.10) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-4 (34.227.247.191 | 10.12.35.166) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-3 (50.19.104.112 | 10.12.32.218) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-2 (34.237.79.41 | 10.12.34.15) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-1 (54.81.140.125 | 10.12.34.222) (shards: 14)

OS / Image: ami-06d63888ff4cf3d3f (aws: undefined_region)

Test: longevity-50gb-3days-test
Test id: c5d16022-93b6-44b1-9bab-22571a3eade5
Test name: enterprise-2024.2/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor c5d16022-93b6-44b1-9bab-22571a3eade5
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs c5d16022-93b6-44b1-9bab-22571a3eade5

Logs:

Jenkins job URL
Argus

@temichus temichus self-assigned this Nov 19, 2024
@roydahan
Contributor

I also noticed this issue.
Is it really relevant only to rolling restart?
Isn't it relevant to every FullScan that may happen during disruptive nemesis?

temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Nov 26, 2024
This commit has the following changes:

1. Introduce a common targed_node_lock mechanism
that can be used in nemesis and scan operations.

2. The common run_nemesis wrapper can provide a node
that is not under a disruptive nemesis, in addition to providing a node with no nemesis.
This allows non-disruptive operations to pick the same node.

3. Change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests).

fixes: scylladb#9284
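
The locking idea in this commit message could look roughly like the sketch below. The names `run_nemesis`, `set_running_nemesis` and `unset_running_nemesis` come from the commit message; their signatures, the `Node` class, and the single module-level lock are assumptions made for illustration, not the actual implementation:

```python
import threading
from contextlib import contextmanager

# Assumed: one shared lock guards every read and write of running_nemesis.
NEMESIS_TARGET_LOCK = threading.Lock()


class Node:
    # Hypothetical stand-in for a cluster node object.
    def __init__(self, name):
        self.name = name
        self.running_nemesis = None


def set_running_nemesis(node, nemesis_name):
    with NEMESIS_TARGET_LOCK:
        node.running_nemesis = nemesis_name


def unset_running_nemesis(node):
    with NEMESIS_TARGET_LOCK:
        node.running_nemesis = None


@contextmanager
def run_nemesis(nodes, nemesis_name):
    """Claim the first free node for the duration of the block, then release it."""
    with NEMESIS_TARGET_LOCK:
        free = [n for n in nodes if n.running_nemesis is None]
        if not free:
            raise RuntimeError("no free node available")
        target = free[0]
        target.running_nemesis = nemesis_name
    try:
        yield target
    finally:
        unset_running_nemesis(target)
```

Selecting and marking the node under the same lock means a scan thread and a nemesis thread cannot both claim one node.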
@temichus temichus linked a pull request Nov 26, 2024 that will close this issue
2 tasks
temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Nov 26, 2024
This commit has the following changes:

1. Introduce a common targed_node_lock mechanism
that can be used in nemesis and scan operations.

2. The FullScan operation now runs only on a node that is free of nemeses.

3. Change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests).

fixes: scylladb#9284
temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Dec 3, 2024
This commit has the following changes:

1. Introduce a common targed_node_lock mechanism
that can be used in nemesis and scan operations.

2. The FullScan operation now runs only on a node that is free of nemeses.

3. Change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests).

4. Change the disrupt_rolling_restart_cluster nemesis to lock all nodes in
the cluster before performing the restart.

fixes: scylladb#9284
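
Item 4 can be illustrated with a short sketch. Everything here (`Node`, `lock_all_nodes`, `rolling_restart`, the `restart` callback) is a hypothetical name chosen for the example. The sketch also assumes that only currently free nodes are claimed, which may differ from what the actual commit does:

```python
import threading
from contextlib import contextmanager

_LOCK = threading.Lock()  # assumed shared lock for running_nemesis updates


class Node:
    # Hypothetical stand-in for a cluster node object.
    def __init__(self, name):
        self.name = name
        self.running_nemesis = None


@contextmanager
def lock_all_nodes(nodes, nemesis_name):
    """Mark every currently free node as busy; release only those on exit."""
    with _LOCK:
        claimed = [n for n in nodes if n.running_nemesis is None]
        for node in claimed:
            node.running_nemesis = nemesis_name
    try:
        yield claimed
    finally:
        with _LOCK:
            for node in claimed:
                node.running_nemesis = None


def rolling_restart(nodes, restart):
    """Restart claimed nodes one by one while all of them stay marked busy."""
    restarted = []
    with lock_all_nodes(nodes, "RollingRestartCluster") as claimed:
        for node in claimed:
            restart(node)
            restarted.append(node.name)
    return restarted
```

Because every node stays marked for the whole rolling restart, a scan that picks only free nodes would not start on any of them until the nemesis finishes.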
@yarongilor
Contributor

Reproduced for disrupt_rolling_config_change_internode_compression:

2024-12-18 04:04:06.271: (FullScanAggregateEvent Severity.ERROR) period_type=end event_id=88be61c2-ee61-48fc-aec8-52a48cfae87d during_nemesis=RollingConfigChangeInternodeCompression duration=32s node=longevity-10gb-3h-master-db-node-413a3a9b-eastus-4 select_from=keyspace1.standard1 message=FullScanAggregatesOperation operation failed, ReadTimeout error: ReadTimeout('Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation failed for keyspace1.standard1 - received 0 responses and 1 failures from 1 CL=ONE." info={\'consistency\': \'ONE\', \'required_responses\': 1, \'received_responses\': 0}')

Packages

Scylla version: 6.3.0~dev-20241217.01cdba9a9894 with build-id f5cdbc08a2634f6f378e901fbb10a27fc164783e

Kernel Version: 6.8.0-1018-azure

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-9 (null | 10.0.0.5) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-8 (null | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-7 (null | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-6 (null | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-5 (null | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-4 (null | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-3 (null | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-2 (null | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-11 (null | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-10 (null | 10.0.0.14) (shards: -1)
  • longevity-10gb-3h-master-db-node-413a3a9b-eastus-1 (null | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-6.3.0-dev-x86_64-2024-12-18T02-02-40 (azure: undefined_region)

Test: longevity-10gb-3h-azure-test
Test id: 413a3a9b-fe7b-4e5e-b864-6f1f26628226
Test name: scylla-master/longevity/longevity-10gb-3h-azure-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 413a3a9b-fe7b-4e5e-b864-6f1f26628226
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 413a3a9b-fe7b-4e5e-b864-6f1f26628226

Logs:

Jenkins job URL
Argus

aleksbykov added a commit to aleksbykov/scylla-cluster-tests that referenced this issue Dec 22, 2024
The node where the scan operation was started could be
used by a disruptive nemesis. If the node was restarted or stopped
while the scan query was running, the scan operation would
be terminated, and the resulting error event and message would mark
the test as failed.

Add ExponentialBackoffRetryPolicy to the CQL session,
which allows the query to be retried if the node was down;
once the node is back, the query finishes successfully.

Fixes: scylladb#9284
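
The retry behavior described in this commit message can be sketched in plain Python. This is a generic illustration of exponential backoff, not the driver's retry policy class; the function name `retry_with_backoff` and all of its parameters are assumptions:

```python
import time


def retry_with_backoff(operation, max_retries=5, min_interval=0.1,
                       max_interval=10.0, retry_on=(TimeoutError,),
                       sleep=time.sleep):
    """Call operation(); on a retryable error wait and retry, doubling the wait."""
    delay = min_interval
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            sleep(delay)
            # Double the wait for the next attempt, capped at max_interval.
            delay = min(delay * 2, max_interval)
```

With a retry budget that covers a typical node restart, a scan query interrupted by the restart can complete once the node is back instead of producing an ERROR event.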
@fruch fruch linked a pull request Dec 23, 2024 that will close this issue
2 tasks
4 participants