'disrupt_resetlocalschema' nemesis fails for longevity-mv-si-4days test #8534
Comments
Why do you think it's an SCT issue rather than a ScyllaDB issue?
I followed the example of #6229, where the same symptoms were reported in the past for a K8S-related config (and the comments in that issue suggest it is an SCT issue).
I'm not sure that's enough to figure out why it was failing.
It seems this is happening again and again. Please cross-check the logs of those runs in 2024.1: it sounds like we are missing the expected prints (the timeout is too short, or we start looking too late), or the operation doesn't happen on the Scylla side. This flow hasn't worked in master for quite some time, since this command isn't supported in the new Scylla nodetool, but in 2024.1 it seems to be a regression that we should chase down.
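To make the two suspected failure modes concrete (timeout too short, or starting to look too late), here is a minimal, hypothetical Python sketch of a "wait for an expected log print" check. It is not SCT's actual implementation; `wait_for_log_pattern` and the `EXPECTED_MARKER` string are placeholders. The caller records the log file's byte offset before triggering the operation, and the function polls only the bytes written after that offset until a deadline.

```python
import os
import tempfile
import time


def wait_for_log_pattern(log_path, pattern, start_offset, timeout=60.0, poll_interval=0.5):
    """Hypothetical helper: return True if `pattern` appears in bytes written
    to `log_path` after `start_offset` before `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    needle = pattern.encode()
    offset = start_offset
    seen = b""
    while True:
        # Read only what was appended since the last poll.
        with open(log_path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        offset += len(chunk)
        seen += chunk
        if needle in seen:
            return True
        # Failure mode 1: the print arrives after the deadline (timeout too short).
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    offset_before = os.path.getsize(path)  # captured before the operation
    with open(path, "a") as f:
        f.write("placeholder line with EXPECTED_MARKER\n")
    offset_after = os.path.getsize(path)  # failure mode 2: captured too late
    print(wait_for_log_pattern(path, "EXPECTED_MARKER", offset_before, timeout=1))
    print(wait_for_log_pattern(path, "EXPECTED_MARKER", offset_after, timeout=1))
    os.remove(path)
```

If the offset is captured after the node has already written the message, or the deadline expires before a delayed syslog line arrives, the check returns False even though the operation did happen, which is one way the "missing prints" described above could arise.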
The issue reproduced again in 2024.1.11 (longevity-mv-si-4days-test). According to system.log of the target node-7, the pattern that the
The problem is that in node-7 messages.log the records about the reset_local_schema API request and then the corresponding
And as per scylla-cluster-tests/sdcm/cluster.py Line 562 in 318be43
In general, the logs in node-7 messages.log are messy (this is probably the same problem as #6682):
According to messages.log, the first syslog-ng connection drop message is at 01:17:14. However, that timestamp is probably wrong, and the drops likely started around 23:48:14.
Maybe we should also backport #8743 to 2024.1?
Let's try testing this specific nemesis for a couple of hours.
Backported. But I'm not sure which test to run: it looks like a very random issue, and I don't see the point of retrying just for this purpose. Can we wait for another round?
Is this still relevant?
Not clear enough; a suspected fix was backported. Closing for now.
Packages
Scylla version: 2024.1.9-20240829.d583605198a7 with build-id c0a1a483aad4949fe2ed11479d5a99e92672bb2b
Kernel Version: 5.15.0-1068-aws
Issue description
The longevity-mv-si-4days longevity scenario failed with the error:
Impact
The disrupt_resetlocalschema nemesis should pass.
How frequently does it reproduce?
No occurrences of this issue have been noticed since 2023 (#6229).
Installation details
Cluster size: 5 nodes (i4i.8xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0571d896a052e46a3 (aws: undefined_region)
Test: longevity-mv-si-4days-test
Test id: 7f557fdd-8e48-40f3-bb8f-7c1524f8abd0
Test name: enterprise-2024.1/longevity/longevity-mv-si-4days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 7f557fdd-8e48-40f3-bb8f-7c1524f8abd0
$ hydra investigate show-logs 7f557fdd-8e48-40f3-bb8f-7c1524f8abd0
Logs:
Jenkins job URL
Argus