last runs with gemini v1.8.2, oracle is missing rows #345

Closed
fruch opened this issue Jun 11, 2023 · 11 comments

@fruch
Collaborator

fruch commented Jun 11, 2023

It looks like the latest gemini runs are hitting failures that we didn't get before v1.8.2:
https://jenkins.scylladb.com/job/scylla-master/job/gemini-/job/gemini-3h-with-nemesis-test/380/
https://jenkins.scylladb.com/job/scylla-master/job/gemini-/job/gemini-3h-with-nemesis-test/381/

The oracle is missing rows, and this happens very early in the test, before it starts running any nemesis.

@dkropachev
Collaborator

Thanks @fruch for reporting it.
I am looking into it

@dkropachev dkropachev self-assigned this Jun 11, 2023
@dkropachev
Collaborator

dkropachev commented Jun 11, 2023

Here is the original error:

{"L":"INFO","T":"2023-06-11T08:05:16.152Z","N":"work cycle.validation_job","M":"Validation failed. Error: unable to load check data from the oracle store: system failed: gocql: no response received from cassandra within timeout period"}
Error detected: &joberror.JobError{Timestamp:time.Date(2023, time.June, 11, 8, 5, 16, 152043820, time.Local), Message:"Validation failed: unable to load check data from the oracle store: system failed: gocql: no response received from cassandra within timeout period", Query:"SELECT * FROM ks1.table1_mv_0 WHERE col8=d2808980-8b12-11b8-995a-0a420e931031 AND pk0=1 AND pk1=228134926 AND pk2=4333304577001056359 AND pk3=97 AND pk4=-28154 AND pk5=515616374 "}
{"L":"INFO","T":"2023-06-11T08:05:16.152Z","N":"work cycle.validation_job","M":"ending validation loop"}

It shows two problems:

@fruch
Collaborator Author

fruch commented Jun 18, 2023

@dkropachev

Seems like those are still happening:
https://argus.scylladb.com/test/c05bf635-e29a-490c-a72f-9d4e42e6c7e7/runs?additionalRuns[]=b6fc4ff5-1120-404e-b8f4-8dd19738bc91

Do we have fixes for those ready?

@roydahan
Collaborator

@fruch can we try this with an older oracle version? Enterprise 2020.1.x or 2021.1.x (if it's not already the one in use).
I want to know if it's a regression in Scylla.

@fruch
Collaborator Author

fruch commented Jun 18, 2023

@fruch can we try this with older oracle version? enterprise 2020.1.x or 2021.1.x (if it's not already the one). I want to know if it's a regression in Scylla.

Those failures started showing up once we moved to v1.8.2, so I don't think it's an issue with the oracle itself.
We could try, but I'm guessing it would fail the same way.

@fruch
Collaborator Author

fruch commented Jun 18, 2023

The oracle version is currently:
2021.1.15

@fruch
Collaborator Author

fruch commented Jun 18, 2023

@dkropachev
Collaborator

@fruch, @roydahan, this issue is partially fixed in v1.8.3. The chance of hitting it is greatly reduced there, but to make it go away completely we need to implement proper timeout handling.
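
For illustration only, here is a minimal sketch of what timeout handling around an oracle read could look like. It is not gemini's actual code: `errTimeout`, `loadFunc`, `loadWithRetry` and `demo` are hypothetical names, and the store is simulated so the example runs with the standard library alone. The idea is to retry a read only when it timed out, and to return any other error right away so that real validation failures are not hidden.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTimeout is a stand-in for a driver timeout (an assumption for this
// sketch; a real driver returns its own error values).
var errTimeout = errors.New("no response received within timeout period")

// loadFunc is a hypothetical "load check data from a store" call.
type loadFunc func(ctx context.Context) ([]string, error)

// loadWithRetry calls load up to attempts times. Only timeouts are retried;
// any other error is returned immediately so real failures stay visible.
func loadWithRetry(ctx context.Context, load loadFunc, attempts int, delay time.Duration) ([]string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		rows, err := load(ctx)
		if err == nil {
			return rows, nil
		}
		if !errors.Is(err, errTimeout) {
			return nil, err
		}
		lastErr = err
		if i == attempts-1 {
			break // no point in waiting after the final attempt
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(delay):
		}
	}
	return nil, fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

// demo simulates a store that times out twice and then answers.
func demo() (rowCount int, calls int, err error) {
	flaky := func(ctx context.Context) ([]string, error) {
		calls++
		if calls < 3 {
			return nil, errTimeout
		}
		return []string{"row"}, nil
	}
	rows, err := loadWithRetry(context.Background(), flaky, 5, time.Millisecond)
	return len(rows), calls, err
}

func main() {
	fmt.Println(demo())
}
```

A fixed delay keeps the sketch short; a backoff between attempts would be a reasonable variation.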

@dkropachev
Collaborator

dkropachev commented Jun 18, 2023

@fruch, @roydahan, OK, it looks like there are two different issues. The first is related to multirow: any error on multirow, including timeouts and missing rows, comes down to what I described above.

The other issue is missing rows that pop up when gemini is terminating the test cycle. It is also a gemini issue that needs to be addressed.
This is exactly what happened here and in the earlier issue.
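
The thread does not say how the termination case was addressed. As a hedged sketch of one possible approach, a validation result could be discarded when the stop signal arrived before or during the check, on the assumption that a half-finished cycle can leave the two stores temporarily out of sync. All names here (`errMissingRows`, `validateFunc`, `validateUnlessStopping`, `demo`) are hypothetical and not taken from gemini.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errMissingRows is a stand-in for a validation mismatch.
var errMissingRows = errors.New("oracle is missing rows")

// validateFunc is a hypothetical single validation pass.
type validateFunc func(ctx context.Context) error

// validateUnlessStopping runs one validation pass and drops its error when
// ctx was cancelled before or while the pass ran.
func validateUnlessStopping(ctx context.Context, validate validateFunc) error {
	if ctx.Err() != nil {
		return nil // already shutting down: skip the check entirely
	}
	err := validate(ctx)
	if err != nil && ctx.Err() != nil {
		return nil // shutdown raced with the check: result is not trustworthy
	}
	return err
}

// demo reports whether a mismatch is surfaced in a running test and in a
// test that is cancelled while the check is in flight.
func demo() (reportedWhileRunning, reportedWhileStopping bool) {
	failing := func(ctx context.Context) error { return errMissingRows }
	reportedWhileRunning = validateUnlessStopping(context.Background(), failing) != nil

	stopping, cancel := context.WithCancel(context.Background())
	cancelMidRun := func(ctx context.Context) error {
		cancel() // the stop signal arrives while the check is running
		return errMissingRows
	}
	reportedWhileStopping = validateUnlessStopping(stopping, cancelMidRun) != nil
	return
}

func main() {
	fmt.Println(demo())
}
```

The trade-off in this sketch is that a genuine mismatch found during shutdown is also dropped, so it only makes sense if the final cycle is not needed for coverage.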

@fruch
Collaborator Author

fruch commented Jun 18, 2023

Running with 2020.1.14: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/fruch/job/gemini-3h-test/9/

This run confirms the failure isn't related to the version of the oracle.

@dkropachev
Collaborator

Both issues that led to this error are fixed:
