Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log details required for debugging failed tests #409

Closed
ptrsmrn opened this issue Dec 12, 2023 · 4 comments · Fixed by #413
Closed

Log details required for debugging failed tests #409

ptrsmrn opened this issue Dec 12, 2023 · 4 comments · Fixed by #413
Assignees

Comments

@ptrsmrn
Copy link

ptrsmrn commented Dec 12, 2023

Given:
https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1849642826:

tool is not trying to explain what happened, or what sequence of operations led to bad result

https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1838697317:

We don't know anything about the statements that gemini executed — we only know the result.

https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1838820807:

I don't think I will be able to figure anything out without reproducing this issue with some actual logs.

can we improve logging to the point that the cited issue and alike issues are (easily) debuggable? That would include:

  1. logging executed statements,
  2. ability to re-execute these statements using the gemini framework.

cc @mykaul @nuivall

@bhalevy
Copy link
Member

bhalevy commented Dec 12, 2023

For the purpose of reproducing issues, it would also be good to be able to replay the queries from the log.
For that I believe that if the log would be of CQL commands (USING TIMESTAMP), we could just use cqlsh to replay them.

@dkropachev
Copy link
Collaborator

dkropachev commented Dec 12, 2023

@ptrsmrn , @bhalevy, I have different perspective on the issue, let's discuss it:

  1. Issue is 100% valid and I have been thinking on how to solve it for quite long time.
  2. I don't think logging sql statements is not best solution, since it is going to have tons of data, most of which is not usefull
    Say it runs modest 500 rq/s, it is going to generate 1800000 records over an hour, smallest sct test is 4h test, so you gonna get 7200000, which is in the very very best case translates to 196 mb of data, which is hard to store, retrieve and analyze.

Instead of logging (or in parallel to it) I would suggest the following:

  1. Every query is remembered by gemini (memory is not an option, we can go with sqlite), in such manner that it can be found by pk/ck
  2. Test-wide queries, like schema changes, are stored separately
  3. When error happens gemini pulls all the queries that where executed for given pk/ck with test-wide queries and logs/prints them

Reproducibility problem should be addressed by setting seeds, unfortunately it does not work properly, #370, but once issue is fixed you should be able to reproduce exact CQL flow by just setting --schema-seed and --seed

@bhalevy
Copy link
Member

bhalevy commented Dec 12, 2023

@ptrsmrn , @bhalevy, I have different perspective on the issue, let's discuss it:

  1. Issue is 100% valid and I have been thinking on how to solve it for quite long time.
  2. I don't think logging sql statements is not best solution, since it is going to have tons of data, most of which is not usefull
    Say it runs modest 500 rq/s, it is going to generate 1800000 records over an hour, smallest sct test is 4h test, so you gonna get 7200000, which is in the very very best case translates to 196 mb of data, which is hard to store, retrieve and analyze.

196 MB are not hard to store.

As for analyzing the log, leave it to the engineer who's working on the issue.
One can use other tools like grep or write scripts to automate parts of the analysis.
But we need the raw data to start with.
Hiding stuff based on what we know today doesn't help since you will need that data for analyzing future problems that are unknown yet, so I would suggest to "expect the unexpected".

Instead of logging (or in parallel to it) I would suggest the following:

  1. Every query is remembered by gemini (memory is not an option, we can go with sqlite), in such manner that it can be found by pk/ck
  2. Test-wide queries, like schema changes, are stored separately
  3. When error happens gemini pulls all the queries that where executed for given pk/ck with test-wide queries and logs/prints them

Reproducibility problem should be addressed by setting seeds, unfortunately it does not work properly, #370, but once issue is fixed you should be able to reproduce exact CQL flow by just setting --schema-seed and --seed

@ptrsmrn
Copy link
Author

ptrsmrn commented Dec 13, 2023

@dkropachev Saving logs only for failed tests makes sense, given their size. How gemini framework stores them internally, a file or sqlite, is not a concern to devs as long as devs don't need to query sqlite, so gemini logging them onto the screen/file or anything alike, as you suggested, is IMO good - essentially I'd just make these extra logs easily available for devs, so we don't hear further complaints ;)
A note regarding the file size - best to not produce too big files that are hard to open or are bloated with "spam", because it'll make debugging process harder, but that's just a wishful thinking, as I understand each test produces tons of queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants