Log details required for debugging failed tests #409

ptrsmrn · 2023-12-12T14:25:35Z

Given:
https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1849642826:

tool is not trying to explain what happened, or what sequence of operations led to bad result

https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1838697317:

We don't know anything about the statements that gemini executed — we only know the result.

https://github.com/scylladb/scylla-enterprise/issues/3642#issuecomment-1838820807:

I don't think I will be able to figure anything out without reproducing this issue with some actual logs.

can we improve logging to the point that the cited issue and alike issues are (easily) debuggable? That would include:

logging executed statements,
ability to re-execute these statements using the gemini framework.

cc @mykaul @nuivall

bhalevy · 2023-12-12T15:28:31Z

For the purpose of reproducing issues, it would also be good to be able to replay the queries from the log.
For that I believe that if the log would be of CQL commands (USING TIMESTAMP), we could just use cqlsh to replay them.

dkropachev · 2023-12-12T17:56:42Z

@ptrsmrn , @bhalevy, I have different perspective on the issue, let's discuss it:

Issue is 100% valid and I have been thinking on how to solve it for quite long time.
I don't think logging sql statements is not best solution, since it is going to have tons of data, most of which is not usefull
Say it runs modest 500 rq/s, it is going to generate 1800000 records over an hour, smallest sct test is 4h test, so you gonna get 7200000, which is in the very very best case translates to 196 mb of data, which is hard to store, retrieve and analyze.

Instead of logging (or in parallel to it) I would suggest the following:

Every query is remembered by gemini (memory is not an option, we can go with sqlite), in such manner that it can be found by pk/ck
Test-wide queries, like schema changes, are stored separately
When error happens gemini pulls all the queries that where executed for given pk/ck with test-wide queries and logs/prints them

Reproducibility problem should be addressed by setting seeds, unfortunately it does not work properly, #370, but once issue is fixed you should be able to reproduce exact CQL flow by just setting --schema-seed and --seed

bhalevy · 2023-12-12T18:41:02Z

@ptrsmrn , @bhalevy, I have different perspective on the issue, let's discuss it:

Issue is 100% valid and I have been thinking on how to solve it for quite long time.

I don't think logging sql statements is not best solution, since it is going to have tons of data, most of which is not usefull
Say it runs modest 500 rq/s, it is going to generate 1800000 records over an hour, smallest sct test is 4h test, so you gonna get 7200000, which is in the very very best case translates to 196 mb of data, which is hard to store, retrieve and analyze.

196 MB are not hard to store.

As for analyzing the log, leave it to the engineer who's working on the issue.
One can use other tools like grep or write scripts to automate parts of the analysis.
But we need the raw data to start with.
Hiding stuff based on what we know today doesn't help since you will need that data for analyzing future problems that are unknown yet, so I would suggest to "expect the unexpected".

Instead of logging (or in parallel to it) I would suggest the following:

Every query is remembered by gemini (memory is not an option, we can go with sqlite), in such manner that it can be found by pk/ck

Test-wide queries, like schema changes, are stored separately

When error happens gemini pulls all the queries that where executed for given pk/ck with test-wide queries and logs/prints them

Reproducibility problem should be addressed by setting seeds, unfortunately it does not work properly, #370, but once issue is fixed you should be able to reproduce exact CQL flow by just setting --schema-seed and --seed

ptrsmrn · 2023-12-13T14:49:30Z

@dkropachev Saving logs only for failed tests makes sense, given their size. How gemini framework stores them internally, a file or sqlite, is not a concern to devs as long as devs don't need to query sqlite, so gemini logging them onto the screen/file or anything alike, as you suggested, is IMO good - essentially I'd just make these extra logs easily available for devs, so we don't hear further complaints ;)
A note regarding the file size - best to not produce too big files that are hard to open or are bloated with "spam", because it'll make debugging process harder, but that's just a wishful thinking, as I understand each test produces tons of queries.

ptrsmrn assigned dkropachev Dec 12, 2023

dkropachev mentioned this issue Dec 16, 2023

feat(cli): add statement-log-file option #413

Merged

dkropachev linked a pull request Dec 16, 2023 that will close this issue

feat(cli): add statement-log-file option #413

Merged

dkropachev closed this as completed in #413 Dec 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log details required for debugging failed tests #409

Log details required for debugging failed tests #409

ptrsmrn commented Dec 12, 2023

bhalevy commented Dec 12, 2023

dkropachev commented Dec 12, 2023 •

edited

Loading

bhalevy commented Dec 12, 2023 •

edited

Loading

ptrsmrn commented Dec 13, 2023

Log details required for debugging failed tests #409

Log details required for debugging failed tests #409

Comments

ptrsmrn commented Dec 12, 2023

bhalevy commented Dec 12, 2023

dkropachev commented Dec 12, 2023 • edited Loading

bhalevy commented Dec 12, 2023 • edited Loading

ptrsmrn commented Dec 13, 2023

dkropachev commented Dec 12, 2023 •

edited

Loading

bhalevy commented Dec 12, 2023 •

edited

Loading