system.log -> messages.log scraping can start lagging badly after connection is lost #9512

Open
piodul opened this issue Dec 9, 2024 · 2 comments

Comments

@piodul

piodul commented Dec 9, 2024

Argus run that prompted this issue: https://argus.scylladb.com/tests/scylla-cluster-tests/83efb5fa-0232-4b55-a1e1-219764056cee. I have left a comment in the discussion there which explains the problem in more detail. Summarizing:

  • An event happens which causes Scylla to generate a relatively large amount of logs (it was nodetool rebuild in this run),
  • The system.log -> messages.log scraping mechanism (I think it's called syslog-ng) loses connection to the node, reconnects after 60s but keeps lagging badly afterwards; I saw that one line was delayed by 16 minutes in that particular run,
  • The nemesis code which waits for a log line to appear times out, even though it appeared in system.log pretty quickly, leading to test flakiness.
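To make the failure mode concrete, here is a minimal sketch of a "wait for a log line" helper of the kind described above. It is not the actual nemesis code: `wait_for_line` and its parameters are hypothetical names made up for illustration. The point is that the wait only sees whatever file it is pointed at, so if that file is the scraped copy and the scraper is lagging by minutes, the wait times out even though the line is already in the node's own log.

```python
import re
import time
from pathlib import Path


def wait_for_line(log_path, pattern, timeout, poll_interval=0.1):
    """Poll log_path until a complete line matches pattern or timeout (seconds) elapses.

    Hypothetical helper for illustration only. Returns the matching line,
    or None if the deadline passes first.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    offset = 0  # byte offset of the first unread byte in the file
    path = Path(log_path)
    while True:
        if path.exists():
            with path.open("rb") as f:
                f.seek(offset)
                chunk = f.read()
            # Only consume complete lines; a trailing partial line is re-read later.
            end = chunk.rfind(b"\n") + 1
            for raw in chunk[:end].splitlines():
                line = raw.decode("utf-8", errors="replace")
                if regex.search(line):
                    return line
            offset += end
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

With a helper like this, a line that reaches the scraped file 16 minutes late is indistinguishable from a line that was never logged, unless the timeout is longer than the scraping lag.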
@piodul piodul removed their assignment Dec 9, 2024
@fruch

fruch commented Dec 9, 2024

Image

I would argue it's a Scylla bug if logging on one node gets to more than 100K per second.

Also, can you suggest a different way to identify that a rebuild has started, without looking at the logs?

@roydahan

What are these logs (the sudden burst) about?
