When a nodetool command times out, kill scylla-jmx with SIGQUIT #303
Comments
@fruch, who's working on this?
Currently no one. Are there more incidents of it getting stuck?
Yes, this frequently happens in dtest; see scylladb/scylladb#7244.
I'll take a closer look. Does it mainly happen in the secondary index tests, or in more than one test file/class?
@elcallio, it seems that other places point to using SIGQUIT (and not SIGINT): https://access.redhat.com/solutions/18178. I wasn't able to reproduce those issues, but I want to make sure I'm using the correct signal (and capturing the correct logs).
When a nodetool command times out, we send `SIGQUIT` to get a thread dump written to scylla-jmx's stdout. Close: #303 Ref: scylladb/scylladb#7991 (comment)
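A minimal Python sketch of the idea, assuming the nodetool command is run through `subprocess` and the scylla-jmx PID is already known. `run_nodetool_with_thread_dump`, `cmd`, `jmx_pid` and `timeout` are hypothetical names for illustration, not ccm's actual API:

```python
import os
import signal
import subprocess


def run_nodetool_with_thread_dump(cmd, jmx_pid, timeout):
    """Run a nodetool command; if it times out, ask the JVM for a thread dump.

    Hypothetical helper: `cmd` is the nodetool argv, `jmx_pid` the PID of the
    scylla-jmx JVM, `timeout` the limit in seconds.
    """
    try:
        # subprocess.run kills the nodetool child itself on timeout.
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        # By default a HotSpot JVM answers SIGQUIT by printing a dump of all
        # threads to its stdout and then keeps running (unless started with -Xrs).
        os.kill(jmx_pid, signal.SIGQUIT)
        raise
```

Re-raising keeps the timeout visible to the caller, so the test still fails, but the JVM's stdout log now contains the stacks of whatever was stuck.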
The `SIGQUIT` is followed too soon by a forceful kill, i.e. we don't give the process enough time to print the output to stdout. Close: #303 Ref: scylladb/scylladb#7991 (comment)
Unfortunately, we still don't see the stacktrace :-(
Interestingly, but likely unrelated, shortly after, in a following test, there's this:
Note that a different port is in use.
It smells like scylla-jmx might not exit cleanly and is holding on to the api port:
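One way to check whether a leftover process is still holding a port is to try binding to it. This is a hypothetical helper, not part of ccm; which port to probe (for example the API port) depends on the cluster's configuration:

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if a live TCP socket already occupies (host, port).

    Hypothetical diagnostic helper.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR lets the probe ignore TIME_WAIT leftovers, so only a
        # socket that is actually bound/listening makes bind() fail.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise
    return False
```

On Linux, `ss -ltnp` shows which process owns the listener once this reports the port as busy.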
@bhalevy, oh great, so it even made things worse?
I'm not sure; it may have exposed an existing issue. See scylla-ccm/ccmlib/scylla_node.py, lines 692 to 726 in bc4dced.
It looks like we're not stopping scylla_jmx if the node is not considered running.
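A sketch of a stop path that always cleans up scylla-jmx, even when the Scylla process is no longer considered running, assuming both are held as `subprocess.Popen` objects. `stop_node` and `_terminate` are hypothetical names and do not mirror the code in `scylla_node.py`:

```python
import subprocess


def _terminate(proc, timeout):
    """Terminate `proc` if it is still alive, escalating to SIGKILL."""
    if proc is None or proc.poll() is not None:
        return  # never started, or already exited
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def stop_node(scylla_proc, jmx_proc, timeout=10.0):
    """Hypothetical stop routine: jmx is stopped whatever scylla's state is."""
    try:
        _terminate(scylla_proc, timeout)
    finally:
        # Runs even if scylla already died or stopping it raised, so a stray
        # scylla-jmx cannot keep holding ports into the next test.
        _terminate(jmx_proc, timeout)
```

The key design choice is the `finally`: stopping jmx does not depend on whether the node was seen as running.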
@bhalevy, so 5 seconds wasn't enough for it to actually write the thread dump...
Are those happening only on runs on
The timeouts also happen with no parallelism.
The port-in-use issue could be unrelated. |
Cc @penberg
We had a few reports across the board that nodetool commands are getting stuck.
@elcallio suggested trying to catch and collect enough information when this happens:
scylladb/scylladb#7991 (comment)