Log slow user-created searches #20473

Open
damianharouff opened this issue Sep 17, 2024 · 2 comments

damianharouff commented Sep 17, 2024

Following on from the new ability to cancel long-running user searches (#18308), there should be a toggleable option to log slow searches, along with the user who triggered each one.

This would help on a busy Graylog installation where many users run many searches, and the Graylog admin wants to ensure that users are not creating unreasonable searches that hurt search cluster performance. For example: log any search that takes longer than 30 seconds to complete, before it is canceled at 60 seconds.

This stems from a situation encountered by a strategic customer: they know that a user search is impacting their search cluster, but they cannot tell which specific query is responsible without asking the users currently logged into their system, and there may be hundreds of searches running at a time. They have to assess each one manually, which is very time-consuming. ZD 940 has more details.

It's understood that OpenSearch can log (or take further action on) slow queries, but what it logs is the Graylog-computed query sent to OpenSearch, which carries no information about who executed it. Graylog, on the other hand, can provide this information to the Graylog admin.

We may not want to log the query itself, given concerns like its size or the chance that it contains sensitive information. Even so, a log entry as simple as "Search job 66e9b58b6a00143e28d8bbde started by user damian took more than search_slow_seconds to complete" would give a Graylog administrator a very useful starting point for further investigation.
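
As a rough illustration of the log entry described above, here is a minimal Python sketch. Graylog itself is written in Java, and nothing here is existing Graylog code: the function name, the 30-second default for `search_slow_seconds` (taken from the example earlier in this issue), and the exact message format are all assumptions.

```python
from typing import Optional

# Hypothetical setting named in this issue; 30 s mirrors the example above.
SEARCH_SLOW_SECONDS = 30.0


def slow_search_message(job_id: str, user: str, elapsed_seconds: float,
                        threshold_seconds: float = SEARCH_SLOW_SECONDS) -> Optional[str]:
    """Return a log line for a slow search, or None if it was fast enough.

    The query text is deliberately left out: it may be large or sensitive.
    """
    if elapsed_seconds <= threshold_seconds:
        return None
    return (f"Search job {job_id} started by user {user} took "
            f"{elapsed_seconds:.1f}s to complete "
            f"(threshold: {threshold_seconds:g}s)")
```

Only the job ID, the user, and the duration are emitted, which should be enough for an admin to follow up with the user without the log ever containing the query.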

@coffee-squirrel

If this gets rolled into the Audit Log, it'd be nice to have the duration searchable (e.g. an audit log query like execution_time_ms:>15000), since the "slow" threshold may change while troubleshooting. I think we've brought this up before, but I can't find the issue/case.
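
To show why a numeric duration field helps, here is a small Python sketch under stated assumptions: the `SearchAuditEntry` record and the `slower_than` helper are hypothetical, not part of Graylog's audit log, and only the `execution_time_ms` field name comes from the query example above. The point is that the threshold is applied when querying, not when the entry is written.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SearchAuditEntry:
    # Hypothetical audit record; field names are illustrative only.
    job_id: str
    user: str
    execution_time_ms: int


def slower_than(entries: Iterable[SearchAuditEntry],
                threshold_ms: int) -> List[SearchAuditEntry]:
    """Mimic a query such as execution_time_ms:>15000 over recorded entries."""
    return [e for e in entries if e.execution_time_ms > threshold_ms]
```

Because every search records its duration, the same stored entries can be filtered at 15000 ms today and at 5000 ms tomorrow without changing any logging configuration.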

damianharouff commented Sep 18, 2024

Audit log is an idea, although until https://github.com/Graylog2/graylog-plugin-enterprise/issues/7098 (Audit log entries into configurable stream) is implemented, there would be no way to build events and alerts on that data.
