Document metrics that help detect when WAL apply lag is increasing #191
🚀 Build success! Latest successful preview: https://preview-191--questdb-documentation.netlify.app/docs/ Commit SHA: d8f679c
Pull Request Overview
This PR introduces new observability metrics for WAL apply lag and a dedicated monitoring guide, while updating documentation structure.
- Adds three new Prometheus metrics for WAL apply and suspended tables
- Introduces a new “Monitoring and alerting” guide and links it in the sidebar
- Refactors existing logging-metrics wording for clarity
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
documentation/third-party-tools/prometheus.md | Added `questdb_wal_apply_seq_txn_total`, `questdb_wal_apply_writer_txn_total`, `questdb_suspended_tables` and updated table separator |
documentation/sidebars.js | Inserted "operations/monitoring-alerting" into the docs sidebar |
documentation/operations/monitoring-alerting.md | Created a new guide covering suspended tables alerts and WAL apply lag detection |
documentation/operations/logging-metrics.md | Refactored the CPU-cores health check paragraph and adjusted surrounding text |
Comments suppressed due to low confidence (1)
documentation/operations/logging-metrics.md:213
- [nitpick] This sentence fragment lacks its introductory context (e.g., 'On systems with 8 cores or less...'). Please reintroduce or rephrase the initial clause for clarity.
for threads might increase the latency of health check service responses. If you
Co-authored-by: Copilot <[email protected]>
Adds three new metrics to the table of all metrics:
- `questdb_wal_apply_seq_txn_total`
- `questdb_wal_apply_writer_txn_total`
- `questdb_suspended_tables`
TODO: document how to use these metrics:
- To track increasing WAL apply lag that is not caused by suspended tables, monitor the difference `questdb_wal_apply_seq_txn_total - questdb_wal_apply_writer_txn_total`. If it grows persistently, at least one table is likely falling seriously behind.
- The gauge `questdb_suspended_tables` is simpler: any value above zero is a sign of trouble.
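
As a rough illustration of the two checks described above, here is a minimal Python sketch. It assumes the metrics arrive as unlabelled samples in the Prometheus text exposition format; the helper names (`parse_metrics`, `wal_health`, `lag_is_growing`), the sample payload, and the "three strictly increasing readings" rule for "persistently growing" are all hypothetical choices made for this example, not part of QuestDB or of this PR.

```python
def parse_metrics(text):
    """Parse unlabelled samples from Prometheus text exposition format.

    Hypothetical helper: comment lines (# HELP / # TYPE) and lines that
    do not look like `name value` are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return samples


def wal_health(samples):
    """Return (apply_lag, suspended_tables) from parsed samples."""
    seq = samples["questdb_wal_apply_seq_txn_total"]
    writer = samples["questdb_wal_apply_writer_txn_total"]
    suspended = samples.get("questdb_suspended_tables", 0.0)
    return seq - writer, suspended


def lag_is_growing(lags, min_points=3):
    """True if the last `min_points` lag readings are strictly increasing.

    Assumption: this is only one possible reading of "persistently growing".
    """
    recent = lags[-min_points:]
    if len(recent) < min_points:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


# Made-up payload, for illustration only.
sample = """\
# TYPE questdb_wal_apply_seq_txn_total counter
questdb_wal_apply_seq_txn_total 120
questdb_wal_apply_writer_txn_total 100
questdb_suspended_tables 0
"""

lag, suspended = wal_health(parse_metrics(sample))
```

In practice the lag would be sampled repeatedly and the history passed to `lag_is_growing`, while `suspended > 0` could be alerted on immediately. A Prometheus alerting rule would likely be the more natural home for this logic; the script only spells out the arithmetic.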