Document metrics that help detect when WAL apply lag is increasing #191

mtopolnik · 2025-05-26T11:21:10Z

Adds three new metrics to the table of all metrics:

counter: questdb_wal_apply_seq_txn_total
counter: questdb_wal_apply_writer_txn_total
gauge: questdb_suspended_tables

TODO: document how to use these metrics:

To detect growing WAL apply lag that is not caused by suspended tables, monitor the difference questdb_wal_apply_seq_txn_total - questdb_wal_apply_writer_txn_total. If the difference keeps growing over time, at least one table is likely falling seriously behind.

The gauge questdb_suspended_tables is simpler: any value above zero is a sign of trouble.
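The two checks described above could be sketched roughly as follows. This is only an illustration, not the alerting setup the PR documents: the helper names (`parse_metrics`, `apply_lag`, `lag_is_growing`, `has_suspended_tables`) and the `min_samples` threshold are hypothetical, and "persistently growing" is interpreted here as "strictly increasing across the last few scrapes", which is an assumption. The sketch also assumes the three metrics are exposed as plain, unlabelled samples in the Prometheus text format.

```python
from typing import Dict, Sequence

SEQ_TXN = "questdb_wal_apply_seq_txn_total"
WRITER_TXN = "questdb_wal_apply_writer_txn_total"
SUSPENDED = "questdb_suspended_tables"


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # labelled series are out of scope for this sketch
        samples[parts[0]] = float(parts[1])
    return samples


def apply_lag(samples: Dict[str, float]) -> float:
    """Difference between the sequencer and writer transaction counters."""
    return samples[SEQ_TXN] - samples[WRITER_TXN]


def lag_is_growing(lags: Sequence[float], min_samples: int = 3) -> bool:
    """True if the lag rose between every pair of the last `min_samples` readings.

    `min_samples` is a hypothetical tuning knob, not a value from the PR.
    """
    if len(lags) < min_samples:
        return False  # not enough history to call it a trend
    recent = list(lags[-min_samples:])
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def has_suspended_tables(samples: Dict[str, float]) -> bool:
    """Any value above zero on the gauge is treated as a sign of trouble."""
    return samples.get(SUSPENDED, 0.0) > 0
```

In practice the same logic would more likely live in a Prometheus alerting rule than in a script; the Python version is only meant to make the arithmetic explicit. A caller would scrape the metrics endpoint at a fixed interval, append `apply_lag(parse_metrics(body))` to a history list, and alert when `lag_is_growing(history)` or `has_suspended_tables(...)` returns true.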

github-actions · 2025-05-26T11:23:57Z

🚀 Build success!

Latest successful preview: https://preview-191--questdb-documentation.netlify.app/docs/

Commit SHA: d8f679c

📦 Build generates a preview & updates link on each commit.

Copilot

Pull Request Overview

This PR introduces new observability metrics for WAL apply lag and a dedicated monitoring guide, while updating documentation structure.

Adds three new Prometheus metrics for WAL apply and suspended tables
Introduces a new “Monitoring and alerting” guide and links it in the sidebar
Refactors existing logging-metrics wording for clarity

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| documentation/third-party-tools/prometheus.md | Added `questdb_wal_apply_seq_txn_total`, `questdb_wal_apply_writer_txn_total`, `questdb_suspended_tables` and updated table separator |
| documentation/sidebars.js | Inserted `"operations/monitoring-alerting"` into the docs sidebar |
| documentation/operations/monitoring-alerting.md | Created a new guide covering suspended tables alerts and WAL apply lag detection |
| documentation/operations/logging-metrics.md | Refactored the CPU-cores health check paragraph and adjusted surrounding text |

Comments suppressed due to low confidence (1)

documentation/operations/logging-metrics.md:213

[nitpick] This sentence fragment lacks its introductory context (e.g., 'On systems with 8 cores or less...'). Please reintroduce or rephrase the initial clause for clarity.

for threads might increase the latency of health check service responses. If you

mtopolnik added 3 commits May 26, 2025 12:50

Add new metrics to table

e4caad2

Simplify description of two items

82a3001

Realign table

33daa8a

mtopolnik changed the title ~~Add metrics that help detect when WAL apply lag is increasing~~ Document metrics that help detect when WAL apply lag is increasing May 26, 2025

mtopolnik added 4 commits May 27, 2025 16:04

Skeleton for Monitoring and Alerting page

cd56ecd

Auto-style on logging-metrics page

4a30618

Add Monitoring and Alerting content

0644a51

Merge branch 'main' into mt_wal-lag-metrics

a1a0c83

mtopolnik requested a review from Copilot June 27, 2025 09:41

Copilot AI reviewed Jun 27, 2025

Fix typos

d8f679c

Co-authored-by: Copilot <[email protected]>
