From 0384ef270900a53702fa47d2ff3c10afff07b297 Mon Sep 17 00:00:00 2001
From: Aolin
Date: Fri, 1 Mar 2024 17:45:31 +0800
Subject: [PATCH] This is an automated cherry-pick of #16380

Signed-off-by: ti-chi-bot
---
 ticdc/ticdc-alert-rules.md | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md
index 44d586fbce8f0..d3fb94142d6bd 100644
--- a/ticdc/ticdc-alert-rules.md
+++ b/ticdc/ticdc-alert-rules.md
@@ -16,7 +16,11 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Alert rule:
 
+<<<<<<< HEAD
     (time() - ticdc_processor_checkpoint_ts / 1000) > 600
+=======
+    `(time() - ticdc_owner_checkpoint_ts / 1000) > 600`
+>>>>>>> 5f3a210d15 (ticdc: add ticdc_changefeed_failed alert rule (#16380))
 
 - Description:
 
@@ -24,13 +28,17 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ## `cdc_resolvedts_high_delay`
 
 - Alert rule:
 
+<<<<<<< HEAD
     (time() - ticdc_processor_resolved_ts / 1000) > 300
+=======
+    `(time() - ticdc_owner_resolved_ts / 1000) > 300`
+>>>>>>> 5f3a210d15 (ticdc: add ticdc_changefeed_failed alert rule (#16380))
 
 - Description:
 
@@ -38,7 +46,21 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+
+### `ticdc_changefeed_failed`
+
+- Alert rule:
+
+    `(max_over_time(ticdc_owner_status[1m]) == 2) > 0`
+
+- Description:
+
+    A replication task encounters an unrecoverable error and enters the failed state.
+
+- Solution:
+
+    This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `ticdc_processor_exit_with_error_count`
 
@@ -52,7 +74,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ## Warning alerts
 
@@ -112,7 +134,7 @@ Warning alerts are a reminder for an issue or error.
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `ticdc_puller_entry_sorter_sort_bucket`
 
@@ -146,7 +168,7 @@ Warning alerts are a reminder for an issue or error.
 
 - Alert rule:
 
-    `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0`
+    `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0`
 
 - Description:
 