From 12180b8468fdefed3449435bb3ac7feb44a79ee9 Mon Sep 17 00:00:00 2001
From: Aolin
Date: Tue, 30 Jan 2024 15:16:27 +0800
Subject: [PATCH 1/2] ticdc: add ticdc_changefeed_failed alert rule

---
 ticdc/ticdc-alert-rules.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md
index 91c4c8c54870d..4870cc0c54f8f 100644
--- a/ticdc/ticdc-alert-rules.md
+++ b/ticdc/ticdc-alert-rules.md
@@ -16,7 +16,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Alert rule:
 
-    (time() - ticdc_owner_checkpoint_ts / 1000) > 600
+    `(time() - ticdc_owner_checkpoint_ts / 1000) > 600`
 
 - Description:
 
@@ -30,7 +30,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Alert rule:
 
-    (time() - ticdc_owner_resolved_ts / 1000) > 300
+    `(time() - ticdc_owner_resolved_ts / 1000) > 300`
 
 - Description:
 
@@ -40,6 +40,20 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
     See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
+### `ticdc_changefeed_failed`
+
+- Alert rule:
+
+    `(max_over_time(ticdc_owner_status[1m]) == 2) > 0`
+
+- Description:
+
+    A replication task encounters an unrecoverable error and enters the failed state.
+
+- Solution:
+
+    This alert is similar to replication interruption. See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+
 ### `ticdc_processor_exit_with_error_count`
 
 - Alert rule:
@@ -132,7 +146,7 @@ Warning alerts are a reminder for an issue or error.
 
 - Alert rule:
 
-    `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0`
+    `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0`
 
 - Description:
 
From d3915957c00a417c80a67156e4305aa7c7167fdf Mon Sep 17 00:00:00 2001
From: Aolin
Date: Fri, 1 Mar 2024 17:41:55 +0800
Subject: [PATCH 2/2] fix typo

---
 ticdc/ticdc-alert-rules.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md
index 4870cc0c54f8f..10e9f0235271a 100644
--- a/ticdc/ticdc-alert-rules.md
+++ b/ticdc/ticdc-alert-rules.md
@@ -24,7 +24,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `cdc_resolvedts_high_delay`
 
@@ -38,7 +38,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `ticdc_changefeed_failed`
 
@@ -52,7 +52,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    This alert is similar to replication interruption. See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `ticdc_processor_exit_with_error_count`
 
@@ -66,7 +66,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ## Warning alerts
 
@@ -112,7 +112,7 @@ Warning alerts are a reminder for an issue or error.
 
 - Solution:
 
-    See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+    See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
 
 ### `ticdc_puller_entry_sorter_sort_bucket`
 