diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md index 44d586fbce8f0..d0452a4ebb587 100644 --- a/ticdc/ticdc-alert-rules.md +++ b/ticdc/ticdc-alert-rules.md @@ -16,7 +16,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Alert rule: - (time() - ticdc_processor_checkpoint_ts / 1000) > 600 + `(time() - ticdc_processor_checkpoint_ts / 1000) > 600` - Description: @@ -24,13 +24,13 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ## `cdc_resolvedts_high_delay` - Alert rule: - (time() - ticdc_processor_resolved_ts / 1000) > 300 + `(time() - ticdc_processor_resolved_ts / 1000) > 300` - Description: @@ -38,7 +38,21 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + +### `ticdc_changefeed_failed` + +- Alert rule: + + `(max_over_time(ticdc_owner_status[1m]) == 2) > 0` + +- Description: + + A replication task encounters an unrecoverable error and enters the failed state. + +- Solution: + + This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ### `ticdc_processor_exit_with_error_count` @@ -52,7 +66,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ## Warning alerts @@ -112,7 +126,7 @@ Warning alerts are a reminder for an issue or error. - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ### `ticdc_puller_entry_sorter_sort_bucket` @@ -146,7 +160,7 @@ Warning alerts are a reminder for an issue or error. - Alert rule: - `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0` + `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0` - Description: