ticdc: add ticdc_changefeed_failed alert rule (#16380) (#16665)
ti-chi-bot authored Mar 4, 2024
1 parent 75a32f0 commit 4f531e8
Showing 1 changed file with 21 additions and 7 deletions.
28 changes: 21 additions & 7 deletions ticdc/ticdc-alert-rules.md
@@ -16,29 +16,43 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Alert rule:

(time() - ticdc_owner_checkpoint_ts / 1000) > 600
`(time() - ticdc_owner_checkpoint_ts / 1000) > 600`

- Description:

A replication task is delayed more than 10 minutes.

- Solution:

See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
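
The checkpoint rule above can be read as a plain lag calculation. The following is a minimal sketch, not TiCDC code: it assumes `ticdc_owner_checkpoint_ts` is reported in milliseconds (which the division by 1000 suggests), and the function names and the `now_s` parameter are hypothetical, introduced only for illustration.

```python
import time


def checkpoint_lag_seconds(checkpoint_ts_ms, now_s=None):
    """Mirror `time() - ticdc_owner_checkpoint_ts / 1000`.

    checkpoint_ts_ms: checkpoint timestamp, assumed to be in milliseconds.
    now_s: current Unix time in seconds; defaults to the wall clock.
    """
    if now_s is None:
        now_s = time.time()
    return now_s - checkpoint_ts_ms / 1000


def exceeds_delay(checkpoint_ts_ms, threshold_s, now_s=None):
    """Return True when the lag is strictly greater than the threshold."""
    return checkpoint_lag_seconds(checkpoint_ts_ms, now_s) > threshold_s
```

With `threshold_s=600` this corresponds to the 10-minute condition in the rule. The comparison is strict, so a lag of exactly 600 seconds does not trigger.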

### `cdc_resolvedts_high_delay`

- Alert rule:

(time() - ticdc_owner_resolved_ts / 1000) > 300
`(time() - ticdc_owner_resolved_ts / 1000) > 300`

- Description:

The Resolved TS of a replication task is delayed more than 5 minutes.

- Solution:

See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

### `ticdc_changefeed_failed`

- Alert rule:

`(max_over_time(ticdc_owner_status[1m]) == 2) > 0`

- Description:

A replication task encounters an unrecoverable error and enters the failed state.

- Solution:

This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
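
To make the `ticdc_changefeed_failed` expression above more concrete, here is a small sketch of the same check over one window of samples. It assumes that the value `2` of `ticdc_owner_status` denotes the failed state, as the rule and its description imply. The constant and function names are hypothetical, and this is not how Prometheus evaluates the rule internally.

```python
# Assumption: status value 2 means "failed", as implied by the alert rule.
FAILED_STATE = 2


def changefeed_failed(status_samples):
    """Mirror `(max_over_time(ticdc_owner_status[1m]) == 2) > 0`.

    status_samples: status values scraped for one changefeed in the last minute.
    """
    if not status_samples:
        # No samples in the window means there is nothing to alert on.
        return False
    # The rule fires only when the highest value in the window is exactly 2.
    return max(status_samples) == FAILED_STATE
```

Because the check uses the window maximum, a changefeed that was in the failed state at any scrape during the last minute still matches, provided no higher status value was recorded in that window.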

### `ticdc_processor_exit_with_error_count`

@@ -52,7 +66,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Solution:

See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

## Warning alerts

@@ -98,7 +112,7 @@ Warning alerts are a reminder for an issue or error.

- Solution:

See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

### `ticdc_puller_entry_sorter_sort_bucket`

@@ -132,7 +146,7 @@ Warning alerts are a reminder for an issue or error.

- Alert rule:

`changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0`
`changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0`

- Description:
