This is an automated cherry-pick of pingcap#16380
Signed-off-by: ti-chi-bot <[email protected]>
Oreoxmt authored and ti-chi-bot committed Mar 1, 2024
1 parent 074c382 commit 0384ef2
Showing 1 changed file with 27 additions and 5 deletions.
32 changes: 27 additions & 5 deletions ticdc/ticdc-alert-rules.md
@@ -16,29 +16,51 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Alert rule:

+<<<<<<< HEAD
(time() - ticdc_processor_checkpoint_ts / 1000) > 600
+=======
+`(time() - ticdc_owner_checkpoint_ts / 1000) > 600`
+>>>>>>> 5f3a210d15 (ticdc: add ticdc_changefeed_failed alert rule (#16380))
- Description:

A replication task is delayed more than 10 minutes.

- Solution:

-See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

### `cdc_resolvedts_high_delay`

- Alert rule:

+<<<<<<< HEAD
(time() - ticdc_processor_resolved_ts / 1000) > 300
+=======
+`(time() - ticdc_owner_resolved_ts / 1000) > 300`
+>>>>>>> 5f3a210d15 (ticdc: add ticdc_changefeed_failed alert rule (#16380))
- Description:

The Resolved TS of a replication task is delayed more than 5 minutes.

- Solution:

-See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

+### `ticdc_changefeed_failed`
+
+- Alert rule:
+
+`(max_over_time(ticdc_owner_status[1m]) == 2) > 0`
+
+- Description:
+
+A replication task encounters an unrecoverable error and enters the failed state.
+
+- Solution:
+
+This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
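
The arithmetic behind the three critical rules in this hunk can be sketched in Python. This is an illustration only, not TiCDC or Prometheus code: the function names are hypothetical, the millisecond unit is inferred from the `/ 1000` in the rules, and the meaning of status value `2` is inferred from the rule's description.

```python
import time
from typing import Optional, Sequence


def lag_seconds(ts_ms: float, now_s: Optional[float] = None) -> float:
    """Mirror `time() - <metric> / 1000`.

    Assumption: the checkpoint and resolved-ts gauges hold a timestamp in
    milliseconds, which is why the rules divide by 1000 before subtracting.
    """
    if now_s is None:
        now_s = time.time()
    return now_s - ts_ms / 1000


def lag_alert(ts_ms: float, threshold_s: float, now_s: Optional[float] = None) -> bool:
    """True when the lag exceeds the threshold (600 s and 300 s in the rules)."""
    return lag_seconds(ts_ms, now_s) > threshold_s


# Assumption: the value 2 of ticdc_owner_status stands for the failed state.
FAILED_STATUS = 2


def changefeed_failed(samples_last_minute: Sequence[float]) -> bool:
    """Mirror `(max_over_time(ticdc_owner_status[1m]) == 2) > 0`.

    The rule matches only when the maximum over the window equals 2 exactly.
    """
    if not samples_last_minute:
        return False  # no samples in the window, so nothing to alert on
    return max(samples_last_minute) == FAILED_STATUS


now = 1_709_280_000.0  # fixed "current time" in seconds, for a repeatable example
print(lag_alert((now - 601) * 1000, 600, now))  # checkpoint 601 s behind
print(lag_alert((now - 299) * 1000, 300, now))  # resolved ts 299 s behind
print(changefeed_failed([0, 0, 2, 0]))          # status hit 2 within the window
```

Because the failed rule uses `max_over_time` over one minute, a changefeed that reports status `2` in even a single scrape keeps the alert condition true for the rest of that window.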

### `ticdc_processor_exit_with_error_count`

@@ -52,7 +74,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Solution:

-See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

## Warning alerts

@@ -112,7 +134,7 @@ Warning alerts are a reminder for an issue or error.

- Solution:

-See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).
+See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions).

### `ticdc_puller_entry_sorter_sort_bucket`

@@ -146,7 +168,7 @@ Warning alerts are a reminder for an issue or error.

- Alert rule:

-`changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0`
+`changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0`

- Description:

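The effect of the extra `and ON (instance) tikv_cdc_captured_region_total > 0` clause in the last hunk can be sketched as a set intersection keyed by the `instance` label. This is a simplified model, not Prometheus code: it assumes one series per instance for each metric, and the function name and sample values are hypothetical.

```python
from typing import Dict, Set


def instances_to_alert(
    resolved_ts_changes: Dict[str, int],
    resolved_regions: Dict[str, float],
    captured_regions: Dict[str, float],
) -> Set[str]:
    """Model `A < 1 and ON (instance) B > 0 and ON (instance) C > 0`.

    Each dict maps an instance label to the value of one series. `and ON
    (instance)` keeps a left-hand series only if the right-hand side has a
    series with the same instance label, which for one series per instance
    reduces to intersecting the sets of matching instances.
    """
    stalled = {i for i, n in resolved_ts_changes.items() if n < 1}
    has_resolved = {i for i, v in resolved_regions.items() if v > 0}
    has_captured = {i for i, v in captured_regions.items() if v > 0}
    return stalled & has_resolved & has_captured


# Hypothetical samples for three TiKV instances.
changes = {"tikv-0": 0, "tikv-1": 0, "tikv-2": 12}
resolved = {"tikv-0": 8.0, "tikv-1": 8.0, "tikv-2": 8.0}
captured = {"tikv-0": 0.0, "tikv-1": 5.0, "tikv-2": 5.0}

print(instances_to_alert(changes, resolved, captured))
```

In this model `tikv-0` has a stalled minimum resolved ts but captures no regions, so the added clause drops it, while the old two-clause rule would have included it.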