From 01964237d010b3249183a9e82dedf189bb911428 Mon Sep 17 00:00:00 2001 From: Aolin Date: Tue, 2 Apr 2024 15:55:37 +0800 Subject: [PATCH] cdc: Add description about protocol behavior change --- TOC.md | 1 + releases/release-6.5.3.md | 6 ++++ releases/release-6.5.4.md | 4 +++ releases/release-7.1.1.md | 4 +++ releases/release-7.1.2.md | 4 +++ releases/release-7.2.0.md | 4 +++ releases/release-7.4.0.md | 2 ++ ticdc/ticdc-behavior-change.md | 54 ++++++++++++++++++++++++++++++++++ 8 files changed, 79 insertions(+) create mode 100644 ticdc/ticdc-behavior-change.md diff --git a/TOC.md b/TOC.md index 49b9ff4920e18..61fc0a4e6f59c 100644 --- a/TOC.md +++ b/TOC.md @@ -564,6 +564,7 @@ - [Bidirectional Replication](/ticdc/ticdc-bidirectional-replication.md) - [Data Integrity Validation for Single-Row Data](/ticdc/ticdc-integrity-check.md) - [Data Consistency Validation for TiDB Upstream/Downstream Clusters](/ticdc/ticdc-upstream-downstream-check.md) + - [TiCDC Behavior Changes](/ticdc/ticdc-behavior-change.md) - Monitor and Alert - [Monitoring Metrics Summary](/ticdc/ticdc-summary-monitor.md) - [Monitoring Metrics Details](/ticdc/monitor-ticdc.md) diff --git a/releases/release-6.5.3.md b/releases/release-6.5.3.md index 8a2afd9bb1488..093aed31dac16 100644 --- a/releases/release-6.5.3.md +++ b/releases/release-6.5.3.md @@ -11,6 +11,12 @@ TiDB version: 6.5.3 Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with-tidb) | [Production deployment](https://docs.pingcap.com/tidb/v6.5/production-deployment-using-tiup) | [Installation packages](https://www.pingcap.com/download/?version=v6.5.3#version-list) +## Compatibility changes + +### Behavior changes + +- When processing an update event, TiCDC splits the event into delete and insert events if the primary key or non-null unique index value is modified in the event. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-a-single-update-change). 
+ ## Improvements + TiDB diff --git a/releases/release-6.5.4.md b/releases/release-6.5.4.md index acad4c22dc0da..a4f5b5100dc3f 100644 --- a/releases/release-6.5.4.md +++ b/releases/release-6.5.4.md @@ -16,6 +16,10 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - To fix the issue that TiDB consumes too much memory when using `Cursor Fetch` to fetch a large result set, TiDB automatically writes the result set to the disk to release memory [#43233](https://github.com/pingcap/tidb/issues/43233) @[YangKeao](https://github.com/YangKeao) - Disable periodic compaction of RocksDB by default, so that the default behavior of TiKV RocksDB is now consistent with that in versions before v6.5.0. This change prevents potential performance impact caused by a significant number of compactions after upgrading. In addition, TiKV introduces two new configuration items [`rocksdb.[defaultcf|writecf|lockcf].periodic-compaction-seconds`](https://docs.pingcap.com/tidb/v6.5/tikv-configuration-file#periodic-compaction-seconds-new-in-v654) and [`rocksdb.[defaultcf|writecf|lockcf].ttl`](https://docs.pingcap.com/tidb/v6.5/tikv-configuration-file#ttl-new-in-v654), enabling you to manually configure periodic compaction of RocksDB [#15355](https://github.com/tikv/tikv/issues/15355) @[LykxSassinator](https://github.com/LykxSassinator) +### Behavior changes + +- For transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits an event into delete and insert events and ensures that all events follow the sequence of delete events preceding insert events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-multiple-update-changes). 
+ ## Improvements + TiDB diff --git a/releases/release-7.1.1.md b/releases/release-7.1.1.md index b394885f87809..9aa331a081959 100644 --- a/releases/release-7.1.1.md +++ b/releases/release-7.1.1.md @@ -15,6 +15,10 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - TiDB introduces a new system variable `tidb_lock_unchanged_keys` to control whether to lock unchanged keys [#44714](https://github.com/pingcap/tidb/issues/44714) @[ekexium](https://github.com/ekexium) +### Behavior changes + +- When processing an update event, TiCDC splits the event into delete and insert events if the primary key or non-null unique index value is modified in the event. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-a-single-update-change). + ## Improvements + TiDB diff --git a/releases/release-7.1.2.md b/releases/release-7.1.2.md index 02d3ba937f0f8..6185034c45a85 100644 --- a/releases/release-7.1.2.md +++ b/releases/release-7.1.2.md @@ -20,6 +20,10 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - TiCDC introduces the [`sink.csv.binary-encoding-method`](/ticdc/ticdc-changefeed-config.md#changefeed-configuration-parameters) configuration item to control the encoding method of binary data in the CSV protocol. The default value is `'base64'` [#9373](https://github.com/pingcap/tiflow/issues/9373) @[CharlesCheung96](https://github.com/CharlesCheung96) - TiCDC introduces the [`large-message-handle-option`](/ticdc/ticdc-sink-to-kafka.md#handle-messages-that-exceed-the-kafka-topic-limit) configuration item. It is empty by default, which means that the changefeed fails when the message size exceeds the limit of the Kafka topic. 
When this configuration is set to `"handle-key-only"`, if the message exceeds the size limit, only the handle key will be sent to reduce the message size; if the reduced message still exceeds the limit, then the changefeed fails [#9680](https://github.com/pingcap/tiflow/issues/9680) @[3AceShowHand](https://github.com/3AceShowHand) +### Behavior changes + +- For transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits an event into delete and insert events and ensures that all events follow the sequence of delete events preceding insert events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-multiple-update-changes). + ## Improvements + TiDB diff --git a/releases/release-7.2.0.md b/releases/release-7.2.0.md index ea85bb347a999..ffaef46deddfb 100644 --- a/releases/release-7.2.0.md +++ b/releases/release-7.2.0.md @@ -170,6 +170,10 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.2/quick-start-with- > > This section provides compatibility changes you need to know when you upgrade from v7.1.0 to the current version (v7.2.0). If you are upgrading from v7.0.0 or earlier versions to the current version, you might also need to check the compatibility changes introduced in intermediate versions. +### Behavior changes + +- When processing an update event, TiCDC splits the event into delete and insert events if the primary key or non-null unique index value is modified in the event. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-a-single-update-change). 
+ ### System variables | Variable name | Change type | Description | diff --git a/releases/release-7.4.0.md b/releases/release-7.4.0.md index cfe015548dcdc..2d3cd93e69a4c 100644 --- a/releases/release-7.4.0.md +++ b/releases/release-7.4.0.md @@ -286,6 +286,8 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- - The [`information_schema.CHECK_CONSTRAINTS`](/information-schema/information-schema-check-constraints.md) table is added for improved compatibility with MySQL 8.0. +- For transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits an event into delete and insert events and ensures that all events follow the sequence of delete events preceding insert events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#transactions-containing-multiple-update-changes). + ### System variables | Variable name | Change type | Description | diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md new file mode 100644 index 0000000000000..887ff14aa49f4 --- /dev/null +++ b/ticdc/ticdc-behavior-change.md @@ -0,0 +1,54 @@ +--- +title: TiCDC Behavior Changes +summary: Introduce the behavior changes of TiCDC changefeed, including the reasons and the impact of these changes. +--- + +# TiCDC Behavior Changes + +## Split update events into delete and insert events + +### Transactions containing a single update change + +Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in the update event, TiCDC splits this event into delete and insert events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086). + +This change primarily addresses the following issues: + +* When using the CSV and AVRO protocols, only the new value is output without the old value. 
Therefore, when the primary key or non-null unique index value changes, the consumer can only receive the new value, making it impossible to process the value before the change (for example, deleting the old value).
+* When using the index value dispatcher to distribute data across different Kafka partitions based on the key, multiple consumer processes in the downstream consumer group consume Kafka topic partitions independently. Due to different consumption progress, data inconsistency might occur.
+
+Take the following SQL as an example:
+
+```sql
+CREATE TABLE t (a INT PRIMARY KEY, b INT);
+INSERT INTO t VALUES (1, 1);
+UPDATE t SET a = 2 WHERE a = 1;
+```
+
+In this example, the primary key `a` is updated from `1` to `2`. If the update event is not split:
+
+* When using the CSV and AVRO protocols, the consumer only obtains the new value `a = 2` and cannot obtain the old value `a = 1`. This might cause the downstream consumer to only insert the new value `2` without deleting the old value `1`.
+* When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the update event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the update event into delete and insert events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer.
+
+### Transactions containing multiple update changes
+
+Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits the event into delete and insert events and orders the resulting events so that all delete events precede the insert events.
For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430).
+
+This change primarily addresses the primary key conflicts that can occur when the MySQL sink writes such update events directly to the downstream, which leads to changefeed errors.
+
+Take the following SQL as an example:
+
+```sql
+CREATE TABLE t (a INT PRIMARY KEY, b INT);
+INSERT INTO t VALUES (1, 1);
+INSERT INTO t VALUES (2, 2);
+
+BEGIN;
+UPDATE t SET a = 3 WHERE a = 1;
+UPDATE t SET a = 1 WHERE a = 2;
+UPDATE t SET a = 2 WHERE a = 3;
+COMMIT;
+```
+
+In this example, the transaction executes three SQL statements to swap the primary keys of two rows, but TiCDC only receives two update change events: one changing the primary key `a` from `1` to `2`, and one changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two update events to the downstream, a primary key conflict might occur, leading to changefeed errors.
+
+Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`.
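
The splitting and ordering described above can be sketched in a few lines of Python. This is an illustrative model only, not TiCDC's actual implementation: the `Event` class, the `split_updates` function, and the tuple-based row representation are hypothetical names introduced here, and placing key-preserving updates between the deletes and the inserts is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (a, b), where column a is the primary key


@dataclass(frozen=True)
class Event:
    """Hypothetical row change event, used only for this illustration."""
    kind: str            # "insert", "update", or "delete"
    old: Optional[Row]   # row before the change (None for inserts)
    new: Optional[Row]   # row after the change (None for deletes)


def split_updates(events: List[Event], key_index: int = 0) -> List[Event]:
    """Split key-changing updates and emit all deletes before all inserts."""
    deletes: List[Event] = []
    updates: List[Event] = []
    inserts: List[Event] = []
    for ev in events:
        if ev.kind == "update" and ev.old[key_index] != ev.new[key_index]:
            # The key changed: replace the update with a delete of the old
            # row and an insert of the new row.
            deletes.append(Event("delete", ev.old, None))
            inserts.append(Event("insert", None, ev.new))
        elif ev.kind == "delete":
            deletes.append(ev)
        elif ev.kind == "insert":
            inserts.append(ev)
        else:
            # Update that keeps the key (placement is an assumption here).
            updates.append(ev)
    return deletes + updates + inserts


# The two update events TiCDC receives for the key-swap transaction.
txn = [
    Event("update", (1, 1), (2, 1)),
    Event("update", (2, 2), (1, 2)),
]

for ev in split_updates(txn):
    print(ev.kind, ev.old if ev.kind == "delete" else ev.new)
```

Applied to the two update events from the example, the sketch produces the same four events: deletes for `(1, 1)` and `(2, 2)`, followed by inserts for `(2, 1)` and `(1, 2)`. Because every delete runs before any insert, the downstream never holds two rows with the same primary key at once.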