From 50b418ccd3be8e3f3ca8ff9bc17563a38c11ef08 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 28 May 2024 17:35:53 +0800 Subject: [PATCH 1/8] DFX: support running multiple ADD INDEX in parallel (#17637) --- tidb-distributed-execution-framework.md | 1 - 1 file changed, 1 deletion(-) diff --git a/tidb-distributed-execution-framework.md b/tidb-distributed-execution-framework.md index 9cbae4961647e..a25317f755863 100644 --- a/tidb-distributed-execution-framework.md +++ b/tidb-distributed-execution-framework.md @@ -44,7 +44,6 @@ The DXF can only schedule up to 16 tasks (including [`ADD INDEX`](/sql-statement ## `ADD INDEX` limitation -- For each cluster, only one [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is allowed for distributed execution at a time. If a new [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is submitted before the current [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) distributed task has finished, the new [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is executed through a transaction instead of being scheduled by DXF. - Adding indexes on columns with the `TIMESTAMP` data type through the DXF is not supported, because it might lead to inconsistency between the index and the data. ## Prerequisites From 9478671de54942f217201868c74e9d411bb4a245 Mon Sep 17 00:00:00 2001 From: Aolin Date: Tue, 28 May 2024 17:46:22 +0800 Subject: [PATCH 2/8] fix MDX syntax error in machine translation (#17639) --- tiproxy/tiproxy-overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tiproxy/tiproxy-overview.md b/tiproxy/tiproxy-overview.md index e0558f9a902df..0438e32b3c8a5 100644 --- a/tiproxy/tiproxy-overview.md +++ b/tiproxy/tiproxy-overview.md @@ -11,7 +11,7 @@ TiProxy is an optional component. You can also use a third-party proxy component The following figure shows the architecture of TiProxy: -TiProxy architecture +TiProxy architecture ## Main features @@ -23,7 +23,7 @@ TiProxy can migrate connections from one TiDB server to another without breaking As shown in the following figure, the client originally connects to TiDB 1 through TiProxy. After the connection migration, the client actually connects to TiDB 2. When TiDB 1 is about to be offline or the ratio of connections on TiDB 1 to connections on TiDB 2 exceeds the set threshold, the connection migration is triggered. The client is unaware of the connection migration. -TiProxy connection migration +TiProxy connection migration Connection migration usually occurs in the following scenarios: From 2c076151e003f0c7beb914c200c0cc9bd9f03544 Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Wed, 29 May 2024 11:49:20 +0800 Subject: [PATCH 3/8] ticdc: add how TiCDC deals with data changes (#17612) --- ticdc/ticdc-overview.md | 55 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md index c9ad339ac7795..efc6911015eae 100644 --- a/ticdc/ticdc-overview.md +++ b/ticdc/ticdc-overview.md @@ -87,6 +87,61 @@ As shown in the architecture diagram, TiCDC supports replicating data to TiDB, M - To ensure eventual consistency when using TiCDC for disaster recovery, you need to configure [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) and ensure that the storage system where the redo log is written can be read normally when a disaster occurs in the upstream. 
+## Implementation of processing data changes + +This section mainly describes how TiCDC processes data changes generated by upstream DML operations. + +For data changes generated by upstream DDL operations, TiCDC obtains the complete DDL SQL statement, converts it into the corresponding format based on the sink type of the downstream, and sends it to the downstream. This section does not elaborate on this. + +> **Note:** +> +> The logic of how TiCDC processes data changes might be adjusted in subsequent versions. + +MySQL binlog directly records all DML SQL statements executed in the upstream. Unlike MySQL, TiCDC listens to the real-time information of each Region Raft Log in the upstream TiKV, and generates data change information based on the difference between the data before and after each transaction, which corresponds to multiple SQL statements. TiCDC only guarantees that the output change events are equivalent to the changes in the upstream TiDB, and does not guarantee that it can accurately restore the SQL statements that caused the data changes in the upstream TiDB. + +Data change information includes the data change types and the data values before and after the change. The difference between the data before and after the transaction can result in three types of events: + +1. The `DELETE` event: corresponds to a `DELETE` type data change message, which contains the data before the change. + +2. The `INSERT` event: corresponds to a `PUT` type data change message, which contains the data after the change. + +3. The `UPDATE` event: corresponds to a `PUT` type data change message, which contains the data both before and after the change. + +Based on the data change information, TiCDC generates data in the appropriate formats for different downstream types, and sends the data to the downstream. For example, it generates data in formats such as Canal-JSON and Avro, and writes the data to Kafka, or converts the data back into SQL statements and sends them to the downstream MySQL or TiDB. + +Currently, when TiCDC adapts data change information for the corresponding protocol, for specific `UPDATE` events, it might split them into one `DELETE` event and one `INSERT` event. For more information, see [Split update events into delete and insert events](/ticdc/ticdc-behavior-change.md#split-update-events-into-delete-and-insert-events). + +When the downstream is MySQL or TiDB, TiCDC cannot guarantee that the SQL statements written to the downstream are exactly the same as the SQL statements executed in the upstream. This is because TiCDC does not directly obtain the original DML statements executed in the upstream, but generates SQL statements based on the data change information. However, TiCDC ensures the consistency of the final results. 
+ +For example, the following SQL statement is executed in the upstream: + +```sql +Create Table t1 (A int Primary Key, B int); + +BEGIN; +Insert Into t1 values(1,2); +Insert Into t1 values(2,2); +Insert Into t1 values(3,3); +Commit; + +Update t1 set b = 4 where b = 2; +``` + +TiCDC generates the following two SQL statements based on the data change information, and writes them to the downstream: + +```sql +INSERT INTO `test.t1` (`A`,`B`) VALUES (1,1),(2,2),(3,3); +UPDATE `test`.`t1` +SET `A` = CASE + WHEN `A` = 1 THEN 1 + WHEN `A` = 2 THEN 2 +END, `B` = CASE + WHEN `A` = 1 THEN 4 + WHEN `A` = 2 THEN 4 +END +WHERE `A` = 1 OR `A` = 2; +``` + ## Unsupported scenarios Currently, the following scenarios are not supported: From 0d4856971835bf9b6397f3da42e96c9d9752b13a Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Wed, 29 May 2024 11:53:50 +0800 Subject: [PATCH 4/8] releases: fix two outdated 404 links (#17646) --- releases/release-6.1.6.md | 2 +- releases/release-6.2.0.md | 2 +- releases/release-6.5.1.md | 2 +- releases/release-7.0.0.md | 6 +++--- releases/release-7.2.0.md | 2 +- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/releases/release-6.1.6.md b/releases/release-6.1.6.md index ac2608d13fa85..17725f58ad6e3 100644 --- a/releases/release-6.1.6.md +++ b/releases/release-6.1.6.md @@ -92,7 +92,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.1/quick-start-with- + TiDB Lightning - - Fix the issue that the conflict resolution logic (`duplicate-resolution`) might lead to inconsistent checksums [#40657](https://github.com/pingcap/tidb/issues/40657) @[gozssky](https://github.com/gozssky) + - Fix the issue that the conflict resolution logic (`duplicate-resolution`) might lead to inconsistent checksums [#40657](https://github.com/pingcap/tidb/issues/40657) @[sleepymole](https://github.com/sleepymole) - Fix the issue that TiDB Lightning panics in the split-region phase [#40934](https://github.com/pingcap/tidb/issues/40934) @[lance6716](https://github.com/lance6716) - Fix the issue that when importing data in Local Backend mode, the target columns do not automatically generate data if the compound primary key of the imported target table has an `auto_random` column and no value for the column is specified in the source data [#41454](https://github.com/pingcap/tidb/issues/41454) @[D3Hunter](https://github.com/D3Hunter) - Fix the issue that TiDB Lightning might incorrectly skip conflict resolution when all but the last TiDB Lightning instance encounters a local duplicate record during a parallel import [#40923](https://github.com/pingcap/tidb/issues/40923) @[lichunzhu](https://github.com/lichunzhu) diff --git a/releases/release-6.2.0.md b/releases/release-6.2.0.md index 468ecf8ebdac9..1c7d9e57dc1b7 100644 --- a/releases/release-6.2.0.md +++ b/releases/release-6.2.0.md @@ -222,7 +222,7 @@ In v6.2.0-DMR, the key new features and improvements are as follows: This feature does not need manual configuration. If your TiDB cluster is v6.1.0 or later versions and TiDB Lightning is v6.2.0 or later versions, the new physical import mode takes effect automatically. 
- [User document](/tidb-lightning/tidb-lightning-physical-import-mode-usage.md#scope-of-pausing-scheduling-during-import) [#35148](https://github.com/pingcap/tidb/issues/35148) @[gozssky](https://github.com/gozssky) + [User document](/tidb-lightning/tidb-lightning-physical-import-mode-usage.md#scope-of-pausing-scheduling-during-import) [#35148](https://github.com/pingcap/tidb/issues/35148) @[sleepymole](https://github.com/sleepymole) * Refactor the [user documentation of TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to make its structure more reasonable and clear. The terms for "backend" is also modified to lower the understanding barrier for new users: diff --git a/releases/release-6.5.1.md b/releases/release-6.5.1.md index b0febb93ef432..25ee8c978a65e 100644 --- a/releases/release-6.5.1.md +++ b/releases/release-6.5.1.md @@ -182,6 +182,6 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - Fix the issue that TiDB Lightning prechecks cannot find dirty data left by previously failed imports [#39477](https://github.com/pingcap/tidb/issues/39477) @[dsdashun](https://github.com/dsdashun) - Fix the issue that TiDB Lightning panics in the split-region phase [#40934](https://github.com/pingcap/tidb/issues/40934) @[lance6716](https://github.com/lance6716) - - Fix the issue that the conflict resolution logic (`duplicate-resolution`) might lead to inconsistent checksums [#40657](https://github.com/pingcap/tidb/issues/40657) @[gozssky](https://github.com/gozssky) + - Fix the issue that the conflict resolution logic (`duplicate-resolution`) might lead to inconsistent checksums [#40657](https://github.com/pingcap/tidb/issues/40657) @[sleepymole](https://github.com/sleepymole) - Fix the issue that TiDB Lightning might incorrectly skip conflict resolution when all but the last TiDB Lightning instance encounters a local duplicate record during a parallel import [#40923](https://github.com/pingcap/tidb/issues/40923) @[lichunzhu](https://github.com/lichunzhu) - Fix the issue that when importing data in Local Backend mode, the target columns do not automatically generate data if the compound primary key of the imported target table has an `auto_random` column and no value for the column is specified in the source data [#41454](https://github.com/pingcap/tidb/issues/41454) @[D3Hunter](https://github.com/D3Hunter) diff --git a/releases/release-7.0.0.md b/releases/release-7.0.0.md index 34e2adb119703..ddf2e5eda74ac 100644 --- a/releases/release-7.0.0.md +++ b/releases/release-7.0.0.md @@ -268,7 +268,7 @@ In v7.0.0-DMR, the key new features and improvements are as follows: For more information, see [documentation](/sql-statements/sql-statement-load-data.md). -* TiDB Lightning supports enabling compressed transfers when sending key-value pairs to TiKV (GA) [#41163](https://github.com/pingcap/tidb/issues/41163) @[gozssky](https://github.com/gozssky) +* TiDB Lightning supports enabling compressed transfers when sending key-value pairs to TiKV (GA) [#41163](https://github.com/pingcap/tidb/issues/41163) @[sleepymole](https://github.com/sleepymole) Starting from v6.6.0, TiDB Lightning supports compressing locally encoded and sorted key-value pairs for network transfer when sending them to TiKV, thus reducing the amount of data transferred over the network and lowering the network bandwidth overhead. 
In the earlier TiDB versions before this feature is supported, TiDB Lightning requires relatively high network bandwidth and incurs high traffic charges in case of large data volumes. @@ -419,7 +419,7 @@ In v7.0.0-DMR, the key new features and improvements are as follows: + TiDB Lightning - - TiDB Lightning Physical Import Mode supports separating data import and index import to improve import speed and stability [#42132](https://github.com/pingcap/tidb/issues/42132) @[gozssky](https://github.com/gozssky) + - TiDB Lightning Physical Import Mode supports separating data import and index import to improve import speed and stability [#42132](https://github.com/pingcap/tidb/issues/42132) @[sleepymole](https://github.com/sleepymole) Add the `add-index-by-sql` parameter. The default value is `false`, which means that TiDB Lightning encodes both row data and index data into KV pairs and import them into TiKV together. If you set it to `true`, it means that TiDB Lightning adds indexes via the `ADD INDEX` SQL statement after importing the row data to improve import speed and stability. @@ -504,7 +504,7 @@ We would like to thank the following contributors from the TiDB community: - [BornChanger](https://github.com/BornChanger) - [Dousir9](https://github.com/Dousir9) - [erwadba](https://github.com/erwadba) -- [HappyUncle](https://github.com/HappyUncle) +- [happy-v587](https://github.com/happy-v587) - [jiyfhust](https://github.com/jiyfhust) - [L-maple](https://github.com/L-maple) - [liumengya94](https://github.com/liumengya94) diff --git a/releases/release-7.2.0.md b/releases/release-7.2.0.md index 168dc1bebdde0..b5016238aa89d 100644 --- a/releases/release-7.2.0.md +++ b/releases/release-7.2.0.md @@ -324,7 +324,7 @@ We would like to thank the following contributors from the TiDB community: - [darraes](https://github.com/darraes) - [demoManito](https://github.com/demoManito) - [dhysum](https://github.com/dhysum) -- [HappyUncle](https://github.com/HappyUncle) +- [happy-v587](https://github.com/happy-v587) - [jiyfhust](https://github.com/jiyfhust) - [L-maple](https://github.com/L-maple) - [nyurik](https://github.com/nyurik) From 99ac41946e56236e1141fa137d9bea11317989ec Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Wed, 29 May 2024 14:40:20 +0800 Subject: [PATCH 5/8] releases: add 8.1.0 to release notes summary (#17659) --- releases/release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/releases/release-notes.md b/releases/release-notes.md index 78aa3d2ed58bc..b64bd0f9b9934 100644 --- a/releases/release-notes.md +++ b/releases/release-notes.md @@ -1,7 +1,7 @@ --- title: Release Notes aliases: ['/docs/dev/releases/release-notes/','/docs/dev/releases/rn/'] -summary: TiDB has released multiple versions, including 8.0.0-DMR, 7.6.0-DMR, 7.5.1, 7.5.0, 7.4.0-DMR, 7.3.0-DMR, 7.2.0-DMR, 7.1.4, 7.1.3, 7.1.2, 7.1.1, 7.1.0, 7.0.0-DMR, 6.6.0-DMR, 6.5.9, 6.5.8, 6.5.7, 6.5.6, 6.5.5, 6.5.4, 6.5.3, 6.5.2, 6.5.1, 6.5.0, 6.4.0-DMR, 6.3.0-DMR, 6.2.0-DMR, 6.1.7, 6.1.6, 6.1.5, 6.1.4, 6.1.3, 6.1.2, 6.1.1, 6.1.0, 6.0.0-DMR, 5.4.3, 5.4.2, 5.4.1, 5.4.0, 5.3.4, 5.3.3, 5.3.2, 5.3.1, 5.3.0, 5.2.4, 5.2.3, 5.2.2, 5.2.1, 5.2.0, 5.1.5, 5.1.4, 5.1.3, 5.1.2, 5.1.1, 5.1.0, 5.0.6, 5.0.5, 5.0.4, 5.0.3, 5.0.2, 5.0.1, 5.0.0, 5.0.0-rc, 4.0.16, 4.0.15, 4.0.14, 4.0.13, 4.0.12, 4.0.11, 4.0.10, 4.0.9, 4.0.8, 4.0.7, 4.0.6, 4.0.5, 4.0.4, 4.0.3, 4.0.2, 4.0.1, 4.0.0, 4.0.0-rc.2, 4.0.0-rc.1, 4.0.0-rc, 4.0.0-beta.2, 4.0.0-beta.1, 4.0.0-beta, 3.1.2, 3.1.1, 3.1.0, 3.1.0-rc, 3.1.0-beta.2, 3.1.0-beta.1, 3.1.0-beta, 3.0.20, 
3.0.19, 3.0.18, 3.0.17, 3.0.16, 3.0.15, 3.0.14, 3.0.13, 3.0.12, 3.0.11, 3.0.10, 3.0.9, 3.0.8, 3.0.7, 3.0.6, 3.0.5, 3.0.4, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 3.0.0-rc.3, 3.0.0-rc.2, 3.0.0-rc.1, 3.0.0-beta.1, 3.0.0-beta, 2.1.19, 2.1.18, 2.1.17, 2.1.16, 2.1.15, 2.1.14, 2.1.13, 2.1.12, 2.1.11, 2.1.10, 2.1.9, 2.1.8, 2.1.7, 2.1.6, 2.1.5, 2.1.4, 2.1.3, 2.1.2, 2.1.1, 2.1.0, 2.1.0-rc.5, 2.1.0-rc.4, 2.1.0-rc.3, 2.1.0-rc.2, 2.1.0-rc.1, 2.1.0-beta, 2.0.11, 2.0.10, 2.0.9, 2.0.8, 2.0.7, 2.0.6, 2.0.5, 2.0.4, 2.0.3, 2.0.2, 2.0.1, 2.0.0, 2.0.0-rc.5, 2.0.0-rc.4, 2.0.0-rc.3, 2.0.0-rc.1, 1.1.0-beta, 1.1.0-alpha, 1.0.8, 1.0.7, 1.0.6, 1.0.5, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, Pre-GA, rc4, rc3, rc2, rc1. +summary: TiDB has released multiple versions, including 8.1.0, 8.0.0-DMR, 7.6.0-DMR, 7.5.1, 7.5.0, 7.4.0-DMR, 7.3.0-DMR, 7.2.0-DMR, 7.1.4, 7.1.3, 7.1.2, 7.1.1, 7.1.0, 7.0.0-DMR, 6.6.0-DMR, 6.5.9, 6.5.8, 6.5.7, 6.5.6, 6.5.5, 6.5.4, 6.5.3, 6.5.2, 6.5.1, 6.5.0, 6.4.0-DMR, 6.3.0-DMR, 6.2.0-DMR, 6.1.7, 6.1.6, 6.1.5, 6.1.4, 6.1.3, 6.1.2, 6.1.1, 6.1.0, 6.0.0-DMR, 5.4.3, 5.4.2, 5.4.1, 5.4.0, 5.3.4, 5.3.3, 5.3.2, 5.3.1, 5.3.0, 5.2.4, 5.2.3, 5.2.2, 5.2.1, 5.2.0, 5.1.5, 5.1.4, 5.1.3, 5.1.2, 5.1.1, 5.1.0, 5.0.6, 5.0.5, 5.0.4, 5.0.3, 5.0.2, 5.0.1, 5.0.0, 5.0.0-rc, 4.0.16, 4.0.15, 4.0.14, 4.0.13, 4.0.12, 4.0.11, 4.0.10, 4.0.9, 4.0.8, 4.0.7, 4.0.6, 4.0.5, 4.0.4, 4.0.3, 4.0.2, 4.0.1, 4.0.0, 4.0.0-rc.2, 4.0.0-rc.1, 4.0.0-rc, 4.0.0-beta.2, 4.0.0-beta.1, 4.0.0-beta, 3.1.2, 3.1.1, 3.1.0, 3.1.0-rc, 3.1.0-beta.2, 3.1.0-beta.1, 3.1.0-beta, 3.0.20, 3.0.19, 3.0.18, 3.0.17, 3.0.16, 3.0.15, 3.0.14, 3.0.13, 3.0.12, 3.0.11, 3.0.10, 3.0.9, 3.0.8, 3.0.7, 3.0.6, 3.0.5, 3.0.4, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 3.0.0-rc.3, 3.0.0-rc.2, 3.0.0-rc.1, 3.0.0-beta.1, 3.0.0-beta, 2.1.19, 2.1.18, 2.1.17, 2.1.16, 2.1.15, 2.1.14, 2.1.13, 2.1.12, 2.1.11, 2.1.10, 2.1.9, 2.1.8, 2.1.7, 2.1.6, 2.1.5, 2.1.4, 2.1.3, 2.1.2, 2.1.1, 2.1.0, 2.1.0-rc.5, 2.1.0-rc.4, 2.1.0-rc.3, 2.1.0-rc.2, 2.1.0-rc.1, 2.1.0-beta, 2.0.11, 2.0.10, 2.0.9, 2.0.8, 2.0.7, 2.0.6, 2.0.5, 2.0.4, 2.0.3, 2.0.2, 2.0.1, 2.0.0, 2.0.0-rc.5, 2.0.0-rc.4, 2.0.0-rc.3, 2.0.0-rc.1, 1.1.0-beta, 1.1.0-alpha, 1.0.8, 1.0.7, 1.0.6, 1.0.5, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, Pre-GA, rc4, rc3, rc2, rc1. --- # TiDB Release Notes From e2f5023b47929d7a42cd5543254d54b2aead266b Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Wed, 29 May 2024 15:26:53 +0800 Subject: [PATCH 6/8] fix the description of tidb_query_duration (#17641) --- alert-rules.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/alert-rules.md b/alert-rules.md index d3b7eef544875..61fba8e2b4b31 100644 --- a/alert-rules.md +++ b/alert-rules.md @@ -121,7 +121,7 @@ This section gives the alert rules for the TiDB component. * Description: - The latency of handling a request in TiDB. If the ninety-ninth percentile latency exceeds 1 second, an alert is triggered. + The latency of handling a request in TiDB. The response time for 99% of requests should be within 1 second; otherwise, an alert is triggered. 
* Solution: From b480789ed754c8ae02df236656dbab281b047157 Mon Sep 17 00:00:00 2001 From: lidezhu <47731263+lidezhu@users.noreply.github.com> Date: Thu, 30 May 2024 10:16:22 +0800 Subject: [PATCH 7/8] ticdc: add description for cdc behaviour change (#17530) --- releases/release-8.1.0.md | 1 + ticdc/ticdc-behavior-change.md | 79 ++++++++++++++++++++++++++++++---- 2 files changed, 72 insertions(+), 8 deletions(-) diff --git a/releases/release-8.1.0.md b/releases/release-8.1.0.md index 634663d60de26..76a954360180e 100644 --- a/releases/release-8.1.0.md +++ b/releases/release-8.1.0.md @@ -172,6 +172,7 @@ Compared with the previous LTS 7.5.0, 8.1.0 includes new features, improvements, * In earlier versions, the `tidb.tls` configuration item in TiDB Lightning treats values `"false"` and `""` the same, as well as treating the values `"preferred"` and `"skip-verify"` the same. Starting from v8.1.0, TiDB Lightning distinguishes the behavior of `"false"`, `""`, `"skip-verify"`, and `"preferred"` for `tidb.tls`. For more information, see [TiDB Lightning configuration](/tidb-lightning/tidb-lightning-configuration.md). * For tables with `AUTO_ID_CACHE=1`, TiDB supports a [centralized auto-increment ID allocating service](/auto-increment.md#mysql-compatibility-mode). In earlier versions, the primary TiDB node of this service automatically performs a `forceRebase` operation when the TiDB process exits (for example, during the TiDB node restart) to keep auto-assigned IDs as consecutive as possible. However, when there are too many tables with `AUTO_ID_CACHE=1`, executing `forceRebase` becomes very time-consuming, preventing TiDB from restarting promptly and even blocking data writes, thus affecting system availability. To resolve this issue, starting from v8.1.0, TiDB removes the `forceRebase` behavior, but this change will cause some auto-assigned IDs to be non-consecutive during the failover. +* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using the MySQL sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events if the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink). 
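+
+    For illustration only, the following sketch (not output captured from TiCDC, and using a hypothetical table and values) shows the kind of `UPDATE` that gets split when the transaction `commitTS` is less than `thresholdTs`: an `UPDATE` that modifies the primary key may be written to the MySQL sink as a `DELETE` of the old row followed by a write of the new row. See [TiCDC Behavior Changes](/ticdc/ticdc-behavior-change.md#mysql-sink) for the full walkthrough.
+
+    ```sql
+    -- Upstream transaction: the primary key value changes from 1 to 2.
+    CREATE TABLE t (a INT PRIMARY KEY, b INT);
+    INSERT INTO t VALUES (1, 10);
+    UPDATE t SET a = 2 WHERE a = 1;
+
+    -- With commitTS < thresholdTs, TiCDC may write the change to the downstream as:
+    DELETE FROM t WHERE a = 1;       -- removes the old row (1, 10)
+    REPLACE INTO t VALUES (2, 10);   -- writes the new row (2, 10)
+    ```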
### System variables diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md index e66a60bca88f5..317b971071531 100644 --- a/ticdc/ticdc-behavior-change.md +++ b/ticdc/ticdc-behavior-change.md @@ -5,11 +5,11 @@ summary: Introduce the behavior changes of TiCDC changefeed, including the reaso # TiCDC Behavior Changes -## Split update events into delete and insert events +## Split `UPDATE` events into `DELETE` and `INSERT` events -### Transactions containing a single update change +### Transactions containing a single `UPDATE` change -Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in the update event, TiCDC splits this event into delete and insert events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086). +Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086). This change primarily addresses the following issues: @@ -24,14 +24,14 @@ INSERT INTO t VALUES (1, 1); UPDATE t SET a = 2 WHERE a = 1; ``` -In this example, the primary key `a` is updated from `1` to `2`. If the update event is not split: +In this example, the primary key `a` is updated from `1` to `2`. If the `UPDATE` event is not split: * When using the CSV and AVRO protocols, the consumer only obtains the new value `a = 2` and cannot obtain the old value `a = 1`. This might cause the downstream consumer to only insert the new value `2` without deleting the old value `1`. -* When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the update event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the update event into delete and insert events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer. +* When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the `UPDATE` event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the `UPDATE` event into `DELETE` and `INSERT` events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer. -### Transactions containing multiple update changes +### Transactions containing multiple `UPDATE` changes -Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits the event into delete and insert events and ensures that all events follow the sequence of delete events preceding insert events. 
For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430). +Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the `UPDATE` event, TiCDC splits the event into `DELETE` and `INSERT` events and ensures that all events follow the sequence of `DELETE` events preceding `INSERT` events. For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430). This change primarily addresses the potential issue of primary key or unique key conflicts when using the MySQL sink to directly write these two events to the downstream, leading to changefeed errors. When using the Kafka sink or other sinks, you might encounter the same error if the consumer writes messages to a relational database or performs similar operation. @@ -49,6 +49,69 @@ UPDATE t SET a = 2 WHERE a = 3; COMMIT; ``` -In this example, by executing three SQL statements to swap the primary keys of two rows, TiCDC only receives two update change events, that is, changing the primary key `a` from `1` to `2` and changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two update events to the downstream, a primary key conflict might occur, leading to changefeed errors. +In this example, by executing three SQL statements to swap the primary keys of two rows, TiCDC only receives two update change events, that is, changing the primary key `a` from `1` to `2` and changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two `UPDATE` events to the downstream, a primary key conflict might occur, leading to changefeed errors. Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`. + +### MySQL sink + +Starting from v8.1.0, when using the MySQL sink, TiCDC fetches the current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of this timestamp: + +- For transactions containing one or multiple `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits the `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module. +- For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918). + +This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. + +Take the following SQL statements as an example: + +```sql +CREATE TABLE t (a INT PRIMARY KEY, b INT); +INSERT INTO t VALUES (1, 1); +INSERT INTO t VALUES (2, 2); + +BEGIN; +UPDATE t SET a = 3 WHERE a = 2; +UPDATE t SET a = 2 WHERE a = 1; +COMMIT; +``` + +In this example, the two `UPDATE` statements within the transaction have a sequential dependency on execution. The primary key `a` is changed from `2` to `3`, and then the primary key `a` is changed from `1` to `2`. After this transaction is executed, the records in the upstream database are `(2, 1)` and `(3, 2)`. 
+ +However, the order of `UPDATE` events received by TiCDC might differ from the actual execution order of the upstream transaction. For example: + +```sql +UPDATE t SET a = 2 WHERE a = 1; +UPDATE t SET a = 3 WHERE a = 2; +``` + +- Before this behavior change, TiCDC writes these `UPDATE` events to the Sorter module and then splits them into `DELETE` and `INSERT` events. After the split, the actual execution order of these events in the downstream is as follows: + + ```sql + BEGIN; + DELETE FROM t WHERE a = 1; + REPLACE INTO t VALUES (2, 1); + DELETE FROM t WHERE a = 2; + REPLACE INTO t VALUES (3, 2); + COMMIT; + ``` + + After the downstream executes the transaction, the records in the database are `(3, 2)`, which are different from the records in the upstream database (`(2, 1)` and `(3, 2)`), indicating a data inconsistency issue. + +- After this behavior change, if the transaction `commitTS` is less than the `thresholdTs` obtained by TiCDC at startup, TiCDC splits these `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. After the sorting by the Sorter module, the actual execution order of these events in the downstream is as follows: + + ```sql + BEGIN; + DELETE FROM t WHERE a = 1; + DELETE FROM t WHERE a = 2; + REPLACE INTO t VALUES (2, 1); + REPLACE INTO t VALUES (3, 2); + COMMIT; + ``` + + After the downstream executes the transaction, the records in the downstream database are the same as those in the upstream database, which are `(2, 1)` and `(3, 2)`, ensuring data consistency. + +As you can see from the preceding example, splitting the `UPDATE` event into `DELETE` and `INSERT` events before writing them to the Sorter module ensures that all `DELETE` events are executed before `INSERT` events after the split, thereby maintaining data consistency regardless of the order of `UPDATE` events received by TiCDC. + +> **Note:** +> +> After this behavior change, when using the MySQL sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication. \ No newline at end of file From de3ea0741a4a3f4399efcfdc51c3b8455ad459f2 Mon Sep 17 00:00:00 2001 From: Xiang Zhang Date: Fri, 31 May 2024 11:46:22 +0800 Subject: [PATCH 8/8] remove redundant lightning command line flags doc (#17678) --- .../tidb-lightning-configuration.md | 51 ------------------- .../tidb-lightning-physical-import-mode.md | 2 +- tidb-lightning/tidb-lightning-prechecks.md | 2 +- 3 files changed, 2 insertions(+), 53 deletions(-) diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 741f565360297..2d14f5cc4ce5c 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -463,54 +463,3 @@ log-progress = "5m" # The default value is 60 seconds. # check-disk-quota = "60s" ``` - -## Command line parameters - -### Usage of `tidb-lightning` - -| Parameter | Explanation | Corresponding setting | -|:----|:----|:----| -| --config *file* | Reads global configuration from *file*. If not specified, the default configuration would be used. 
| | -| -V | Prints program version | | -| -d *directory* | Directory or [external storage URI](/external-storage-uri.md) of the data dump to read from | `mydumper.data-source-dir` | -| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` | -| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` | -| --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `local` | -| --log-file *file* | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` | -| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` | -| --pd-urls *host:port* | PD endpoint address | `tidb.pd-addr` | -| --tidb-host *host* | TiDB server host | `tidb.host` | -| --tidb-port *port* | TiDB server port (default = 4000) | `tidb.port` | -| --tidb-status *port* | TiDB status port (default = 10080) | `tidb.status-port` | -| --tidb-user *user* | User name to connect to TiDB | `tidb.user` | -| --tidb-password *password* | Password to connect to TiDB. The password can either be plaintext or Base64 encoded. | `tidb.password` | -| --enable-checkpoint *bool* | Whether to enable checkpoints (default = true) | `checkpoint.enable` | -| --analyze *level* | Analyze tables after importing. Available values are "required", "optional" (default value), and "off" | `post-restore.analyze` | -| --checksum *level* | Compare checksum after importing. Available values are "required" (default value), "optional", and "off" | `post-restore.checksum` | -| --check-requirements *bool* | Check cluster version compatibility before starting the task, and check whether TiKV has more than 10% free space left during running time. (default = true) | `lightning.check-requirements` | -| --ca *file* | CA certificate path for TLS connection | `security.ca-path` | -| --cert *file* | Certificate path for TLS connection | `security.cert-path` | -| --key *file* | Private key path for TLS connection | `security.key-path` | -| --server-mode | Start TiDB Lightning in server mode | `lightning.server-mode` | - -If a command line parameter and the corresponding setting in the configuration file are both provided, the command line parameter will be used. For example, running `tiup tidb-lightning -L debug --config cfg.toml` would always set the log level to "debug" regardless of the content of `cfg.toml`. 
- -## Usage of `tidb-lightning-ctl` - -This tool can execute various actions given one of the following parameters: - -| Parameter | Explanation | -|:----|:----| -| --compact | Performs a full compaction | -| --switch-mode *mode* | Switches every TiKV store to the given mode: normal, import | -| --fetch-mode | Prints the current mode of every TiKV store | -| --import-engine *uuid* | Imports the closed engine file from TiKV Importer into the TiKV cluster | -| --cleanup-engine *uuid* | Deletes the engine file from TiKV Importer | -| --checkpoint-dump *folder* | Dumps current checkpoint as CSVs into the folder | -| --checkpoint-error-destroy *tablename* | Removes the checkpoint and drops the table if it caused error | -| --checkpoint-error-ignore *tablename* | Ignores any error recorded in the checkpoint involving the given table | -| --checkpoint-remove *tablename* | Unconditionally removes the checkpoint of the table | - -The *tablename* must either be a qualified table name in the form `` `db`.`tbl` `` (including the backquotes), or the keyword "all". - -Additionally, all parameters of `tidb-lightning` described in the section above are valid in `tidb-lightning-ctl`. diff --git a/tidb-lightning/tidb-lightning-physical-import-mode.md b/tidb-lightning/tidb-lightning-physical-import-mode.md index 6e017e0ba944b..032b9fecdfa3b 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode.md @@ -21,7 +21,7 @@ The backend for the physical import mode is `local`. You can modify it in `tidb- 1. Before importing data, TiDB Lightning automatically switches the TiKV nodes to "import mode", which improves write performance and stops auto-compaction. TiDB Lightning determines whether to pause global scheduling according to the TiDB Lightning version. - - Starting from v7.1.0, you can you can control the scope of pausing scheduling by using the TiDB Lightning parameter [`pause-pd-scheduler-scope`](/tidb-lightning/tidb-lightning-configuration.md). + - Starting from v7.1.0, you can control the scope of pausing scheduling by using the TiDB Lightning parameter [`pause-pd-scheduler-scope`](/tidb-lightning/tidb-lightning-configuration.md). - For TiDB Lightning versions between v6.2.0 and v7.0.0, the behavior of pausing global scheduling depends on the TiDB cluster version. When the TiDB cluster >= v6.1.0, TiDB Lightning pauses scheduling for the Region that stores the target table data. After the import is completed, TiDB Lightning recovers scheduling. For other versions, TiDB Lightning pauses global scheduling. - When TiDB Lightning < v6.2.0, TiDB Lightning pauses global scheduling. diff --git a/tidb-lightning/tidb-lightning-prechecks.md b/tidb-lightning/tidb-lightning-prechecks.md index c5ae306246c71..c55455afc9bce 100644 --- a/tidb-lightning/tidb-lightning-prechecks.md +++ b/tidb-lightning/tidb-lightning-prechecks.md @@ -14,7 +14,7 @@ The following table describes each check item and detailed explanation. | Cluster version and status| >= 5.3.0 | Check whether the cluster can be connected in the configuration, and whether the TiKV/PD/TiFlash version supports the physical import mode. | | Permissions | >= 5.3.0 | When the data source is cloud storage (Amazon S3), check whether TiDB Lightning has the necessary permissions and make sure that the import will not fail due to lack of permissions. | | Disk space | >= 5.3.0 | Check whether there is enough space on the local disk and on the TiKV cluster for importing data. 
TiDB Lightning samples the data sources and estimates the percentage of the index size from the sample result. Because indexes are included in the estimation, there might be cases where the size of the source data is less than the available space on the local disk, but still, the check fails. In the physical import mode, TiDB Lightning also checks whether the local storage is sufficient because external sorting needs to be done locally. For more details about the TiKV cluster space and local storage space (controlled by `sort-kv-dir`), see [Downstream storage space requirements](/tidb-lightning/tidb-lightning-requirements.md#storage-space-of-the-target-database) and [Resource requirements](/tidb-lightning/tidb-lightning-physical-import-mode.md#environment-requirements). | -| Region distribution status | >= 5.3.0 | Check whether the Regions in the TiKV cluster are distributed evenly and whether there are too many empty Regions. If the number of empty Regions exceeds max(1000, number of tables * 3), i.e. greater than the bigger one of "1000" or "3 times the number of tables ", then the import cannot be executed. | +| Region distribution status | >= 5.3.0 | Check whether the Regions in the TiKV cluster are distributed evenly and whether there are too many empty Regions. If the number of empty Regions exceeds max(1000, number of tables * 3), i.e. greater than the bigger one of "1000" or "3 times the number of tables", then the import cannot be executed. | | Exceedingly Large CSV files in the data file | >= 5.3.0 | When there are CSV files larger than 10 GiB in the backup file and auto-slicing is not enabled (StrictFormat=false), it will impact the import performance. The purpose of this check is to remind you to ensure the data is in the right format and to enable auto-slicing. | | Recovery from breakpoints | >= 5.3.0 | This check ensures that no changes are made to the source file or schema in the database during the breakpoint recovery process that would result in importing the wrong data. | | Import into an existing table | >= 5.3.0 | When importing into an already created table, it checks, as much as possible, whether the source file matches the existing table. Check if the number of columns matches. If the source file has column names, check if the column names match. When there are default columns in the source file, it checks if the default columns have Default Value, and if they have, the check passes. |