ticdc: update ticdc docs (#15046)
qiancai authored Oct 30, 2023
1 parent 6c78052 commit d7417e7
Showing 6 changed files with 27 additions and 28 deletions.
2 changes: 1 addition & 1 deletion hardware-and-software-requirements.md
@@ -179,7 +179,7 @@ Before you deploy TiFlash, note the following items:
- It is recommended to deploy TiFlash on different nodes from TiKV. If you must deploy TiFlash and TiKV on the same node, increase the number of CPU cores and memory, and try to deploy TiFlash and TiKV on different disks to avoid interfering with each other.
- The total capacity of the TiFlash disks is calculated as follows: `the data volume of the entire TiKV cluster to be replicated / the number of TiKV replicas * the number of TiFlash replicas`. For example, if the overall planned capacity of TiKV is 1 TB, the number of TiKV replicas is 3, and the number of TiFlash replicas is 2, then the recommended total capacity of TiFlash is `1024 GB / 3 * 2`. You can also replicate only the data of some tables. In that case, determine the TiFlash capacity according to the data volume of the tables to be replicated.

Before you deploy TiCDC, note that it is recommended to deploy TiCDC on PCIe-SSD disks larger than 1 TB.
Before you deploy TiCDC, note that it is recommended to deploy TiCDC on PCIe-SSD disks larger than 500 GB.

## Network requirements

4 changes: 2 additions & 2 deletions ticdc/deploy-ticdc.md
@@ -16,9 +16,9 @@ In production environments, the recommendations of software and hardware for TiC
| Red Hat Enterprise Linux | 7.3 or later versions |
| CentOS | 7.3 or later versions |

| CPU | Memory | Disk type | Network | Number of TiCDC cluster instances (minimum requirements for production environment) |
| CPU | Memory | Disk | Network | Number of TiCDC cluster instances (minimum requirements for production environment) |
| :--- | :--- | :--- | :--- | :--- |
| 16 core+ | 64 GB+ | SSD | 10 Gigabit network card (2 preferred) | 2 |
| 16 core+ | 64 GB+ | 500 GB+ SSD | 10 Gigabit network card (2 preferred) | 2 |

For more information, see [Software and Hardware Recommendations](/hardware-and-software-requirements.md).

18 changes: 12 additions & 6 deletions ticdc/ticdc-faq.md
@@ -49,10 +49,7 @@ The expected output is as follows:
```

* `checkpoint`: TiCDC has replicated all data before this timestamp to downstream.
* `state`: The state of this replication task:
* `normal`: The task runs normally.
* `stopped`: The task is stopped manually or encounters an error.
* `removed`: The task is removed.
* `state`: The state of this replication task. For more information about each state and its meaning, see [Changefeed states](/ticdc/ticdc-changefeed-overview.md#changefeed-state-transfer).

> **Note:**
>
@@ -281,15 +278,24 @@ For TiCDC versions earlier than v6.5.2, it is recommended that you deploy TiCDC

## What is the order of executing DML and DDL statements?

The execution order is: DML -> DDL -> DML. To ensure that the table schema is correct when DML events are executed downstream during data replication, it is necessary to coordinate the execution order of DDL and DML statements. Currently, TiCDC adopts a simple approach: it replicates all DML statements before the DDL ts to downstream first, and then replicates DDL statements.
Currently, TiCDC adopts the following order:

1. TiCDC blocks the replication progress of the tables affected by DDL statements until the DDL `CommitTs`. This ensures that DML statements executed before the DDL `CommitTs` can be successfully replicated to the downstream first.
2. TiCDC continues with the replication of DDL statements. If there are multiple DDL statements, TiCDC replicates them in a serial manner.
3. After the DDL statements are executed in the downstream, TiCDC continues with the replication of DML statements executed after the DDL `CommitTs`.
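
For example, consider the following hypothetical statement sequence in the upstream (the table `t`, the column names, and the timestamps are illustrative only):

```sql
-- Assume table t initially has a single column id
INSERT INTO t (id) VALUES (1);          -- DML, CommitTs = 100
ALTER TABLE t ADD COLUMN c INT;         -- DDL, CommitTs = 105
INSERT INTO t (id, c) VALUES (2, 20);   -- DML, CommitTs = 110
```

TiCDC first replicates the `INSERT` committed at `CommitTs = 100`, then replicates the `ALTER TABLE` statement, and only after the DDL statement is executed in the downstream does it replicate the `INSERT` committed at `CommitTs = 110`.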

## How should I check whether the upstream and downstream data is consistent?

If the downstream is a TiDB cluster or MySQL instance, it is recommended that you compare the data using [sync-diff-inspector](/sync-diff-inspector/sync-diff-inspector-overview.md).

## Replication of a single table can only be run on a single TiCDC node. Will it be possible to use multiple TiCDC nodes to replicate data of multiple tables?

This feature is currently not supported; it might be supported in a future release. By then, TiCDC might replicate data change logs by TiKV Region, which would provide scalable processing capability.
Starting from v7.1.0, TiCDC supports the MQ sink to replicate data change logs at the granularity of TiKV Regions, which achieves scalable processing capability and allows TiCDC to replicate a single table with a large number of Regions. To enable this feature, you can configure the following parameter in the [TiCDC configuration file](/ticdc/ticdc-changefeed-config.md):

```toml
[scheduler]
enable-table-across-nodes = true
```
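
For example, a changefeed that uses this configuration file might be created as follows. This is only a sketch: the Kafka sink URI, the topic name, and the file name `changefeed.toml` are placeholders, and this feature takes effect only for MQ sinks.

```shell
cdc cli changefeed create --server=http://127.0.0.1:8300 \
    --sink-uri="kafka://127.0.0.1:9092/example-topic?protocol=canal-json" \
    --config changefeed.toml
```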

## Does TiCDC replication get stuck if the upstream has long-running uncommitted transactions?

2 changes: 1 addition & 1 deletion ticdc/ticdc-open-api-v2.md
@@ -331,7 +331,7 @@ The `sink` parameters are described as follows:
| `schema_registry` | `STRING` type. The schema registry address. (Optional) |
| `terminator` | `STRING` type. The terminator is used to separate two data change events. The default value is null, which means `"\r\n"` is used as the terminator. (Optional) |
| `transaction_atomicity` | `STRING` type. The atomicity level of the transaction. (Optional) |
| `only_output_updated_columns` | `BOOLEAN` type. For MQ sinks using the `canal-json` or `open-protocol` protocol, you can specify whether to output only the modified columns. The default value is `false`. |
| `only_output_updated_columns` | `BOOLEAN` type. For MQ sinks using the `canal-json` or `open-protocol` protocol, you can specify whether to output only the modified columns. The default value is `false`. (Optional) |

`sink.column_selectors` is an array. The parameters are described as follows:

7 changes: 5 additions & 2 deletions ticdc/ticdc-overview.md
@@ -75,14 +75,17 @@ As shown in the architecture diagram, TiCDC supports replicating data to TiDB, M

## Best practices

- If the network latency between two TiDB clusters is higher than 100 ms, it is recommended to deploy TiCDC in the region (IDC) where the downstream TiDB cluster is located when replicating data between the two clusters.
- When you use TiCDC to replicate data between two TiDB clusters, if the network latency between the two clusters is higher than 100 ms:

- For TiCDC versions earlier than v6.5.2, it is recommended to deploy TiCDC in the region (IDC) where the downstream TiDB cluster is located.
- For TiCDC v6.5.2 or later versions, thanks to a series of improvements, it is recommended to deploy TiCDC in the region (IDC) where the upstream TiDB cluster is located.

- TiCDC only replicates tables that have at least one valid index (for an example, see the SQL sketch after this list). A valid index is defined as follows:

- A primary key (`PRIMARY KEY`) is a valid index.
- A unique index (`UNIQUE INDEX`) is valid if every column of the index is explicitly defined as non-nullable (`NOT NULL`) and the index does not have a virtual generated column (`VIRTUAL GENERATED COLUMNS`).

- To use TiCDC in disaster recovery scenarios, you need to configure [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios).
- When you replicate a wide table with a large single row (greater than 1 KB), it is recommended to configure [`per-table-memory-quota`](/ticdc/ticdc-server-config.md) so that `per-table-memory-quota` = `ticdcTotalMemory`/(`tableCount` * 2), where `ticdcTotalMemory` is the memory of a TiCDC node and `tableCount` is the number of target tables that the TiCDC node replicates.
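
As a hypothetical illustration of the valid-index rule above, TiCDC can replicate the first table below but not the second, because the second table's only unique index contains a nullable column:

```sql
-- Valid index: a primary key
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    amount DECIMAL(10, 2)
);

-- No valid index: the only unique index contains a nullable column
CREATE TABLE order_logs (
    order_id BIGINT NULL,
    note VARCHAR(255),
    UNIQUE INDEX idx_order_id (order_id)
);
```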

## Unsupported scenarios

22 changes: 6 additions & 16 deletions ticdc/troubleshoot-ticdc.md
@@ -10,7 +10,7 @@ This document introduces the common errors you might encounter when using TiCDC,

> **Note:**
>
> In this document, the server address specified in `cdc cli` commands is `server=http://127.0.0.1:8300`. When you use the command, replace the address with your actual PD address.
> In this document, the server address specified in `cdc cli` commands is `server=http://127.0.0.1:8300`. When you run such commands, replace the address with your actual TiCDC server address.
## TiCDC replication interruptions

@@ -31,12 +31,7 @@ You can know whether the replication task is stopped manually by executing `cdc
cdc cli changefeed query --server=http://127.0.0.1:8300 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
```

In the output of the above command, `admin-job-type` shows the state of this replication task:

- `0`: In progress, which means that the task is not stopped manually.
- `1`: Paused. When the task is paused, all replicated `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpoint-ts`.
- `2`: Resumed. The replication task resumes from `checkpoint-ts`.
- `3`: Removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. The replication status is retained only for later queries.
In the output of the above command, `admin-job-type` shows the state of this replication task. For more information about each state and its meaning, see [Changefeed states](/ticdc/ticdc-changefeed-overview.md#changefeed-state-transfer).

### How do I handle replication interruptions?

@@ -46,22 +41,17 @@ A replication task might be interrupted in the following known scenarios:

- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.

- Handling method: You can resume the replication task via the HTTP interface after the downstream is back to normal.
- Handling method: You can resume the replication task by executing `cdc cli changefeed resume` after the downstream is back to normal.

- Replication cannot continue because of incompatible SQL statement(s) in the downstream.

- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
- Handling procedures:
1. Query the status information of the replication task using the `cdc cli changefeed query` command and record the value of `checkpoint-ts`.
2. Use the new task configuration file and add the `ignore-txn-start-ts` parameter to skip the transaction corresponding to the specified `start-ts`.
3. Stop the old replication task via HTTP API. Execute `cdc cli changefeed create` to create a new task and specify the new task configuration file. Specify `checkpoint-ts` recorded in step 1 as the `start-ts` and start a new task to resume the replication.

- In TiCDC v4.0.13 and earlier versions, when TiCDC replicates a partitioned table, it might encounter an error that leads to replication interruption.

- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
- Handling procedures (a command sketch follows this list):
1. Pause the replication task by executing `cdc cli changefeed pause -c <changefeed-id>`.
2. Wait for about one minute, and then resume the replication task by executing `cdc cli changefeed resume -c <changefeed-id>`.
3. Pause the replication task by executing `cdc cli changefeed pause -c <changefeed-id>`.
4. Specify the new task configuration file by executing `cdc cli changefeed update -c <changefeed-id> --config <config-file-path>`.
5. Resume the replication task by executing `cdc cli changefeed resume -c <changefeed-id>`.
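
The procedure above maps to the following commands. This is a sketch only: the server address follows the convention used elsewhere in this document, and `<changefeed-id>` and `<config-file-path>` are placeholders.

```shell
# 1. Pause the replication task
cdc cli changefeed pause --server=http://127.0.0.1:8300 -c <changefeed-id>
# 2. Wait for about one minute, then resume the task
cdc cli changefeed resume --server=http://127.0.0.1:8300 -c <changefeed-id>
# 3. Pause the task again
cdc cli changefeed pause --server=http://127.0.0.1:8300 -c <changefeed-id>
# 4. Update the task with the new configuration file
cdc cli changefeed update --server=http://127.0.0.1:8300 -c <changefeed-id> --config <config-file-path>
# 5. Resume the task
cdc cli changefeed resume --server=http://127.0.0.1:8300 -c <changefeed-id>
```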

### What should I do to handle the OOM that occurs after TiCDC is restarted after a task interruption?
