Added event filter docs and consistency updates #15264

Merged
6 changes: 4 additions & 2 deletions tidb-cloud/changefeed-overview.md
Original file line number Diff line number Diff line change
@@ -65,8 +65,10 @@ It takes about 10 minutes to complete the scaling process (during which the chan

TiDB Cloud populates the changefeed configuration by default. You can modify the following configurations:

- MySQL sink: **MySQL Connection** and **Table Filter**.
- Kafka sink: all configurations.
- Apache Kafka sink: all configurations.
- MySQL sink: **MySQL Connection**, **Table Filter**, and **Event Filter**.
- TiDB Cloud sink: **TiDB Cloud Connection**, **Table Filter**, and **Event Filter**.
- Cloud storage sink: **Storage Endpoint**, **Table Filter**, and **Event Filter**.

4. After editing the configuration, click **...** > **Resume** to resume the corresponding changefeed.

27 changes: 16 additions & 11 deletions tidb-cloud/changefeed-sink-to-apache-kafka.md
@@ -14,7 +14,7 @@ This document describes how to create a changefeed to stream data from TiDB Clou

## Restrictions

- For each TiDB Cloud cluster, you can create up to 5 changefeeds.
- For each TiDB Cloud cluster, you can create up to 100 changefeeds.
- Currently, TiDB Cloud does not support uploading self-signed TLS certificates to connect to Kafka brokers.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.
@@ -80,25 +80,30 @@ For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources

1. Customize **Table Filter** to filter the tables that you want to replicate. For the rule syntax, refer to [table filter rules](/table-filter.md).

- **Add filter rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the **Tables to be replicated** column. You can add up to 20 filter rules.
- **Tables to be replicated**: this column shows the tables to be replicated. But it does not show the new tables to be replicated in the future or the schemas to be fully replicated.
- **Tables without valid keys**: this column shows tables without unique and primary keys. For these tables, because no unique identifier can be used by the downstream system to handle duplicate events, their data might be inconsistent during replication. To avoid such issues, it is recommended that you add unique keys or primary keys to these tables before the replication, or set filter rules to filter out these tables. For example, you can filter out the table `test.tbl1` using "!test.tbl1".
- **Filter Rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 100 filter rules.
- **Tables with valid keys**: this column displays the tables that have valid keys, including primary keys or unique indexes.
- **Tables without valid keys**: this column shows tables that lack primary keys or unique keys. These tables present a challenge during replication because the absence of a unique identifier can result in inconsistent data when the downstream handles duplicate events. To ensure data consistency, it is recommended to add unique keys or primary keys to these tables before initiating the replication. Alternatively, you can add filter rules to exclude these tables. For example, you can exclude the table `test.tbl1` by using the rule `!test.tbl1`.
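The matching behavior of these rules can be sketched as follows. This is a simplified, hypothetical illustration only: the real table filter matches schema and table patterns separately and supports richer wildcards, as described in [table filter rules](/table-filter.md).

```python
from fnmatch import fnmatchcase

def table_matches(rules, schema, table):
    """Simplified sketch of table filter semantics: rules are evaluated in
    order, the last matching rule wins, and a leading '!' excludes."""
    target = f"{schema}.{table}"
    matched = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if fnmatchcase(target, pattern):
            matched = not negate
    return matched

rules = ["*.*", "!test.tbl1"]
print(table_matches(rules, "test", "tbl2"))  # True: replicated
print(table_matches(rules, "test", "tbl1"))  # False: excluded by the "!" rule
```

Under this sketch, `*.*` first selects every table and the later `!test.tbl1` rule overrides it for that one table.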

2. In the **Data Format** area, select your desired format of Kafka messages.
2. Customize **Event Filter** to filter the events that you want to replicate.

- **Tables matching**: specify the tables that the event filter applies to. The rule syntax is the same as that used in the preceding **Table Filter** area. You can add up to 10 event filter rules per changefeed.
- **Ignored events**: specify the event types that the event filter excludes from the changefeed.
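Although TiDB Cloud configures event filters through the console UI, the two fields above correspond to TiCDC's event filter configuration. The following TOML fragment is a hypothetical sketch (the table name `test.worker` and the chosen event types are illustrative):

```toml
# Hypothetical sketch of the UI fields as TiCDC-style event filter config.
[[filter.event-filters]]
matcher = ["test.worker"]                # Tables matching: table filter rule syntax
ignore-event = ["insert", "drop table"]  # Ignored events: event types to exclude
```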

3. In the **Data Format** area, select your desired format of Kafka messages.

- Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol).
- Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json).

3. Enable the **TiDB Extension** option if you want to add TiDB-extension fields to the Kafka message body.
4. Enable the **TiDB Extension** option if you want to add TiDB-extension fields to the Kafka message body.

For more information about TiDB-extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field).

4. If you select **Avro** as your data format, you will see some Avro-specific configurations on the page. You can fill in these configurations as follows:
5. If you select **Avro** as your data format, you will see some Avro-specific configurations on the page. You can fill in these configurations as follows:

- In the **Decimal** and **Unsigned BigInt** configurations, specify how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages.
- In the **Schema Registry** area, fill in your schema registry endpoint. If you enable **HTTP Authentication**, the fields for user name and password are displayed and automatically filled in with your TiDB cluster endpoint and password.

5. In the **Topic Distribution** area, select a distribution mode, and then fill in the topic name configurations according to the mode.
6. In the **Topic Distribution** area, select a distribution mode, and then fill in the topic name configurations according to the mode.

If you select **Avro** as your data format, you can only choose the **Distribute changelogs by table to Kafka Topics** mode in the **Distribution Mode** drop-down list.

@@ -120,7 +125,7 @@ For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources

If you want the changefeed to create one Kafka topic for all changelogs, choose this mode. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the **Topic Name** field.

6. In the **Partition Distribution** area, you can decide which partition a Kafka message will be sent to:
7. In the **Partition Distribution** area, you can decide which partition a Kafka message will be sent to:

- **Distribute changelogs by index value to Kafka partition**

@@ -130,12 +135,12 @@ For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources

If you want the changefeed to send Kafka messages of a table to one Kafka partition, choose this distribution method. The table name of a row changelog will determine which partition the changelog is sent to. This distribution method ensures table orderliness but might cause unbalanced partitions.
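The guarantee behind index-value distribution — the same index value always maps to the same partition — can be sketched as follows. This is a hypothetical hash-routing illustration, not TiCDC's actual algorithm:

```python
import hashlib

def pick_partition(index_value: str, partition_count: int) -> int:
    """Route a changelog to a partition by hashing its index (key) value,
    so rows with the same key always land in the same partition."""
    digest = hashlib.md5(index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

# Rows sharing a key value are routed consistently:
assert pick_partition("id=42", 6) == pick_partition("id=42", 6)
```

Because routing depends only on the key, per-key ordering is preserved while writes spread across partitions; table-based distribution trades that balance for whole-table ordering.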

7. In the **Topic Configuration** area, configure the following numbers. The changefeed will automatically create the Kafka topics according to the numbers.
8. In the **Topic Configuration** area, configure the following numbers. The changefeed will automatically create the Kafka topics according to the numbers.

- **Replication Factor**: controls how many Kafka servers each Kafka message is replicated to.
- **Partition Number**: controls how many partitions exist in a topic.

8. Click **Next**.
9. Click **Next**.

## Step 4. Configure your changefeed specification

15 changes: 10 additions & 5 deletions tidb-cloud/changefeed-sink-to-cloud-storage.md
@@ -14,7 +14,7 @@ This document describes how to create a changefeed to stream data from TiDB Clou

## Restrictions

- For each TiDB Cloud cluster, you can create up to 5 changefeeds.
- For each TiDB Cloud cluster, you can create up to 100 changefeeds.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.

@@ -98,17 +98,22 @@ Click **Next** to establish the connection from the TiDB Dedicated cluster to Am

![the table filter of changefeed](/media/tidb-cloud/changefeed/sink-to-s3-02-table-filter.jpg)

- **Filter Rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 20 filter rules.
- **Filter Rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 100 filter rules.
- **Tables with valid keys**: this column displays the tables that have valid keys, including primary keys or unique indexes.
- **Tables without valid keys**: this column shows tables that lack primary keys or unique keys. These tables present a challenge during replication because the absence of a unique identifier can result in inconsistent data when handling duplicate events downstream. To ensure data consistency, it is recommended to add unique keys or primary keys to these tables before initiating the replication. Alternatively, you can employ filter rules to exclude these tables. For example, you can exclude the table `test.tbl1` by using the rule `!test.tbl1`.

2. In the **Start Replication Position** area, select one of the following replication positions:
2. Customize **Event Filter** to filter the events that you want to replicate.

- **Tables matching**: specify the tables that the event filter applies to. The rule syntax is the same as that used in the preceding **Table Filter** area. You can add up to 10 event filter rules per changefeed.
- **Ignored events**: specify the event types that the event filter excludes from the changefeed.

3. In the **Start Replication Position** area, select one of the following replication positions:

- Start replication from now on
- Start replication from a specific [TSO](https://docs.pingcap.com/tidb/stable/glossary#tso)
- Start replication from a specific time

3. In the **Data Format** area, select either the **CSV** or **Canal-JSON** format.
4. In the **Data Format** area, select either the **CSV** or **Canal-JSON** format.

<SimpleTab>
<div label="Configure CSV format">
@@ -133,7 +138,7 @@ Click **Next** to establish the connection from the TiDB Dedicated cluster to Am
</div>
</SimpleTab>

4. In the **Flush Parameters** area, you can configure two items:
5. In the **Flush Parameters** area, you can configure two items:

- **Flush Interval**: set to 60 seconds by default, adjustable within a range of 2 seconds to 10 minutes.
- **File Size**: set to 64 MB by default, adjustable within a range of 1 MB to 512 MB.
25 changes: 15 additions & 10 deletions tidb-cloud/changefeed-sink-to-mysql.md
@@ -14,7 +14,7 @@ This document describes how to stream data from TiDB Cloud to MySQL using the **

## Restrictions

- For each TiDB Cloud cluster, you can create up to 5 changefeeds.
- For each TiDB Cloud cluster, you can create up to 100 changefeeds.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.

@@ -104,32 +104,37 @@ After completing the prerequisites, you can sink your data to MySQL.

5. Customize **Table Filter** to filter the tables that you want to replicate. For the rule syntax, refer to [table filter rules](/table-filter.md).

- **Add filter rules**: you can set filter rules in this column. By default, there is a rule `*. *`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 20 filter rules.
- **Tables to be replicated**: this column shows the tables to be replicated. But it does not show the new tables to be replicated in the future or the schemas to be fully replicated.
- **Tables without valid keys**: this column shows tables without unique and primary keys. For these tables, because no unique identifier can be used by the downstream system to handle duplicate events, their data might be inconsistent during replication. To avoid such issues, it is recommended that you add unique keys or primary keys to these tables before the replication, or set filter rules to filter out these tables. For example, you can filter out the table `test.tbl1` using "!test.tbl1".
- **Filter Rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 100 filter rules.
- **Tables with valid keys**: this column displays the tables that have valid keys, including primary keys or unique indexes.
- **Tables without valid keys**: this column shows tables that lack primary keys or unique keys. These tables present a challenge during replication because the absence of a unique identifier can result in inconsistent data when the downstream handles duplicate events. To ensure data consistency, it is recommended to add unique keys or primary keys to these tables before initiating the replication. Alternatively, you can add filter rules to exclude these tables. For example, you can exclude the table `test.tbl1` by using the rule `!test.tbl1`.

6. In **Start Position**, configure the starting position for your MySQL sink.
6. Customize **Event Filter** to filter the events that you want to replicate.

- **Tables matching**: specify the tables that the event filter applies to. The rule syntax is the same as that used in the preceding **Table Filter** area. You can add up to 10 event filter rules per changefeed.
- **Ignored events**: specify the event types that the event filter excludes from the changefeed.

7. In **Start Replication Position**, configure the starting position for your MySQL sink.

- If you have [loaded the existing data](#load-existing-data-optional) using Dumpling, select **Start replication from a specific TSO** and fill in the TSO that you get from Dumpling exported metadata files.
- If you do not have any data in the upstream TiDB cluster, select **Start replication from now on**.
- Otherwise, you can customize the start time point by choosing **Start replication from a specific time**.
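If you exported data with Dumpling, the TSO to fill in comes from the `Pos` field of the `metadata` file in the export directory. A typical snippet looks like the following (the timestamps and TSO value are illustrative):

```
Started dump at: 2023-03-28 10:40:19
SHOW MASTER STATUS:
    Log: tidb-binlog
    Pos: 420747102018863124
Finished dump at: 2023-03-28 10:40:20
```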

7. Click **Next** to configure your changefeed specification.
8. Click **Next** to configure your changefeed specification.

- In the **Changefeed Specification** area, specify the number of Replication Capacity Units (RCUs) to be used by the changefeed.
- In the **Changefeed Name** area, specify a name for the changefeed.

8. Click **Next** to review the changefeed configuration.
9. Click **Next** to review the changefeed configuration.

If you confirm all configurations are correct, check the compliance of cross-region replication, and click **Create**.
If you confirm that all configurations are correct, check the compliance of cross-region replication, and click **Create**.

If you want to modify some configurations, click **Previous** to go back to the previous configuration page.

9. The sink starts soon, and you can see the status of the sink changes from "**Creating**" to "**Running**".
10. The sink starts soon, and you can see the status of the sink changes from **Creating** to **Running**.

Click the changefeed name, and you can see more details about the changefeed, such as the checkpoint, replication latency, and other metrics.

10. If you have [loaded the existing data](#load-existing-data-optional) using Dumpling, you need to restore the GC time to its original value (the default value is `10m`) after the sink is created:
11. If you have [loaded the existing data](#load-existing-data-optional) using Dumpling, you need to restore the GC time to its original value (the default value is `10m`) after the sink is created:

{{< copyable "sql" >}}
