cloud: DM supports physical mode #14993

Merged
merged 12 commits into from
Oct 11, 2023
40 changes: 36 additions & 4 deletions tidb-cloud/migrate-from-mysql-using-data-migration.md
@@ -14,25 +14,35 @@ If you want to migrate incremental data only, see [Migrate Incremental Data from

## Limitations

### Availability

- The Data Migration feature is available only for **TiDB Dedicated** clusters.

- The Data Migration feature is only available to clusters that are created in [certain regions](https://www.pingcap.com/tidb-cloud-pricing-details/#dm-cost) after November 9, 2022. If your **project** was created before the date or if your cluster is in another region, this feature is not available to your cluster and the **Data Migration** tab will not be displayed on the cluster overview page in the TiDB Cloud console.

- Amazon Aurora MySQL writer instances support both existing data and incremental data migration. Amazon Aurora MySQL reader instances only support existing data migration and do not support incremental data migration.

### Maximum number of migration jobs

- You can create up to 200 migration jobs for each organization. To create more migration jobs, you need to [file a support ticket](/tidb-cloud/tidb-cloud-support.md).

### Filtered out and deleted databases

- The system databases are filtered out and not migrated to TiDB Cloud even if you select all of the databases to migrate. That is, `mysql`, `information_schema`, `performance_schema`, and `sys` will not be migrated using this feature.

- When you delete a cluster in TiDB Cloud, all migration jobs in that cluster are automatically deleted and not recoverable.

### Limitations of existing data migration

- During existing data migration, if the table to be migrated already exists in the target database with duplicated keys, the duplicate keys will be replaced.

- During incremental data migration, if the table to be migrated already exists in the target database with duplicated keys, an error is reported and the migration is interrupted. In this situation, you need to verify that the upstream data is accurate. If it is, click the "Restart" button of the migration job, and the migration job will replace the downstream conflicting records with the upstream records.
- If your dataset size is smaller than 1 TiB, it is recommended that you use logical mode (the default mode). If your dataset size is larger than 1 TiB, or you want to migrate existing data faster, you can use physical mode. For more information, see [Migrate existing data and incremental data](#migrate-existing-data-and-incremental-data).

- When you delete a cluster in TiDB Cloud, all migration jobs in that cluster are automatically deleted and not recoverable.
### Limitations of incremental data migration

- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might enable safe mode for 60 seconds. In safe mode, `INSERT` statements are migrated as `REPLACE`, and `UPDATE` statements are migrated as `DELETE` and `REPLACE`. These rewritten transactions are then migrated to the downstream cluster to make sure that all the data changed during the abrupt error has been migrated to the downstream cluster. In this scenario, for upstream tables without primary keys or not-null unique indexes, some data might be duplicated in the downstream cluster because the data might be inserted repeatedly into the downstream.
- During incremental data migration, if the table to be migrated already exists in the target database with duplicated keys, an error is reported and the migration is interrupted. In this situation, you need to verify that the upstream data is accurate. If it is, click the "Restart" button of the migration job, and the migration job will replace the downstream conflicting records with the upstream records.

- When you use Data Migration, it is recommended to keep the size of your dataset smaller than 1 TiB. If the dataset size is larger than 1 TiB, the existing data migration will take a long time due to limited specifications.
- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might enable safe mode for 60 seconds. In safe mode, `INSERT` statements are migrated as `REPLACE`, and `UPDATE` statements are migrated as `DELETE` and `REPLACE`. These rewritten transactions are then migrated to the downstream cluster to make sure that all the data changed during the abrupt error has been migrated to the downstream cluster. In this scenario, for upstream tables without primary keys or not-null unique indexes, some data might be duplicated in the downstream cluster because the data might be inserted repeatedly into the downstream.

- In the following scenarios, if the migration job takes longer than 24 hours, do not purge binary logs in the source database, so that Data Migration can get consecutive binary logs for incremental replication:

@@ -195,10 +205,32 @@ In the **Choose the objects to be migrated** step, you can choose existing data

To migrate data to TiDB Cloud once and for all, choose both **Existing data migration** and **Incremental data migration**, which ensures data consistency between the source and target databases.

You can use [physical mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-physical-import-mode) or [logical mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-logical-import-mode) to migrate **existing data**.

- The default mode is **Logical mode**. This mode supports migrating data into tables that already contain data, but it is slower than physical mode.

- It is recommended to use **Physical mode** for large datasets. When you use this mode, the target table must be empty. With the 16RCU specification, physical mode is about 2.5 times faster than logical mode. With other specifications, performance can also increase by 20% to 50% compared with logical mode. Note that the performance data is for reference only and might vary in different scenarios.

Physical mode has the following limitations:

- When you use physical mode, you cannot create a second migration job or import task for the TiDB cluster before the existing data migration is completed.
- Physical mode is available only for TiDB clusters in AWS regions.
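
As a rough illustration, the guidance above can be condensed into a small decision helper. This is only a sketch: the function name `suggest_import_mode` and its parameters are hypothetical and are not part of any TiDB Cloud API. It encodes the 1 TiB guideline, the empty-target requirement, and the AWS-only restriction. It does not model the rule that no second migration job or import task can be created while a physical-mode migration of existing data is still running.

```python
def suggest_import_mode(dataset_tib: float, target_tables_empty: bool,
                        cloud_provider: str, want_faster: bool = False) -> str:
    """Return "physical" or "logical" for the existing data migration.

    Hypothetical helper for illustration only; not a TiDB Cloud API.
    """
    # Physical mode requires empty target tables and a cluster in an AWS region.
    physical_allowed = (
        target_tables_empty and cloud_provider.strip().lower() == "aws"
    )
    # Physical mode is suggested for datasets larger than 1 TiB,
    # or when faster migration of existing data is wanted.
    physical_preferred = dataset_tib > 1 or want_faster
    # Logical mode is the default whenever physical mode does not apply.
    return "physical" if physical_allowed and physical_preferred else "logical"


print(suggest_import_mode(2.0, target_tables_empty=True, cloud_provider="AWS"))
print(suggest_import_mode(2.0, target_tables_empty=False, cloud_provider="AWS"))
```

The helper falls back to logical mode whenever a physical-mode requirement is not met, which mirrors logical mode being the default.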

Physical mode exports the upstream data as fast as possible, so [different specifications](/tidb-cloud/tidb-cloud-billing-dm.md#specifications-for-data-migration) have different performance impacts on QPS and TPS of the upstream database during data export. The following table shows the performance regression of each specification.

| Migration specification | Maximum export speed | Performance regression of the upstream database |
|---------|-------------|--------|
| 2RCU | 80.84 MB/s | 15.6% |
| 4RCU | 214.2 MB/s | 20.0% |
| 8RCU | 365.5 MB/s | 28.9% |
| 16RCU | 424.6 MB/s | 46.7% |
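
To get a feel for these numbers, the following sketch estimates a lower bound on the export time of a dataset at each specification's maximum export speed. The function name `min_export_hours` is hypothetical, and the sketch assumes that 1 MB equals 10^6 bytes and that the export runs at the maximum speed throughout, so real migrations are expected to take longer. It covers only the export phase, not the import into TiDB Cloud.

```python
# Values copied from the table above.
MAX_EXPORT_MB_PER_S = {"2RCU": 80.84, "4RCU": 214.2, "8RCU": 365.5, "16RCU": 424.6}
UPSTREAM_REGRESSION_PCT = {"2RCU": 15.6, "4RCU": 20.0, "8RCU": 28.9, "16RCU": 46.7}


def min_export_hours(dataset_gib: float, spec: str) -> float:
    """Lower bound on export time in hours (hypothetical helper).

    Assumes 1 MB = 10**6 bytes and a sustained maximum export speed.
    """
    dataset_bytes = dataset_gib * 1024**3
    bytes_per_second = MAX_EXPORT_MB_PER_S[spec] * 1_000_000
    return dataset_bytes / bytes_per_second / 3600


# Compare the specifications for a 1 TiB (1024 GiB) dataset.
for spec in MAX_EXPORT_MB_PER_S:
    hours = min_export_hours(1024, spec)
    print(f"{spec}: at least {hours:.2f} h of export, "
          f"upstream regression about {UPSTREAM_REGRESSION_PCT[spec]}%")
```

Larger specifications shorten the export but put more load on the upstream database, so the choice is a trade-off between migration time and upstream QPS and TPS during the export.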

### Migrate only existing data

To migrate only existing data of the source database to TiDB Cloud, choose **Existing data migration**.

You can choose to use [physical mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-physical-import-mode) or [logical mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-logical-import-mode) to migrate existing data. For more information, see [Migrate existing data and incremental data](#migrate-existing-data-and-incremental-data).

### Migrate only incremental data

To migrate only the incremental data of the source database to TiDB Cloud, choose **Incremental data migration**. In this case, the migration job does not migrate the existing data of the source database to TiDB Cloud, but only migrates the ongoing changes of the source database that are explicitly specified by the migration job.