diff --git a/TOC.md b/TOC.md
index c8e15684ed435..281e8e371f46c 100644
--- a/TOC.md
+++ b/TOC.md
@@ -991,6 +991,7 @@
- [Error Codes](/error-codes.md)
- [Table Filter](/table-filter.md)
- [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md)
+ - [URI Formats of External Storage Services](/external-storage-uri.md)
- Internal Components
- [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md)
- FAQs
diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md
index 03ecfbc0e8129..2a3f28e757b03 100644
--- a/best-practices/pd-scheduling-best-practices.md
+++ b/best-practices/pd-scheduling-best-practices.md
@@ -228,7 +228,7 @@ This scenario requires examining the generation and execution of operators throu
If operators are successfully generated but the scheduling process is slow, possible reasons are:
-- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger value.s Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`.
+- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger values. Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`.
- Other scheduling tasks are running concurrently and racing for resources in the system. You can refer to the solution in [Leaders/regions are not evenly distributed](#leadersregions-are-not-evenly-distributed).
- When you take a single node offline, a number of region leaders to be processed (around 1/3 under the configuration of 3 replicas) are distributed on the node to remove. Therefore, the speed is limited by the speed at which snapshots are generated by this single node. You can speed it up by manually adding an `evict-leader-scheduler` to migrate leaders.
diff --git a/br/backup-and-restore-overview.md b/br/backup-and-restore-overview.md
index 1482f2e6bc6e7..3b2cee357a07f 100644
--- a/br/backup-and-restore-overview.md
+++ b/br/backup-and-restore-overview.md
@@ -99,7 +99,7 @@ Corresponding to the backup features, you can perform two types of restore: full
TiDB supports backing up data to Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, NFS, and other S3-compatible file storage services. For details, see the following documents:
-- [Specify backup storage in URI](/br/backup-and-restore-storages.md#uri-format)
+- [Specify backup storage in URI](/external-storage-uri.md)
- [Configure access privileges to backup storages](/br/backup-and-restore-storages.md#authentication)
## Compatibility
diff --git a/br/br-incremental-guide.md b/br/br-incremental-guide.md
index 098547c63fa50..03e29d2f893d6 100644
--- a/br/br-incremental-guide.md
+++ b/br/br-incremental-guide.md
@@ -30,7 +30,7 @@ tiup br backup full --pd "${PD_IP}:2379" \
- `--lastbackupts`: The last backup timestamp.
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks (in MiB/s).
-- `storage`: The storage path of backup data. You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [Backup storage URI configuration](/br/backup-and-restore-storages.md#uri-format).
+- `storage`: The storage path of backup data. You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
## Restore incremental data
diff --git a/br/br-pitr-manual.md b/br/br-pitr-manual.md
index c57e37f31cc28..3aa04c5015646 100644
--- a/br/br-pitr-manual.md
+++ b/br/br-pitr-manual.md
@@ -77,7 +77,7 @@ The example output only shows the common parameters. These parameters are descri
- `task-name`: specifies the task name for the log backup. This name is also used to query, pause, and resume the backup task.
- `--ca`, `--cert`, `--key`: specifies the mTLS encryption method to communicate with TiKV and PD.
- `--pd`: specifies the PD address for the backup cluster. BR needs to access PD to start the log backup task.
-- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
Usage example:
@@ -283,7 +283,7 @@ This command only accesses the backup storage and does not access the TiDB clust
- `--dry-run`: run the command but do not really delete the files.
- `--until`: delete all log backup data before the specified timestamp.
-- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
Usage example:
@@ -324,7 +324,7 @@ Global Flags:
This command only accesses the backup storage and does not access the TiDB cluster.
-The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
Usage example:
@@ -368,12 +368,12 @@ Global Flags:
The example output only shows the common parameters. These parameters are described as follows:
-- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- `--restored-ts`: the timestamp that you want to restore data to. If this parameter is not specified, BR restores data to the latest timestamp available in the log backup, that is, the checkpoint of the backup data.
- `--start-ts`: the start timestamp that you want to restore log backup data from. If you only need to restore log backup data, you must specify this parameter.
- `--pd`: the PD address of the restore cluster.
- `--ca`, `--cert`, `--key`: specify the mTLS encryption method to communicate with TiKV and PD.
-- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
Usage example:
diff --git a/br/br-snapshot-guide.md b/br/br-snapshot-guide.md
index 2e697836f322a..556af5d47ac2d 100644
--- a/br/br-snapshot-guide.md
+++ b/br/br-snapshot-guide.md
@@ -33,7 +33,7 @@ tiup br backup full --pd "${PD_IP}:2379" \
In the preceding command:
- `--backupts`: The time point of the snapshot. The format can be [TSO](/glossary.md#tso) or timestamp, such as `400036290571534337` or `2018-05-11 01:42:23`. If the data of this snapshot is garbage collected, the `br backup` command returns an error and `br` exits. If you leave this parameter unspecified, `br` picks the snapshot corresponding to the backup start time.
-- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks. The unit is in MiB/s.
During backup, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the backup task is completed and statistics such as total backup time, average backup speed, and backup data size are displayed.
diff --git a/br/use-br-command-line-tool.md b/br/use-br-command-line-tool.md
index 15dbbef486578..c9bd3da4f16d9 100644
--- a/br/use-br-command-line-tool.md
+++ b/br/use-br-command-line-tool.md
@@ -42,7 +42,7 @@ A `br` command consists of multiple layers of sub-commands. Currently, br comman
### Common options
* `--pd`: specifies the PD service address. For example, `"${PD_IP}:2379"`.
-* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
+* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI Formats of External Storage Services](/external-storage-uri.md).
* `--ca`: specifies the path to the trusted CA certificate in the PEM format.
* `--cert`: specifies the path to the SSL certificate in the PEM format.
* `--key`: specifies the path to the SSL certificate key in the PEM format.
diff --git a/dumpling-overview.md b/dumpling-overview.md
index aa2b13ae87e58..306f309cb3852 100644
--- a/dumpling-overview.md
+++ b/dumpling-overview.md
@@ -94,19 +94,7 @@ dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o /tmp/test -r 200000
In the command above:
+ The `-h`, `-P`, and `-u` option respectively mean the address, the port, and the user. If a password is required for authentication, you can use `-p $YOUR_SECRET_PASSWORD` to pass the password to Dumpling.
-
-
-
-+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format).
-
-
-
-
-
-+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format).
-
-
-
++ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/external-storage-uri.md).
+ The `-t` option specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, and also increases the database's memory consumption. Therefore, it is not recommended to set the number too large. Usually, it's less than 64.
+ The `-r` option enables the in-table concurrency to speed up the export. The default value is `0`, which means disabled. A value greater than 0 means it is enabled, and the value is of `INT` type. When the source database is TiDB, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting, and reduces the memory usage. The specific `-r` value does not affect the split algorithm. When the source database is MySQL and the primary key is of the `INT` type, specifying `-r` can also enable the in-table concurrency.
+ The `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). It is recommended to keep its value to 256 MiB or less if you plan to use TiDB Lightning to load this file into a TiDB instance.
@@ -115,6 +103,16 @@ In the command above:
>
> If the size of a single exported table exceeds 10 GB, it is **strongly recommended to use** the `-r` and `-F` options.
+#### URI formats of the storage services
+
+This section describes the URI formats of the storage services, including Amazon S3, GCS, and Azure Blob Storage. The URI format is as follows:
+
+```shell
+[scheme]://[host]/[path]?[parameters]
+```
+
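+For example, a minimal Amazon S3 URI that you can pass to the `-o` option might look like the following sketch; the bucket name, folder, and credential values are placeholders:
+
+```shell
+s3://my-bucket/dumpling-export?access-key=${AccessKey}&secret-access-key=${SecretKey}
+```
+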
+For more information, see [URI Formats of External Storage Services](/external-storage-uri.md).
+
### Export to CSV files
You can export data to CSV files by adding the `--filetype csv` argument.
@@ -232,19 +230,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}
```
-
-
-Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](/br/backup-and-restore-storages.md#uri-format).
-
-
-
-
-
-Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format).
-
-
-
-{{< copyable "shell-regular" >}}
+Dumpling also supports reading credential files from `~/.aws/credentials`. For more information about the URI parameters, see [URI Formats of External Storage Services](/external-storage-uri.md).
```shell
./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}"
@@ -256,8 +242,6 @@ Dumpling also supports reading credential files from `~/.aws/credentials`. Param
By default, Dumpling exports all databases except system databases (including `mysql`, `sys`, `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `METRICS_SCHEMA`, and `INSPECTION_SCHEMA`). You can use `--where ` to select the records to be exported.
-{{< copyable "shell-regular" >}}
-
```shell
./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100"
```
@@ -401,7 +385,7 @@ SET GLOBAL tidb_gc_life_time = '10m';
| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes |
| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. |
| `--filetype` | Exported file type (csv/sql) | "sql" |
-| `-o` or `--output` | Specify the absolute local file path or [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format) for exporting the data. | "./export-${time}" |
+| `-o` or `--output` | Specify the absolute local file path or [external storage URI](/external-storage-uri.md) for exporting the data. | "./export-${time}" |
| `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. |
| `--consistency` | flush: use FTWRL before the dump
snapshot: dump the TiDB data of a specific snapshot of a TSO
lock: execute `lock tables read` on all tables to be dumped
none: dump without adding locks, which cannot guarantee consistency
auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" |
| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` |
diff --git a/error-codes.md b/error-codes.md
index 20c46da0497f6..78b53417c69c9 100644
--- a/error-codes.md
+++ b/error-codes.md
@@ -371,7 +371,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the
* Error Number: 8158
- The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [External storage](/br/backup-and-restore-storages.md#uri-format).
+ The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [URI Formats of External Storage Services](/external-storage-uri.md).
* Error Number: 8159
diff --git a/external-storage-uri.md b/external-storage-uri.md
new file mode 100644
index 0000000000000..b9ddf28d1c007
--- /dev/null
+++ b/external-storage-uri.md
@@ -0,0 +1,92 @@
+---
+title: URI Formats of External Storage Services
+summary: Learn about the storage URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage.
+---
+
+# URI Formats of External Storage Services
+
+This document describes the URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage.
+
+The basic format of the URI is as follows:
+
+```shell
+[scheme]://[host]/[path]?[parameters]
+```
+
+## Amazon S3 URI format
+
+- `scheme`: `s3`
+- `host`: `bucket name`
+- `parameters`:
+
+ - `access-key`: Specifies the access key.
+ - `secret-access-key`: Specifies the secret access key.
+ - `session-token`: Specifies the temporary session token. BR does not support this parameter yet.
+ - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`).
+    - `endpoint`: Specifies the URL of the custom endpoint for S3-compatible services (for example, `https://s3.example.com/`).
+    - `force-path-style`: Specifies whether to use path-style access rather than virtual-hosted-style access (defaults to `true`).
+    - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `STANDARD_IA`).
+    - `sse`: Specifies the server-side encryption algorithm used to encrypt the uploaded objects (value options: empty, `AES256`, or `aws:kms`).
+ - `sse-kms-key-id`: Specifies the KMS ID if `sse` is set to `aws:kms`.
+ - `acl`: Specifies the canned ACL of the uploaded objects (for example, `private` or `authenticated-read`).
+ - `role-arn`: When you need to access Amazon S3 data from a third party using a specified [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html), you can specify the corresponding [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) of the IAM role with the `role-arn` URL query parameter, such as `arn:aws:iam::888888888888:role/my-role`. For more information about using an IAM role to access Amazon S3 data from a third party, see [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html).
+ - `external-id`: When you access Amazon S3 data from a third party, you might need to specify a correct [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) to assume [the IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In this case, you can use this `external-id` URL query parameter to specify the external ID and make sure that you can assume the IAM role. An external ID is an arbitrary string provided by the third party together with the IAM role ARN to access the Amazon S3 data. Providing an external ID is optional when assuming an IAM role, which means if the third party does not require an external ID for the IAM role, you can assume the IAM role and access the corresponding Amazon S3 data without providing this parameter.
+
+The following is an example of an Amazon S3 URI for TiDB Lightning and BR. In this example, the file path is `testfolder`.
+
+```shell
+s3://external/testfolder?access-key=${access-key}&secret-access-key=${secret-access-key}
+```
+
+The following is an example of an Amazon S3 URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, the file name is `test.csv`.
+
+```shell
+s3://external/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key}
+```
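+
+If you use an S3-compatible service, you can usually reach it by appending the `endpoint` and `force-path-style` parameters described above. The following is only a sketch; the endpoint address, bucket, and credentials are placeholders:
+
+```shell
+s3://external/testfolder?access-key=${access-key}&secret-access-key=${secret-access-key}&endpoint=http://minio.example.com:9000&force-path-style=true
+```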
+
+## GCS URI format
+
+- `scheme`: `gcs` or `gs`
+- `host`: `bucket name`
+- `parameters`:
+
+ - `credentials-file`: Specifies the path to the credentials JSON file on the migration tool node.
+    - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `COLDLINE`).
+    - `predefined-acl`: Specifies the predefined ACL of the uploaded objects (for example, `private` or `project-private`).
+
+The following is an example of a GCS URI for TiDB Lightning and BR. In this example, the file path is `testfolder`.
+
+```shell
+gcs://external/testfolder?credentials-file=${credentials-file-path}
+```
+
+The following is an example of a GCS URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, the file name is `test.csv`.
+
+```shell
+gcs://external/test.csv?credentials-file=${credentials-file-path}
+```
+
+## Azure Blob Storage URI format
+
+- `scheme`: `azure` or `azblob`
+- `host`: `container name`
+- `parameters`:
+
+ - `account-name`: Specifies the account name of the storage.
+ - `account-key`: Specifies the access key.
+ - `sas-token`: Specifies the shared access signature (SAS) token.
+ - `access-tier`: Specifies the access tier of the uploaded objects, for example, `Hot`, `Cool`, or `Archive`. The default value is the default access tier of the storage account.
+ - `encryption-scope`: Specifies the [encryption scope](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-scope-manage?tabs=powershell#upload-a-blob-with-an-encryption-scope) for server-side encryption.
+ - `encryption-key`: Specifies the [encryption key](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys) for server-side encryption, which uses the AES256 encryption algorithm.
+
+The following is an example of an Azure Blob Storage URI for TiDB Lightning and BR. In this example, the file path is `testfolder`.
+
+```shell
+azure://external/testfolder?account-name=${account-name}&account-key=${account-key}
+```
+
+The following is an example of an Azure Blob Storage URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, the file name is `test.csv`.
+
+```shell
+azure://external/test.csv?account-name=${account-name}&account-key=${account-key}
+```
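+
+If you prefer not to embed the account key, you can authenticate with a shared access signature instead. The following is only a sketch; the account name and SAS token are placeholders:
+
+```shell
+azure://external/testfolder?account-name=${account-name}&sas-token=${sas-token}
+```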
diff --git a/migrate-large-mysql-shards-to-tidb.md b/migrate-large-mysql-shards-to-tidb.md
index b5ff486c3c747..b5f3513e322f6 100644
--- a/migrate-large-mysql-shards-to-tidb.md
+++ b/migrate-large-mysql-shards-to-tidb.md
@@ -94,7 +94,7 @@ The following table describes parameters in the command above. For more informat
| `-p` or `--port` | Specifies the port to be used.|
| `-h` or `--host` | Specifies the IP address of the data source. |
| `-t` or `--thread` | Specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, and increases the database's memory consumption. Therefore, it is not recommended to set the number too large. Usually, it's less than 64.|
-| `-o` or `--output` | Specifies the export directory of the storage, which supports a local file path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format).|
+| `-o` or `--output` | Specifies the export directory of the storage, which supports a local file path or an [external storage URI](/external-storage-uri.md).|
| `-r` or `--row` | Specifies the maximum number of rows in a single file. If you use this parameter, Dumpling enables the in-table concurrency to speed up the export and reduce the memory usage.|
| `-F` | Specifies the maximum size of a single file. The unit is `MiB`. It is recommended to keep the value to 256 MiB. |
| `-B` or `--database` | Specifies databases to be exported. |
diff --git a/migrate-large-mysql-to-tidb.md b/migrate-large-mysql-to-tidb.md
index cb21154d79649..c7d86294e7723 100644
--- a/migrate-large-mysql-to-tidb.md
+++ b/migrate-large-mysql-to-tidb.md
@@ -69,7 +69,7 @@ The target TiKV cluster must have enough disk space to store the imported data.
|`-P` or `--port` |MySQL port|
|`-h` or `--host` |MySQL IP address|
|`-t` or `--thread` |The number of threads used for export|
- |`-o` or `--output` |The directory that stores the exported file. Supports a local path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format)|
+ |`-o` or `--output` |The directory that stores the exported file. Supports a local path or an [external storage URI](/external-storage-uri.md)|
|`-r` or `--row` |The maximum number of rows in a single file|
|`-F` |The maximum size of a single file, in MiB. Recommended value: 256 MiB.|
|-`B` or `--database` |Specifies a database to be exported|
diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md
index 802f6a4f93e25..a3d24b146bfe8 100644
--- a/sql-statements/sql-statement-backup.md
+++ b/sql-statements/sql-statement-backup.md
@@ -107,8 +107,23 @@ BR supports backing up data to S3 or GCS:
BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}';
```
-The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format).
+The URL syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md).
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:
{{< copyable "sql" >}}
diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md
index 2ab27bece9ffa..a89ca3923b355 100644
--- a/sql-statements/sql-statement-import-into.md
+++ b/sql-statements/sql-statement-import-into.md
@@ -19,7 +19,7 @@ This TiDB statement is not applicable to TiDB Cloud.
`IMPORT INTO` supports importing data from files stored in Amazon S3, GCS, and the TiDB local storage.
-- For data files stored in Amazon S3 or GCS, `IMPORT INTO` supports running in the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md).
+- For data files stored in Amazon S3, GCS, or Azure Blob Storage, `IMPORT INTO` supports running in the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md).
- When this framework is enabled ([tidb_enable_dist_task](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency.
- When this framework is disabled, `IMPORT INTO` only supports running on the TiDB node where the current user is connected.
@@ -97,9 +97,9 @@ In the left side of the `SET` expression, you can only reference a column name t
### fileLocation
-It specifies the storage location of the data file, which can be an Amazon S3 or GCS URI path, or a TiDB local file path.
+It specifies the storage location of the data file, which can be an Amazon S3, GCS, or Azure Blob Storage URI path, or a TiDB local file path.
-- Amazon S3 or GCS URI path: for URI configuration details, see [External storage](/br/backup-and-restore-storages.md#uri-format).
+- Amazon S3, GCS, or Azure Blob Storage URI path: for URI configuration details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- TiDB local file path: it must be an absolute path, and the file extension must be `.csv`, `.sql`, or `.parquet`. Make sure that the files corresponding to this path are stored on the TiDB node connected by the current user, and the user has the `FILE` privilege.
> **Note:**
@@ -139,6 +139,37 @@ The supported options are described as follows:
| `MAX_WRITE_SPEED=''` | All formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. |
| `CHECKSUM_TABLE=''` | All formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. |
| `DETACHED` | All Formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. |
+| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. |
+
+## Compressed files
+
+`IMPORT INTO` supports importing compressed `CSV` and `SQL` files. It can automatically determine whether a file is compressed and the compression format based on the file extension:
+
+| Extension | Compression format |
+|:---|:---|
+| `.gz`, `.gzip` | gzip compression format |
+| `.zstd`, `.zst` | ZStd compression format |
+| `.snappy` | snappy compression format |
+
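+For example, a gzip-compressed CSV file stored in Amazon S3 can be imported directly. The following is only a sketch; the bucket, file name, and credentials are placeholders:
+
+```sql
+IMPORT INTO t FROM 's3://import/test.csv.gz?access-key=${access-key}&secret-access-key=${secret-access-key}';
+```
+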
+## Global sorting
+
+`IMPORT INTO` splits the data import job of a source data file into multiple sub-jobs, each of which independently encodes and sorts data before importing it. If the encoded KV ranges of these sub-jobs overlap significantly (to learn how TiDB encodes data to KV, see [TiDB computing](/tidb-computing.md)), TiKV needs to keep performing compaction during the import, which degrades import performance and stability.
+
+In the following scenarios, there can be significant overlap in KV ranges:
+
+- If rows in the data file assigned to each sub-job have overlapping primary key ranges, the data KV generated by the encoding of each sub-job will also overlap.
+ - `IMPORT INTO` splits sub-jobs based on the traversal order of data files, usually sorted by file name in lexicographic order.
+- If the target table has many indexes, or the index column values are scattered in the data file, the index KV generated by the encoding of each sub-job will also overlap.
+
+When the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md) is enabled, you can enable global sorting by specifying the `CLOUD_STORAGE_URI` option in the `IMPORT INTO` statement or by specifying the target storage address for encoded KV data using the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). Note that currently, only S3 is supported as the global sorting storage address. When global sorting is enabled, `IMPORT INTO` writes encoded KV data to the cloud storage, performs global sorting there, and then imports the globally sorted index and table data into TiKV in parallel. This prevents problems caused by KV range overlap and enhances import stability.
+
+> **Note:**
+>
+> - If the KV range overlap in a source data file is low, enabling global sorting might decrease import performance. This is because when global sorting is enabled, TiDB needs to wait for the completion of local sorting in all sub-jobs before proceeding with the global sorting operations and subsequent import.
+> - After an import job using global sorting completes, the files stored in the cloud storage for global sorting are cleaned up asynchronously in a background thread.
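+
+For illustration, an import that enables global sorting through the `CLOUD_STORAGE_URI` option might look like the following sketch; the source file, target table, and the S3 path used for sorting are placeholders:
+
+```sql
+IMPORT INTO t FROM 's3://import/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key}'
+    WITH CLOUD_STORAGE_URI = 's3://sort-bucket/sort-prefix?access-key=${access-key}&secret-access-key=${secret-access-key}';
+```
+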
## Output
@@ -210,7 +241,7 @@ Assume that there are three files named `file-01.csv`, `file-02.csv`, and `file-
IMPORT INTO t FROM '/path/to/file-*.csv'
```
-### Import data files from Amazon S3 or GCS
+### Import data files from Amazon S3, GCS, or Azure Blob Storage
- Import data files from Amazon S3:
@@ -221,10 +252,16 @@ IMPORT INTO t FROM '/path/to/file-*.csv'
- Import data files from GCS:
```sql
- IMPORT INTO t FROM 'gs://bucket-name/test.csv';
+ IMPORT INTO t FROM 'gs://import/test.csv?credentials-file=${credentials-file-path}';
+ ```
+
+- Import data files from Azure Blob Storage:
+
+ ```sql
+    IMPORT INTO t FROM 'azure://import/test.csv?account-name=${account-name}&account-key=${account-key}';
```
-For details about the URI path configuration for Amazon S3 or GCS, see [External storage](/br/backup-and-restore-storages.md#uri-format).
+For details about the URI path configuration for Amazon S3, GCS, or Azure Blob Storage, see [URI Formats of External Storage Services](/external-storage-uri.md).
### Calculate column values using SetClause
diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md
index fec3c8b5565f5..653787cb0907b 100644
--- a/sql-statements/sql-statement-restore.md
+++ b/sql-statements/sql-statement-restore.md
@@ -98,8 +98,23 @@ BR supports restoring data from S3 or GCS:
RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/';
```
-The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format).
+The URL syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md).
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:
{{< copyable "sql" >}}
diff --git a/system-variables.md b/system-variables.md
index 9b86ffbb8c6d3..7217e4e4cdb5b 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -1409,6 +1409,37 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1;
- Starting from TiDB v7.2.0, the framework supports distributedly executing the [`IMPORT INTO`](https://docs.pingcap.com/tidb/v7.2/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable.
- This variable is renamed from `tidb_ddl_distribute_reorg`.
+### tidb_cloud_storage_uri New in v7.4.0
+
+> **Warning:**
+>
+> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No
+- Default value: `""`
+
+
+
+- This variable is used to specify the Amazon S3 cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format).
+- The following statements can use the Global Sort feature.
+ - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement.
+ - The [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable.
+
+
+
+
+- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](https://docs.pingcap.com/tidb/stable/external-storage-uri).
+- The following statements can use the Global Sort feature.
+ - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement.
+ - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/dev/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable.
+
+
+
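+For illustration, pointing this variable at a placeholder S3 prefix might look like the following; the bucket and credentials are not real values:
+
+```sql
+SET GLOBAL tidb_cloud_storage_uri = 's3://sort-bucket/global-sort?access-key=${access-key}&secret-access-key=${secret-access-key}';
+```
+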
### tidb_ddl_error_count_limit
- Scope: GLOBAL
diff --git a/ticdc/ticdc-sink-to-cloud-storage.md b/ticdc/ticdc-sink-to-cloud-storage.md
index 87928270c2ad3..3d5fa0a831d61 100644
--- a/ticdc/ticdc-sink-to-cloud-storage.md
+++ b/ticdc/ticdc-sink-to-cloud-storage.md
@@ -79,7 +79,7 @@ The following is an example configuration for Azure Blob Storage:
> **Tip:**
>
-> The URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC are the same as their URI parameters in BR. For details, see [Backup storage URI format](/br/backup-and-restore-storages.md#uri-format-description).
+> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).
### Configure sink URI for NFS
diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md
index 6326751c64d4b..e496880d62bc4 100644
--- a/ticdc/ticdc-sink-to-kafka.md
+++ b/ticdc/ticdc-sink-to-kafka.md
@@ -351,3 +351,77 @@ When a Kafka consumer receives a message, it first checks the `onlyHandleKey` fi
> **Warning:**
>
> When the Kafka consumer processes data and queries TiDB, the data might have been deleted by GC. You need to [modify the GC Lifetime of the TiDB cluster](/system-variables.md#tidb_gc_life_time-new-in-v50) to a larger value to avoid this situation.
+
+### Send large messages to external storage
+
+Starting from v7.4.0, TiCDC Kafka sink supports sending large messages to external storage when the message size exceeds the limit. Meanwhile, TiCDC sends a message to Kafka that contains the address of the large message in the external storage. This can avoid changefeed failures caused by the message size exceeding the Kafka topic limit.
+
+An example configuration is as follows:
+
+```toml
+[sink.kafka-config.large-message-handle]
+# large-message-handle-option is introduced in v7.3.0.
+# Defaults to "none". When the message size exceeds the limit, the changefeed fails.
+# When set to "handle-key-only", if the message size exceeds the limit, only the handle key is sent in the data field. If the message size still exceeds the limit, the changefeed fails.
+# When set to "claim-check", if the message size exceeds the limit, the message is sent to external storage.
+large-message-handle-option = "claim-check"
+claim-check-storage-uri = "s3://claim-check-bucket"
+```
+
+When `large-message-handle-option` is set to `"claim-check"`, `claim-check-storage-uri` must be set to a valid external storage address. Otherwise, creating the changefeed will fail.
+
+> **Tip:**
+>
+> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).
+
+TiCDC does not clean up messages on external storage services. Data consumers need to manage external storage services on their own.
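+
+A hypothetical command that creates a changefeed with such a configuration file might look like the following sketch; the server address, Kafka broker, topic name, and configuration file name are placeholders:
+
+```shell
+cdc cli changefeed create \
+    --server="http://127.0.0.1:8300" \
+    --sink-uri="kafka://127.0.0.1:9092/large-message-topic?protocol=canal-json" \
+    --config=changefeed.toml
+```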
+
+### Consume large messages from external storage
+
+The Kafka consumer receives a message that contains the address of the large message in the external storage. The message format is as follows:
+
+```json
+{
+ "id": 0,
+ "database": "test",
+ "table": "tp_int",
+ "pkNames": [
+ "id"
+ ],
+ "isDdl": false,
+ "type": "INSERT",
+ "es": 1639633141221,
+ "ts": 1639633142960,
+ "sql": "",
+ "sqlType": {
+ "id": 4
+ },
+ "mysqlType": {
+ "id": "int"
+ },
+ "data": [
+ {
+ "id": "2"
+ }
+ ],
+ "old": null,
+ "_tidb": { // TiDB extension fields
+ "commitTs": 163963314122145239,
+        "claimCheckLocation": "s3://claim-check-bucket/${uuid}.json"
+ }
+}
+```
+
+If the message contains the `claimCheckLocation` field, the Kafka consumer reads the large message data stored in JSON format according to the address provided by the field. The message format is as follows:
+
+```json
+{
+    "key": "xxx",
+    "value": "xxx"
+}
+```
+
+The `key` and `value` fields contain the encoded large message that would otherwise have been sent in the corresponding fields of the Kafka message. Consumers can parse these two fields to restore the content of the large message.
diff --git a/tidb-lightning/tidb-lightning-command-line-full.md b/tidb-lightning/tidb-lightning-command-line-full.md
index 6450a9a9d5aa2..cb6995ffd024d 100644
--- a/tidb-lightning/tidb-lightning-command-line-full.md
+++ b/tidb-lightning/tidb-lightning-command-line-full.md
@@ -17,7 +17,7 @@ You can configure the following parameters using `tidb-lightning`:
| :---- | :---- | :---- |
| `--config ` | Read the global configuration from the file. If this parameter is not specified, TiDB Lightning uses the default configuration. | |
| `-V` | Print the program version. | |
-| `-d ` | Local directory or [external storage URI](/br/backup-and-restore-storages.md#uri-format) of data files. | `mydumper.data-source-dir` |
+| `-d ` | Local directory or [external storage URI](/external-storage-uri.md) of data files. | `mydumper.data-source-dir` |
| `-L ` | Log level: `debug`, `info`, `warn`, `error`, or `fatal`. `info` by default.| `lightning.level` |
| `-f ` | [Table filter rules](/table-filter.md). Can be specified multiple times. | `mydumper.filter` |
| `--backend ` | Select an import mode. `local` refers to [physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode.md); `tidb` refers to [logical import mode](/tidb-lightning/tidb-lightning-logical-import-mode.md). | `tikv-importer.backend` |
diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md
index d42b0221e183c..d00a4c730632f 100644
--- a/tidb-lightning/tidb-lightning-configuration.md
+++ b/tidb-lightning/tidb-lightning-configuration.md
@@ -447,7 +447,7 @@ log-progress = "5m"
|:----|:----|:----|
| --config *file* | Reads global configuration from *file*. If not specified, the default configuration would be used. | |
| -V | Prints program version | |
-| -d *directory* | Directory or [external storage URI](/br/backup-and-restore-storages.md#uri-format) of the data dump to read from | `mydumper.data-source-dir` |
+| -d *directory* | Directory or [external storage URI](/external-storage-uri.md) of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` |
| --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `local` |
diff --git a/tidb-lightning/tidb-lightning-data-source.md b/tidb-lightning/tidb-lightning-data-source.md
index 721a1665b0c0f..5a7c8bf7dada3 100644
--- a/tidb-lightning/tidb-lightning-data-source.md
+++ b/tidb-lightning/tidb-lightning-data-source.md
@@ -333,7 +333,7 @@ type = '$3'
## Import data from Amazon S3
-The following examples show how to import data from Amazon S3 using TiDB Lightning. For more parameter configurations, see [external storage URI](/br/backup-and-restore-storages.md#uri-format).
+The following examples show how to import data from Amazon S3 using TiDB Lightning. For more parameter configurations, see [URI Formats of External Storage Services](/external-storage-uri.md).
+ Use the locally configured permissions to access S3 data:
diff --git a/tidb-lightning/tidb-lightning-distributed-import.md b/tidb-lightning/tidb-lightning-distributed-import.md
index 5ca0828cbd5b0..20d6219d0ea09 100644
--- a/tidb-lightning/tidb-lightning-distributed-import.md
+++ b/tidb-lightning/tidb-lightning-distributed-import.md
@@ -111,7 +111,7 @@ If the data source is stored in external storage such as Amazon S3 or GCS, you n
-d 's3://my-bucket/sql-backup'
```
-For more parameter descriptions, see [external storage URI](/br/backup-and-restore-storages.md#uri-format).
+For more parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md).
### Step 3: Start TiDB Lightning to import data
@@ -143,7 +143,7 @@ Wait for all TiDB Lightning instances to finish, then the entire import is compl
## Example 2: Import single tables in parallel
-TiDB Lightning also supports parallel import of single tables. For example, import multiple single tables stored in Amazon S3 by different TiDB Lightning instances into the downstream TiDB cluster in parallel. This method can speed up the overall import speed. When remote storages such as Amazon S3 is used, the configuration parameters of TiDB Lightning are the same as those of BR. For more details, see [external storage URI](/br/backup-and-restore-storages.md#uri-format).
+TiDB Lightning also supports parallel import of single tables. For example, different TiDB Lightning instances can import multiple single tables stored in Amazon S3 into the downstream TiDB cluster in parallel, which improves the overall import speed. When a remote storage such as Amazon S3 is used, the configuration parameters of TiDB Lightning are the same as those of BR. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md).
> **Note:**
>
diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md
index d4918a27a1152..8aab79e8d5e4b 100644
--- a/tidb-lightning/tidb-lightning-overview.md
+++ b/tidb-lightning/tidb-lightning-overview.md
@@ -16,8 +16,9 @@ TiDB Lightning supports the following file formats:
TiDB Lightning can read data from the following sources:
- Local
-- [Amazon S3](/br/backup-and-restore-storages.md#uri-format)
-- [Google Cloud Storage](/br/backup-and-restore-storages.md#uri-format)
+- [Amazon S3](/external-storage-uri.md#amazon-s3-uri-format)
+- [Google Cloud Storage](/external-storage-uri.md#gcs-uri-format)
+- [Azure Blob Storage](/external-storage-uri.md#azure-blob-storage-uri-format)
## TiDB Lightning architecture