From 5af9c7d6c01edeb62fcd58ca14c19443df8786a4 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Tue, 24 Oct 2023 09:55:39 +0800
Subject: [PATCH 01/15] Update dumpling-overview.md

---
 dumpling-overview.md | 70 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/dumpling-overview.md b/dumpling-overview.md
index a59c237bcad63..3bd06ba863324 100644
--- a/dumpling-overview.md
+++ b/dumpling-overview.md
@@ -95,19 +95,7 @@ dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o /tmp/test -r 200000
 In the command above:

 + The `-h`, `-P`, and `-u` options specify the host address, the port, and the user, respectively. If a password is required for authentication, you can use `-p $YOUR_SECRET_PASSWORD` to pass the password to Dumpling.
-
-
-
-+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format).
-
-
-
-
-
-+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format).
-
-
-
++ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](#uri-formats-of-the-storage-services).
 + The `-t` option specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, but also increases the database's memory consumption. Therefore, it is not recommended to set this number too large; it is usually kept below 64.
 + The `-r` option enables the in-table concurrency to speed up the export. The default value is `0`, which means disabled. A value greater than 0 means it is enabled, and the value is of `INT` type.
When the source database is TiDB, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting, which also reduces memory usage. The specific `-r` value does not affect the split algorithm. When the source database is MySQL and the primary key is of the `INT` type, specifying `-r` can also enable the in-table concurrency.
+ The `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). It is recommended to keep its value at 256 MiB or less if you plan to use TiDB Lightning to load this file into a TiDB instance.
@@ -116,6 +104,62 @@ In the command above:

 >
 > If the size of a single exported table exceeds 10 GB, it is **strongly recommended to use** the `-r` and `-F` options.

+#### URI formats of the storage services
+
+This section describes the URI formats of the storage services, including Amazon S3, GCS, and Azure Blob Storage. The URI format is as follows:
+
+```shell
+[scheme]://[host]/[path]?[parameters]
+```
+
+
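As an illustration of the format above (this helper is not part of Dumpling or BR; the function name and sample bucket are made up for the example), a storage URI of this shape can be split into its documented components with Python's standard library:

```python
from urllib.parse import urlsplit, parse_qs

def split_storage_uri(uri):
    # Decompose [scheme]://[host]/[path]?[parameters] into its parts.
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,          # for example, s3, gcs, or azure
        "host": parts.netloc,            # bucket or container name
        "path": parts.path.lstrip("/"),  # object key prefix
        # Each documented parameter appears at most once; flatten the values.
        "parameters": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }

print(split_storage_uri(
    "s3://my-bucket/backup-20231024?access-key=AKIAEXAMPLE&force-path-style=true"
))
```

The same decomposition applies to the GCS and Azure Blob Storage URIs described below it, only with different schemes and parameter names.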
+ +- `scheme`: `s3` +- `host`: `bucket name` +- `parameters`: + + - `access-key`: Specifies the access key. + - `secret-access-key`: Specifies the secret access key. + - `session-token`: Specifies the temporary session token. BR does not support this parameter yet. + - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`). + - `endpoint`: Specifies the URL of custom endpoint for S3-compatible services (for example, ``). + - `force-path-style`: Use path style access rather than virtual hosted style access (defaults to `true`). + - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `STANDARD_IA`). + - `sse`: Specifies the server-side encryption algorithm used to encrypt the uploaded objects (value options: ``, `AES256`, or `aws:kms`). + - `sse-kms-key-id`: Specifies the KMS ID if `sse` is set to `aws:kms`. + - `acl`: Specifies the canned ACL of the uploaded objects (for example, `private` or `authenticated-read`). + - `role-arn`: When you need to access Amazon S3 data from a third party using a specified [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html), you can specify the corresponding [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) of the IAM role with the `role-arn` URL query parameter, such as `arn:aws:iam::888888888888:role/my-role`. For more information about using an IAM role to access Amazon S3 data from a third party, see [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html). + - `external-id`: When you access Amazon S3 data from a third party, you might need to specify a correct [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) to assume [the IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). 
In this case, you can use this `external-id` URL query parameter to specify the external ID and make sure that you can assume the IAM role. An external ID is an arbitrary string provided by the third party together with the IAM role ARN to access the Amazon S3 data. Providing an external ID is optional when assuming an IAM role, which means if the third party does not require an external ID for the IAM role, you can assume the IAM role and access the corresponding Amazon S3 data without providing this parameter. + +
+
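To make the encoding requirement concrete, here is a minimal Python sketch that assembles an Amazon S3 URI from the parameters listed above. The helper and all credential values are invented for illustration; the point it demonstrates is that parameter values such as secret access keys can contain characters like `/` or `+`, which must be percent-encoded in the URI:

```python
from urllib.parse import quote, urlencode

def make_s3_uri(bucket, prefix, **params):
    # Percent-encode every parameter value: secret access keys may contain
    # characters such as '/' or '+' that would otherwise break the URI.
    query = urlencode(params, quote_via=quote, safe="")
    return f"s3://{bucket}/{prefix}?{query}" if query else f"s3://{bucket}/{prefix}"

uri = make_s3_uri(
    "external", "backup-20220915",
    **{"access-key": "AKIAEXAMPLE", "secret-access-key": "abc/def+ghi"},
)
print(uri)  # s3://external/backup-20220915?access-key=AKIAEXAMPLE&secret-access-key=abc%2Fdef%2Bghi
```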
+
+- `scheme`: `gcs` or `gs`
+- `host`: `bucket name`
+- `parameters`:
+
+    - `credentials-file`: Specifies the path to the credentials JSON file on the migration tool node.
+    - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `COLDLINE`).
+    - `predefined-acl`: Specifies the predefined ACL of the uploaded objects (for example, `private` or `project-private`).
+
+
+ +- `scheme`: `azure` or `azblob` +- `host`: `container name` +- `parameters`: + + - `account-name`: Specifies the account name of the storage. + - `account-key`: Specifies the access key. + - `sas-token`: Specifies the shared access signature (SAS) token. + - `access-tier`: Specifies the access tier of the uploaded objects, for example, `Hot`, `Cool`, or `Archive`. The default value is the default access tier of the storage account. + - `encryption-scope`: Specifies the [encryption scope](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-scope-manage?tabs=powershell#upload-a-blob-with-an-encryption-scope) for server-side encryption. + - `encryption-key`: Specifies the [encryption key](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys) for server-side encryption, which uses the AES256 encryption algorithm. + +
+
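Taken together, the scheme in a URI determines which of the three services it refers to. As a rough sketch (the mapping and function name are illustrative, not part of Dumpling or BR), the alias resolution described in the sections above could look like this in Python:

```python
# Scheme aliases per the parameter lists above:
# s3 -> Amazon S3; gcs/gs -> GCS; azure/azblob -> Azure Blob Storage.
SCHEME_TO_SERVICE = {
    "s3": "Amazon S3",
    "gcs": "Google Cloud Storage",
    "gs": "Google Cloud Storage",
    "azure": "Azure Blob Storage",
    "azblob": "Azure Blob Storage",
}

def storage_service(uri):
    scheme, sep, _rest = uri.partition("://")
    if not sep or scheme.lower() not in SCHEME_TO_SERVICE:
        raise ValueError(f"unsupported storage scheme in URI: {uri!r}")
    return SCHEME_TO_SERVICE[scheme.lower()]

print(storage_service("azblob://external/backup-20220915"))  # Azure Blob Storage
```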
+ ### Export to CSV files You can export data to CSV files by adding the `--filetype csv` argument. From f10ddf6f3ecf847cd061e906465de088ff42f4eb Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 25 Oct 2023 13:21:34 +0800 Subject: [PATCH 02/15] Update dumpling-overview.md --- dumpling-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dumpling-overview.md b/dumpling-overview.md index 3bd06ba863324..dad6cb83f283d 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -121,7 +121,7 @@ This section describes the URI formats of the storage services, including Amazon - `access-key`: Specifies the access key. - `secret-access-key`: Specifies the secret access key. - - `session-token`: Specifies the temporary session token. BR does not support this parameter yet. + - `session-token`: Specifies the temporary session token. - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`). - `endpoint`: Specifies the URL of custom endpoint for S3-compatible services (for example, ``). - `force-path-style`: Use path style access rather than virtual hosted style access (defaults to `true`). From f882c40040abc4fbfc3c147dcb85566e00e39aff Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Wed, 25 Oct 2023 16:58:38 +0800 Subject: [PATCH 03/15] Update dumpling-overview.md --- dumpling-overview.md | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/dumpling-overview.md b/dumpling-overview.md index dad6cb83f283d..3a2468fb2d1a7 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -277,19 +277,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey} export AWS_SECRET_ACCESS_KEY=${SecretKey} ``` - - -Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. 
For more parameter descriptions, see [external storage URI](/br/backup-and-restore-storages.md#uri-format). - - - - - -Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format). - - - -{{< copyable "shell-regular" >}} +Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](#uri-formats-of-the-storage-services). ```shell ./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}" @@ -301,8 +289,6 @@ Dumpling also supports reading credential files from `~/.aws/credentials`. Param By default, Dumpling exports all databases except system databases (including `mysql`, `sys`, `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `METRICS_SCHEMA`, and `INSPECTION_SCHEMA`). You can use `--where ` to select the records to be exported. -{{< copyable "shell-regular" >}} - ```shell ./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100" ``` @@ -446,7 +432,7 @@ SET GLOBAL tidb_gc_life_time = '10m'; | `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes | | `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. | | `--filetype` | Exported file type (csv/sql) | "sql" | -| `-o` or `--output` | Specify the absolute local file path or [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format) for exporting the data. 
| "./export-${time}" | +| `-o` or `--output` | Specify the absolute local file path or [external storage URI](#uri-formats-of-the-storage-services) for exporting the data. | "./export-${time}" | | `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. | | `--consistency` | flush: use FTWRL before the dump
snapshot: dump the TiDB data of the snapshot at a specific TSO
lock: execute `lock tables read` on all tables to be dumped
none: dump without adding locks, which cannot guarantee consistency
auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" | | `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` | From c090a113880e56c6b9d2a934399c0967269218cc Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Wed, 25 Oct 2023 20:21:05 +0800 Subject: [PATCH 04/15] create a new article for uri, and updated all links --- TOC-tidb-cloud.md | 1 + TOC.md | 1 + br/backup-and-restore-overview.md | 2 +- br/br-incremental-guide.md | 2 +- br/br-pitr-manual.md | 10 +-- br/br-snapshot-guide.md | 2 +- br/use-br-command-line-tool.md | 2 +- dumpling-overview.md | 54 ++------------ error-codes.md | 2 +- external-storage-uri.md | 72 +++++++++++++++++++ migrate-large-mysql-shards-to-tidb.md | 2 +- migrate-large-mysql-to-tidb.md | 2 +- sql-statements/sql-statement-backup.md | 12 +--- sql-statements/sql-statement-import-into.md | 6 +- sql-statements/sql-statement-restore.md | 12 +--- system-variables.md | 6 +- ticdc/ticdc-sink-to-cloud-storage.md | 2 +- ticdc/ticdc-sink-to-kafka.md | 2 +- .../tidb-lightning-command-line-full.md | 2 +- .../tidb-lightning-configuration.md | 2 +- tidb-lightning/tidb-lightning-data-source.md | 2 +- .../tidb-lightning-distributed-import.md | 4 +- tidb-lightning/tidb-lightning-overview.md | 4 +- 23 files changed, 107 insertions(+), 99 deletions(-) create mode 100644 external-storage-uri.md diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 17c96b8622416..6d556da03b292 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -620,6 +620,7 @@ - [TiDB Global Sort](/tidb-global-sort.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) - [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md) + - [URI Formats](/external-storage-uri.md) - [Support](/tidb-cloud/tidb-cloud-support.md) - [Glossary](/tidb-cloud/tidb-cloud-glossary.md) - FAQs diff --git a/TOC.md b/TOC.md index 7ce397aa7df2a..247073cab440e 100644 
--- a/TOC.md +++ b/TOC.md @@ -995,6 +995,7 @@ - [Error Codes](/error-codes.md) - [Table Filter](/table-filter.md) - [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md) + - [URI Formats](/external-storage-uri.md) - Internal Components - [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md) - [TiDB Global Sort](/tidb-global-sort.md) diff --git a/br/backup-and-restore-overview.md b/br/backup-and-restore-overview.md index 28ca42cd6df35..5c0ad2eed447c 100644 --- a/br/backup-and-restore-overview.md +++ b/br/backup-and-restore-overview.md @@ -100,7 +100,7 @@ Corresponding to the backup features, you can perform two types of restore: full TiDB supports backing up data to Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, NFS, and other S3-compatible file storage services. For details, see the following documents: -- [Specify backup storage in URI](/br/backup-and-restore-storages.md#uri-format) +- [Specify backup storage in URI](/external-storage-uri.md) - [Configure access privileges to backup storages](/br/backup-and-restore-storages.md#authentication) ## Compatibility diff --git a/br/br-incremental-guide.md b/br/br-incremental-guide.md index 098547c63fa50..03e29d2f893d6 100644 --- a/br/br-incremental-guide.md +++ b/br/br-incremental-guide.md @@ -30,7 +30,7 @@ tiup br backup full --pd "${PD_IP}:2379" \ - `--lastbackupts`: The last backup timestamp. - `--ratelimit`: The maximum speed **per TiKV** performing backup tasks (in MiB/s). -- `storage`: The storage path of backup data. You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [Backup storage URI configuration](/br/backup-and-restore-storages.md#uri-format). +- `storage`: The storage path of backup data. 
You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). ## Restore incremental data diff --git a/br/br-pitr-manual.md b/br/br-pitr-manual.md index 291be5dbbfa78..1c2095d24620a 100644 --- a/br/br-pitr-manual.md +++ b/br/br-pitr-manual.md @@ -78,7 +78,7 @@ The example output only shows the common parameters. These parameters are descri - `task-name`: specifies the task name for the log backup. This name is also used to query, pause, and resume the backup task. - `--ca`, `--cert`, `--key`: specifies the mTLS encryption method to communicate with TiKV and PD. - `--pd`: specifies the PD address for the backup cluster. BR needs to access PD to start the log backup task. -- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). Usage example: @@ -284,7 +284,7 @@ This command only accesses the backup storage and does not access the TiDB clust - `--dry-run`: run the command but do not really delete the files. - `--until`: delete all log backup data before the specified timestamp. -- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. 
For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). Usage example: @@ -325,7 +325,7 @@ Global Flags: This command only accesses the backup storage and does not access the TiDB cluster. -The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). Usage example: @@ -369,12 +369,12 @@ Global Flags: The example output only shows the common parameters. These parameters are described as follows: -- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). 
- `--restored-ts`: the timestamp that you want to restore data to. If this parameter is not specified, BR restores data to the latest timestamp available in the log backup, that is, the checkpoint of the backup data. - `--start-ts`: the start timestamp that you want to restore log backup data from. If you only need to restore log backup data, you must specify this parameter. - `--pd`: the PD address of the restore cluster. - `--ca`, `--cert`, `--key`: specify the mTLS encryption method to communicate with TiKV and PD. -- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). Usage example: diff --git a/br/br-snapshot-guide.md b/br/br-snapshot-guide.md index 7d9152b1f9751..13073355b8ba5 100644 --- a/br/br-snapshot-guide.md +++ b/br/br-snapshot-guide.md @@ -34,7 +34,7 @@ tiup br backup full --pd "${PD_IP}:2379" \ In the preceding command: - `--backupts`: The time point of the snapshot. The format can be [TSO](/glossary.md#tso) or timestamp, such as `400036290571534337` or `2018-05-11 01:42:23`. If the data of this snapshot is garbage collected, the `br backup` command returns an error and `br` exits. If you leave this parameter unspecified, `br` picks the snapshot corresponding to the backup start time. -- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). 
+- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). - `--ratelimit`: The maximum speed **per TiKV** performing backup tasks. The unit is in MiB/s. During backup, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the backup task is completed and statistics such as total backup time, average backup speed, and backup data size are displayed. diff --git a/br/use-br-command-line-tool.md b/br/use-br-command-line-tool.md index 15dbbef486578..c9bd3da4f16d9 100644 --- a/br/use-br-command-line-tool.md +++ b/br/use-br-command-line-tool.md @@ -42,7 +42,7 @@ A `br` command consists of multiple layers of sub-commands. Currently, br comman ### Common options * `--pd`: specifies the PD service address. For example, `"${PD_IP}:2379"`. -* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format). +* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI Formats of External Storage Services](/external-storage-uri.md). * `--ca`: specifies the path to the trusted CA certificate in the PEM format. * `--cert`: specifies the path to the SSL certificate in the PEM format. * `--key`: specifies the path to the SSL certificate key in the PEM format. 
diff --git a/dumpling-overview.md b/dumpling-overview.md
index 3a2468fb2d1a7..4d3ae38f1ee7d 100644
--- a/dumpling-overview.md
+++ b/dumpling-overview.md
@@ -95,7 +95,7 @@ dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o /tmp/test -r 200000
 In the command above:

 + The `-h`, `-P`, and `-u` options specify the host address, the port, and the user, respectively. If a password is required for authentication, you can use `-p $YOUR_SECRET_PASSWORD` to pass the password to Dumpling.
-+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](#uri-formats-of-the-storage-services).
++ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/external-storage-uri.md).
 + The `-t` option specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, but also increases the database's memory consumption. Therefore, it is not recommended to set this number too large; it is usually kept below 64.
 + The `-r` option enables the in-table concurrency to speed up the export. The default value is `0`, which means disabled. A value greater than 0 means it is enabled, and the value is of `INT` type. When the source database is TiDB, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting, which also reduces memory usage. The specific `-r` value does not affect the split algorithm. When the source database is MySQL and the primary key is of the `INT` type, specifying `-r` can also enable the in-table concurrency.
 + The `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). It is recommended to keep its value at 256 MiB or less if you plan to use TiDB Lightning to load this file into a TiDB instance.
@@ -112,53 +112,7 @@ This section describes the URI formats of the storage services, including Amazon [scheme]://[host]/[path]?[parameters] ``` - -
- -- `scheme`: `s3` -- `host`: `bucket name` -- `parameters`: - - - `access-key`: Specifies the access key. - - `secret-access-key`: Specifies the secret access key. - - `session-token`: Specifies the temporary session token. - - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`). - - `endpoint`: Specifies the URL of custom endpoint for S3-compatible services (for example, ``). - - `force-path-style`: Use path style access rather than virtual hosted style access (defaults to `true`). - - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `STANDARD_IA`). - - `sse`: Specifies the server-side encryption algorithm used to encrypt the uploaded objects (value options: ``, `AES256`, or `aws:kms`). - - `sse-kms-key-id`: Specifies the KMS ID if `sse` is set to `aws:kms`. - - `acl`: Specifies the canned ACL of the uploaded objects (for example, `private` or `authenticated-read`). - - `role-arn`: When you need to access Amazon S3 data from a third party using a specified [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html), you can specify the corresponding [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) of the IAM role with the `role-arn` URL query parameter, such as `arn:aws:iam::888888888888:role/my-role`. For more information about using an IAM role to access Amazon S3 data from a third party, see [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html). - - `external-id`: When you access Amazon S3 data from a third party, you might need to specify a correct [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) to assume [the IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). 
In this case, you can use this `external-id` URL query parameter to specify the external ID and make sure that you can assume the IAM role. An external ID is an arbitrary string provided by the third party together with the IAM role ARN to access the Amazon S3 data. Providing an external ID is optional when assuming an IAM role, which means if the third party does not require an external ID for the IAM role, you can assume the IAM role and access the corresponding Amazon S3 data without providing this parameter. - -
-
- -- `scheme`: `gcs` or `gs` -- `host`: `bucket name` -- `parameters`: - - - `credentials-file`: Specifies the path to the credentials JSON file on the migration tool node. - - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `COLDLINE`) - - `predefined-acl`: Specifies the predefined ACL of the uploaded objects (for example, `private` or `project-private`) - -
-
- -- `scheme`: `azure` or `azblob` -- `host`: `container name` -- `parameters`: - - - `account-name`: Specifies the account name of the storage. - - `account-key`: Specifies the access key. - - `sas-token`: Specifies the shared access signature (SAS) token. - - `access-tier`: Specifies the access tier of the uploaded objects, for example, `Hot`, `Cool`, or `Archive`. The default value is the default access tier of the storage account. - - `encryption-scope`: Specifies the [encryption scope](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-scope-manage?tabs=powershell#upload-a-blob-with-an-encryption-scope) for server-side encryption. - - `encryption-key`: Specifies the [encryption key](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys) for server-side encryption, which uses the AES256 encryption algorithm. - -
-
+For more information, see [URI Formats of External Storage Services](/external-storage-uri.md). ### Export to CSV files @@ -277,7 +231,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey} export AWS_SECRET_ACCESS_KEY=${SecretKey} ``` -Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](#uri-formats-of-the-storage-services). +Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](/external-storage-uri.md). ```shell ./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}" @@ -432,7 +386,7 @@ SET GLOBAL tidb_gc_life_time = '10m'; | `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes | | `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. | | `--filetype` | Exported file type (csv/sql) | "sql" | -| `-o` or `--output` | Specify the absolute local file path or [external storage URI](#uri-formats-of-the-storage-services) for exporting the data. | "./export-${time}" | +| `-o` or `--output` | Specify the absolute local file path or [external storage URI](/external-storage-uri.md) for exporting the data. | "./export-${time}" | | `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. | | `--consistency` | flush: use FTWRL before the dump
snapshot: dump the TiDB data of the snapshot at a specific TSO
lock: execute `lock tables read` on all tables to be dumped
none: dump without adding locks, which cannot guarantee consistency
auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" | | `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` | diff --git a/error-codes.md b/error-codes.md index 1b08664077024..a16516d54d09c 100644 --- a/error-codes.md +++ b/error-codes.md @@ -372,7 +372,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the * Error Number: 8158 - The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [External storage](/br/backup-and-restore-storages.md#uri-format). + The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [External storage](/external-storage-uri.md). * Error Number: 8159 diff --git a/external-storage-uri.md b/external-storage-uri.md new file mode 100644 index 0000000000000..949b1abf1c4e2 --- /dev/null +++ b/external-storage-uri.md @@ -0,0 +1,72 @@ +--- +title: URI Formats of External Storage Services +summary: Describes the storage URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage. +--- + +## URI Formats of External Storage Services + +This document describes the URI formats of the storage services. The basic format of the URI is as follows: + +```shell +[scheme]://[host]/[path]?[parameters] +``` + +## Amazon S3 URI format + +- `scheme`: `s3` +- `host`: `bucket name` +- `parameters`: + + - `access-key`: Specifies the access key. + - `secret-access-key`: Specifies the secret access key. + - `session-token`: Specifies the temporary session token. + - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`). + - `endpoint`: Specifies the URL of custom endpoint for S3-compatible services (for example, ``). + - `force-path-style`: Use path style access rather than virtual hosted style access (defaults to `true`). 
+ - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `STANDARD_IA`). + - `sse`: Specifies the server-side encryption algorithm used to encrypt the uploaded objects (value options: ``, `AES256`, or `aws:kms`). + - `sse-kms-key-id`: Specifies the KMS ID if `sse` is set to `aws:kms`. + - `acl`: Specifies the canned ACL of the uploaded objects (for example, `private` or `authenticated-read`). + - `role-arn`: When you need to access Amazon S3 data from a third party using a specified [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html), you can specify the corresponding [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) of the IAM role with the `role-arn` URL query parameter, such as `arn:aws:iam::888888888888:role/my-role`. For more information about using an IAM role to access Amazon S3 data from a third party, see [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html). + - `external-id`: When you access Amazon S3 data from a third party, you might need to specify a correct [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) to assume [the IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In this case, you can use this `external-id` URL query parameter to specify the external ID and make sure that you can assume the IAM role. An external ID is an arbitrary string provided by the third party together with the IAM role ARN to access the Amazon S3 data. Providing an external ID is optional when assuming an IAM role, which means if the third party does not require an external ID for the IAM role, you can assume the IAM role and access the corresponding Amazon S3 data without providing this parameter. 
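A hedged sketch (bucket, prefix, and role values are hypothetical) of assembling an S3 URI that carries the `role-arn` and `external-id` query parameters described above; reserved characters such as `:` and `/` in the ARN are percent-encoded so that they survive query-string parsing:

```shell
# Hypothetical values for illustration only.
BUCKET="external"
PREFIX="backup-20220915"
ROLE_ARN="arn:aws:iam::888888888888:role/my-role"

# Percent-encode ':' and '/' so the ARN is safe inside a URI query string.
ENCODED_ARN=$(printf '%s' "$ROLE_ARN" | sed -e 's/:/%3A/g' -e 's,/,%2F,g')

URI="s3://${BUCKET}/${PREFIX}?role-arn=${ENCODED_ARN}&external-id=my-external-id"
echo "$URI"
```

Whether a given tool requires this encoding depends on its URI parser; when in doubt, encoding reserved characters is the safe default.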
+ +Example: + +```shell +s3://external/backup-20220915?access-key=${access-key}&secret-access-key=${secret-access-key} +``` + +## GCS URI format + +- `scheme`: `gcs` or `gs` +- `host`: `bucket name` +- `parameters`: + + - `credentials-file`: Specifies the path to the credentials JSON file on the migration tool node. + - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `COLDLINE`) + - `predefined-acl`: Specifies the predefined ACL of the uploaded objects (for example, `private` or `project-private`) + +Example: + +```shell +gcs://external/backup-20220915?credentials-file=${credentials-file-path} +``` + +## Azure Blob Storage URI format + +- `scheme`: `azure` or `azblob` +- `host`: `container name` +- `parameters`: + + - `account-name`: Specifies the account name of the storage. + - `account-key`: Specifies the access key. + - `sas-token`: Specifies the shared access signature (SAS) token. + - `access-tier`: Specifies the access tier of the uploaded objects, for example, `Hot`, `Cool`, or `Archive`. The default value is the default access tier of the storage account. + - `encryption-scope`: Specifies the [encryption scope](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-scope-manage?tabs=powershell#upload-a-blob-with-an-encryption-scope) for server-side encryption. + - `encryption-key`: Specifies the [encryption key](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys) for server-side encryption, which uses the AES256 encryption algorithm. + +Example: + +```shell +azure://external/backup-20220915?account-name=${account-name}&account-key=${account-key} +``` diff --git a/migrate-large-mysql-shards-to-tidb.md b/migrate-large-mysql-shards-to-tidb.md index b5ff486c3c747..b5f3513e322f6 100644 --- a/migrate-large-mysql-shards-to-tidb.md +++ b/migrate-large-mysql-shards-to-tidb.md @@ -94,7 +94,7 @@ The following table describes parameters in the command above. 
For more informat | `-p` or `--port` | Specifies the port to be used.| | `-h` or `--host` | Specifies the IP address of the data source. | | `-t` or `--thread` | Specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, and increases the database's memory consumption. Therefore, it is not recommended to set the number too large. Usually, it's less than 64.| -| `-o` or `--output` | Specifies the export directory of the storage, which supports a local file path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format).| +| `-o` or `--output` | Specifies the export directory of the storage, which supports a local file path or an [external storage URI](/external-storage-uri.md).| | `-r` or `--row` | Specifies the maximum number of rows in a single file. If you use this parameter, Dumpling enables the in-table concurrency to speed up the export and reduce the memory usage.| | `-F` | Specifies the maximum size of a single file. The unit is `MiB`. It is recommended to keep the value to 256 MiB. | | `-B` or `--database` | Specifies databases to be exported. | diff --git a/migrate-large-mysql-to-tidb.md b/migrate-large-mysql-to-tidb.md index cb21154d79649..c7d86294e7723 100644 --- a/migrate-large-mysql-to-tidb.md +++ b/migrate-large-mysql-to-tidb.md @@ -69,7 +69,7 @@ The target TiKV cluster must have enough disk space to store the imported data. |`-P` or `--port` |MySQL port| |`-h` or `--host` |MySQL IP address| |`-t` or `--thread` |The number of threads used for export| - |`-o` or `--output` |The directory that stores the exported file. Supports a local path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format)| + |`-o` or `--output` |The directory that stores the exported file. 
Supports a local path or an [external storage URI](/external-storage-uri.md)| |`-r` or `--row` |The maximum number of rows in a single file| |`-F` |The maximum size of a single file, in MiB. Recommended value: 256 MiB.| |-`B` or `--database` |Specifies a database to be exported| diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md index 3a24edb01262f..5b112ca8b51ac 100644 --- a/sql-statements/sql-statement-backup.md +++ b/sql-statements/sql-statement-backup.md @@ -112,17 +112,7 @@ BR supports backing up data to S3 or GCS: BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}'; ``` - - -The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). - - - - - -The URL syntax is further explained in [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format). - - +The URL syntax is further explained in [external storage URI](/external-storage-uri.md). When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 72e708d73f9d7..2e2651c3ed79a 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -96,7 +96,7 @@ In the left side of the `SET` expression, you can only reference a column name t It specifies the storage location of the data file, which can be an Amazon S3 or GCS URI path, or a TiDB local file path. -- Amazon S3 or GCS URI path: for URI configuration details, see [External storage](/br/backup-and-restore-storages.md#uri-format). +- Amazon S3 or GCS URI path: for URI configuration details, see [URI Formats of External Storage Services](/external-storage-uri.md). 
- TiDB local file path: it must be an absolute path, and the file extension must be `.csv`, `.sql`, or `.parquet`. Make sure that the files corresponding to this path are stored on the TiDB node connected by the current user, and the user has the `FILE` privilege. > **Note:** @@ -137,7 +137,7 @@ The supported options are described as follows: | `MAX_WRITE_SPEED=''` | All formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | | `CHECKSUM_TABLE=''` | All formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All Formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | -| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. 
When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [External storage](/br/backup-and-restore-storages.md#uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. | +| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [URI Formats of External Storage Services](/external-storage-uri.md). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. | ## Compressed files @@ -250,7 +250,7 @@ IMPORT INTO t FROM '/path/to/file-*.csv' IMPORT INTO t FROM 'gs://bucket-name/test.csv'; ``` -For details about the URI path configuration for Amazon S3 or GCS, see [External storage](/br/backup-and-restore-storages.md#uri-format). +For details about the URI path configuration for Amazon S3 or GCS, see [URI Formats of External Storage Services](/external-storage-uri.md). 
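As a rough illustration of the `[scheme]://[host]/[path]?[parameters]` shape that these storage URIs follow, the pieces can be separated with plain shell parameter expansion (the bucket, path, and credential values are hypothetical):

```shell
URI="s3://my-bucket/sort-dir/run-1?access-key=AKIAEXAMPLE&secret-access-key=mysecret"

SCHEME="${URI%%://*}"                 # text before the first '://'
REST="${URI#*://}"                    # text after the first '://'
HOSTPART="${REST%%/*}"                # bucket (or container) name
PATH_AND_QUERY="${REST#*/}"
OBJECT_PATH="${PATH_AND_QUERY%%\?*}"  # path portion, before '?'
QUERY="${PATH_AND_QUERY#*\?}"         # query-string parameters

echo "$SCHEME | $HOSTPART | $OBJECT_PATH | $QUERY"
```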
### Calculate column values using SetClause diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md index 04d24dde2eb80..f15c96860ff6e 100644 --- a/sql-statements/sql-statement-restore.md +++ b/sql-statements/sql-statement-restore.md @@ -103,17 +103,7 @@ BR supports restoring data from S3 or GCS: RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/'; ``` - - -The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). - - - - - -The URL syntax is further explained in [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format). - - +The URL syntax is further explained in [external storage URI](/external-storage-uri.md). When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/system-variables.md b/system-variables.md index f75c8a4fea479..9c2cefc99097d 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1580,7 +1580,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI format](/br/backup-and-restore-storages.md#uri-format). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. 
For more details, see [URI formats](/external-storage-uri.md). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. @@ -1588,10 +1588,10 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI format](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI formats](/external-storage-uri.md). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/stable/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. + - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/dev/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. 
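For example, pointing the `tidb_cloud_storage_uri` system variable at a sort bucket could look like the following sketch. The bucket path and credentials are placeholders, and the statement is only composed and echoed here rather than executed against a cluster:

```shell
# Hypothetical bucket and credentials for illustration only.
SORT_URI="s3://my-bucket/global-sort-data?access-key=AKIAEXAMPLE&secret-access-key=mysecret"
STMT="SET GLOBAL tidb_cloud_storage_uri = '${SORT_URI}';"
echo "$STMT"
```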
diff --git a/ticdc/ticdc-sink-to-cloud-storage.md b/ticdc/ticdc-sink-to-cloud-storage.md index 87928270c2ad3..cd8ad57f1afbf 100644 --- a/ticdc/ticdc-sink-to-cloud-storage.md +++ b/ticdc/ticdc-sink-to-cloud-storage.md @@ -79,7 +79,7 @@ The following is an example configuration for Azure Blob Storage: > **Tip:** > -> The URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC are the same as their URI parameters in BR. For details, see [Backup storage URI format](/br/backup-and-restore-storages.md#uri-format-description). +> The URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC are the same as their URI parameters in BR. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). ### Configure sink URI for NFS diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md index 3bf331d72340e..61d6d79c2b6c3 100644 --- a/ticdc/ticdc-sink-to-kafka.md +++ b/ticdc/ticdc-sink-to-kafka.md @@ -389,7 +389,7 @@ When `large-message-handle-option` is set to `"claim-check"`, `claim-check-stora > **Tip** > -> Currently, the external storage services supported by TiCDC are the same as BR. For detailed parameter descriptions, see [Backup storages URI format](/br/backup-and-restore-storages.md#uri-format-description). +> Currently, the external storage services supported by TiCDC are the same as BR. For detailed parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md). TiCDC does not clean up messages on external storage services. Data consumers need to manage external storage services on their own. 
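Because TiCDC reuses the BR-style URI format described above, a storage URI for it can be assembled the same way; a sketch with placeholder Azure values (never hardcode real keys):

```shell
# Placeholder credentials for illustration only.
ACCOUNT_NAME="myaccount"
ACCOUNT_KEY="bXktc2VjcmV0LWtleQ=="
CONTAINER="external"
PREFIX="cdc-claim-check"

URI="azure://${CONTAINER}/${PREFIX}?account-name=${ACCOUNT_NAME}&account-key=${ACCOUNT_KEY}"
echo "$URI"
```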
diff --git a/tidb-lightning/tidb-lightning-command-line-full.md b/tidb-lightning/tidb-lightning-command-line-full.md index 6450a9a9d5aa2..cb6995ffd024d 100644 --- a/tidb-lightning/tidb-lightning-command-line-full.md +++ b/tidb-lightning/tidb-lightning-command-line-full.md @@ -17,7 +17,7 @@ You can configure the following parameters using `tidb-lightning`: | :---- | :---- | :---- | | `--config ` | Read the global configuration from the file. If this parameter is not specified, TiDB Lightning uses the default configuration. | | | `-V` | Print the program version. | | -| `-d ` | Local directory or [external storage URI](/br/backup-and-restore-storages.md#uri-format) of data files. | `mydumper.data-source-dir` | +| `-d ` | Local directory or [external storage URI](/external-storage-uri.md) of data files. | `mydumper.data-source-dir` | | `-L ` | Log level: `debug`, `info`, `warn`, `error`, or `fatal`. `info` by default.| `lightning.level` | | `-f ` | [Table filter rules](/table-filter.md). Can be specified multiple times. | `mydumper.filter` | | `--backend ` | Select an import mode. `local` refers to [physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode.md); `tidb` refers to [logical import mode](/tidb-lightning/tidb-lightning-logical-import-mode.md). | `tikv-importer.backend` | diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 443c8900df6f2..63be6b21f301f 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -456,7 +456,7 @@ log-progress = "5m" |:----|:----|:----| | --config *file* | Reads global configuration from *file*. If not specified, the default configuration would be used. 
| | | -V | Prints program version | | -| -d *directory* | Directory or [external storage URI](/br/backup-and-restore-storages.md#uri-format) of the data dump to read from | `mydumper.data-source-dir` | +| -d *directory* | Directory or [external storage URI](/external-storage-uri.md) of the data dump to read from | `mydumper.data-source-dir` | | -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` | | -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` | | --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `local` | diff --git a/tidb-lightning/tidb-lightning-data-source.md b/tidb-lightning/tidb-lightning-data-source.md index 229394af10143..e4073ad33fcdc 100644 --- a/tidb-lightning/tidb-lightning-data-source.md +++ b/tidb-lightning/tidb-lightning-data-source.md @@ -334,7 +334,7 @@ type = '$3' ## Import data from Amazon S3 -The following examples show how to import data from Amazon S3 using TiDB Lightning. For more parameter configurations, see [external storage URI](/br/backup-and-restore-storages.md#uri-format). +The following examples show how to import data from Amazon S3 using TiDB Lightning. For more parameter configurations, see [URI Formats of External Storage Services](/external-storage-uri.md). 
+ Use the locally configured permissions to access S3 data: diff --git a/tidb-lightning/tidb-lightning-distributed-import.md b/tidb-lightning/tidb-lightning-distributed-import.md index 5ca0828cbd5b0..20d6219d0ea09 100644 --- a/tidb-lightning/tidb-lightning-distributed-import.md +++ b/tidb-lightning/tidb-lightning-distributed-import.md @@ -111,7 +111,7 @@ If the data source is stored in external storage such as Amazon S3 or GCS, you n -d 's3://my-bucket/sql-backup' ``` -For more parameter descriptions, see [external storage URI](/br/backup-and-restore-storages.md#uri-format). +For more parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md). ### Step 3: Start TiDB Lightning to import data @@ -143,7 +143,7 @@ Wait for all TiDB Lightning instances to finish, then the entire import is compl ## Example 2: Import single tables in parallel -TiDB Lightning also supports parallel import of single tables. For example, import multiple single tables stored in Amazon S3 by different TiDB Lightning instances into the downstream TiDB cluster in parallel. This method can speed up the overall import speed. When remote storages such as Amazon S3 is used, the configuration parameters of TiDB Lightning are the same as those of BR. For more details, see [external storage URI](/br/backup-and-restore-storages.md#uri-format). +TiDB Lightning also supports parallel import of single tables. For example, import multiple single tables stored in Amazon S3 by different TiDB Lightning instances into the downstream TiDB cluster in parallel. This method can speed up the overall import speed. When remote storages such as Amazon S3 is used, the configuration parameters of TiDB Lightning are the same as those of BR. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). 
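When several TiDB Lightning instances each read a different prefix of the same bucket, the per-instance commands might be generated as sketched below. The bucket and table prefixes are hypothetical, and the commands are only printed, not executed:

```shell
BUCKET="my-bucket"
COUNT=0
for TABLE in sbtest1 sbtest2 sbtest3; do
  COUNT=$((COUNT + 1))
  # One Lightning instance per source prefix, all importing into the same cluster.
  echo "tidb-lightning --backend local -d 's3://${BUCKET}/${TABLE}'"
done
```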
> **Note:** > diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md index 16afb97a59c5f..d90cdd30fe071 100644 --- a/tidb-lightning/tidb-lightning-overview.md +++ b/tidb-lightning/tidb-lightning-overview.md @@ -17,8 +17,8 @@ TiDB Lightning supports the following file formats: TiDB Lightning can read data from the following sources: - Local -- [Amazon S3](/br/backup-and-restore-storages.md#uri-format) -- [Google Cloud Storage](/br/backup-and-restore-storages.md#uri-format) +- [Amazon S3](/external-storage-uri.md#amazon-s3-uri-format) +- [Google Cloud Storage](/external-storage-uri.md#gcs-uri-format) ## TiDB Lightning architecture From 726ceff00529514993b580c7288e5cb690ae44e6 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 25 Oct 2023 20:26:47 +0800 Subject: [PATCH 05/15] Apply suggestions from code review --- sql-statements/sql-statement-restore.md | 2 +- system-variables.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md index f15c96860ff6e..6b94416591939 100644 --- a/sql-statements/sql-statement-restore.md +++ b/sql-statements/sql-statement-restore.md @@ -103,7 +103,7 @@ BR supports restoring data from S3 or GCS: RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/'; ``` -The URL syntax is further explained in [external storage URI](/external-storage-uri.md). +The URI syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). 
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/system-variables.md b/system-variables.md index 9c2cefc99097d..11c4bb8ab1ee2 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1580,7 +1580,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI formats](/external-storage-uri.md). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. @@ -1588,7 +1588,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). 
After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI formats](/external-storage-uri.md). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/dev/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. From a7d2f2fbeeee3bc2502a1c5d36aa1c88ca45aae8 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Thu, 26 Oct 2023 11:39:21 +0800 Subject: [PATCH 06/15] add azure blob storage --- sql-statements/sql-statement-import-into.md | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 2e2651c3ed79a..05574fa730c22 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -19,7 +19,7 @@ This TiDB statement is not applicable to TiDB Cloud. `IMPORT INTO` supports importing data from files stored in Amazon S3, GCS, and the TiDB local storage. 
-- For data files stored in Amazon S3 or GCS, `IMPORT INTO` supports running in the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md). +- For data files stored in Amazon S3, GCS, or Azure Blob Storage, `IMPORT INTO` supports running in the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md). - When this framework is enabled ([tidb_enable_dist_task](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency. - When this framework is disabled, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. @@ -94,9 +94,9 @@ In the left side of the `SET` expression, you can only reference a column name t ### fileLocation -It specifies the storage location of the data file, which can be an Amazon S3 or GCS URI path, or a TiDB local file path. +It specifies the storage location of the data file, which can be an Amazon S3, GCS, or Azure Blob Storage URI path, or a TiDB local file path. -- Amazon S3 or GCS URI path: for URI configuration details, see [URI Formats of External Storage Services](/external-storage-uri.md). +- Amazon S3, GCS, or Azure Blob Storage URI path: for URI configuration details, see [URI Formats of External Storage Services](/external-storage-uri.md). - TiDB local file path: it must be an absolute path, and the file extension must be `.csv`, `.sql`, or `.parquet`. Make sure that the files corresponding to this path are stored on the TiDB node connected by the current user, and the user has the `FILE` privilege. > **Note:** @@ -137,7 +137,7 @@ The supported options are described as follows: | `MAX_WRITE_SPEED=''` | All formats | Controls the write speed to a TiKV node. By default, there is no speed limit. 
For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | | `CHECKSUM_TABLE=''` | All formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All Formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | -| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [URI Formats of External Storage Services](/external-storage-uri.md). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. 
|
+| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. |

## Compressed files

@@ -236,7 +236,7 @@ Assume that there are three files named `file-01.csv`, `file-02.csv`, and `file-

IMPORT INTO t FROM '/path/to/file-*.csv'
```

-### Import data files from Amazon S3 or GCS
+### Import data files from Amazon S3, GCS, or Azure Blob Storage

- Import data files from Amazon S3:

    ```sql
    ```

- Import data files from GCS:

    ```sql
-    IMPORT INTO t FROM 'gs://bucket-name/test.csv';
+    IMPORT INTO t FROM 'gs://bucket-name/test.csv?credentials-file=${credentials-file-path}';
    ```

+- Import data files from Azure Blob Storage:
+
+    ```sql
+    IMPORT INTO t FROM 'azure://bucket-name/test.csv?account-name=${account-name}&account-key=${account-key}';
+    ```
+
-For details about the URI path configuration for Amazon S3 or GCS, see [URI Formats of External Storage Services](/external-storage-uri.md).
+For details about the URI path configuration for Amazon S3, GCS, or Azure Blob Storage, see [URI Formats of External Storage Services](/external-storage-uri.md).
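Credentials that contain URI-reserved characters (for example, `/`, `+`, or `=` in a secret access key) generally need percent-encoding before they are placed in the query string; a hedged sketch with a made-up secret:

```shell
SECRET="ab/cd+ef=="   # hypothetical secret access key, not a real credential
# Percent-encode '/', '+', and '=' so the key is safe inside a query string.
ENCODED=$(printf '%s' "$SECRET" | sed -e 's,/,%2F,g' -e 's/+/%2B/g' -e 's/=/%3D/g')
URI="s3://external/backup-20220915?access-key=AKIAEXAMPLE&secret-access-key=${ENCODED}"
echo "$URI"
```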
### Calculate column values using SetClause From 6609648740059cce62aefb0692f956fe3f28d510 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 27 Oct 2023 09:08:05 +0800 Subject: [PATCH 07/15] Apply suggestions from code review --- TOC-tidb-cloud.md | 2 +- TOC.md | 2 +- dumpling-overview.md | 2 +- external-storage-uri.md | 2 +- sql-statements/sql-statement-import-into.md | 6 +++--- system-variables.md | 2 +- ticdc/ticdc-sink-to-cloud-storage.md | 2 +- ticdc/ticdc-sink-to-kafka.md | 2 +- tidb-lightning/tidb-lightning-overview.md | 1 + 9 files changed, 11 insertions(+), 10 deletions(-) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 6d556da03b292..e49b7ecfaefb7 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -620,7 +620,7 @@ - [TiDB Global Sort](/tidb-global-sort.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) - [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md) - - [URI Formats](/external-storage-uri.md) + - [URI Formats of External Storage Services](/external-storage-uri.md) - [Support](/tidb-cloud/tidb-cloud-support.md) - [Glossary](/tidb-cloud/tidb-cloud-glossary.md) - FAQs diff --git a/TOC.md b/TOC.md index 247073cab440e..1bcd36f33dbc5 100644 --- a/TOC.md +++ b/TOC.md @@ -995,7 +995,7 @@ - [Error Codes](/error-codes.md) - [Table Filter](/table-filter.md) - [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md) - - [URI Formats](/external-storage-uri.md) + - [URI Formats of External Storage Services](/external-storage-uri.md) - Internal Components - [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md) - [TiDB Global Sort](/tidb-global-sort.md) diff --git a/dumpling-overview.md b/dumpling-overview.md index 4d3ae38f1ee7d..f1b2dbf087788 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -231,7 +231,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey} export AWS_SECRET_ACCESS_KEY=${SecretKey} ``` -Dumpling 
also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](/external-storage-uri.md). +Dumpling also supports reading credential files from `~/.aws/credentials`. For more information about URI parameter descriptions, see [external storage URI](/external-storage-uri.md). ```shell ./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}" diff --git a/external-storage-uri.md b/external-storage-uri.md index 949b1abf1c4e2..45165100cb9d3 100644 --- a/external-storage-uri.md +++ b/external-storage-uri.md @@ -19,7 +19,7 @@ This document describes the URI formats of the storage services. The basic forma - `access-key`: Specifies the access key. - `secret-access-key`: Specifies the secret access key. - - `session-token`: Specifies the temporary session token. + - `session-token`: Specifies the temporary session token. BR does not support this parameter yet. - `use-accelerate-endpoint`: Specifies whether to use the accelerate endpoint on Amazon S3 (defaults to `false`). - `endpoint`: Specifies the URL of custom endpoint for S3-compatible services (for example, ``). - `force-path-style`: Use path style access rather than virtual hosted style access (defaults to `true`). diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 05574fa730c22..4e78ec03a7e48 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -137,7 +137,7 @@ The supported options are described as follows: | `MAX_WRITE_SPEED=''` | All formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. 
| | `CHECKSUM_TABLE=''` | All formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All Formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | -| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri--format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. | +| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [global sorting](#global-sorting) is stored. 
When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use global sorting based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for global sorting. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. | ## Compressed files @@ -247,13 +247,13 @@ IMPORT INTO t FROM '/path/to/file-*.csv' - Import data files from GCS: ```sql - IMPORT INTO t FROM 'gs://bucket-name/test.csv?credentials-file=${credentials-file-path}'; + IMPORT INTO t FROM 'gs://import/test.csv?credentials-file=${credentials-file-path}'; ``` - Import data files from Azure Blob Storage: ```sql - IMPORT INTO t FROM ''azure://bucket-name/test.csv?credentials-file=${credentials-file-path}';'; + IMPORT INTO t FROM 'azure://import/test.csv?credentials-file=${credentials-file-path}'; ``` For details about the URI path configuration for Amazon S3, GCS, or Azure Blob Storage, see [URI Formats of External Storage Services](/external-storage-uri.md). diff --git a/system-variables.md b/system-variables.md index 11c4bb8ab1ee2..292f4822b199b 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1580,7 +1580,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). 
After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). +- This variable is used to specify the Amazon S3 cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. diff --git a/ticdc/ticdc-sink-to-cloud-storage.md b/ticdc/ticdc-sink-to-cloud-storage.md index cd8ad57f1afbf..3d5fa0a831d61 100644 --- a/ticdc/ticdc-sink-to-cloud-storage.md +++ b/ticdc/ticdc-sink-to-cloud-storage.md @@ -79,7 +79,7 @@ The following is an example configuration for Azure Blob Storage: > **Tip:** > -> The URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC are the same as their URI parameters in BR. For details, see [URI Formats of External Storage Services](/external-storage-uri.md). +> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md). 
### Configure sink URI for NFS diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md index 61d6d79c2b6c3..ff3331756ee66 100644 --- a/ticdc/ticdc-sink-to-kafka.md +++ b/ticdc/ticdc-sink-to-kafka.md @@ -389,7 +389,7 @@ When `large-message-handle-option` is set to `"claim-check"`, `claim-check-stora > **Tip** > -> Currently, the external storage services supported by TiCDC are the same as BR. For detailed parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md). +> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md). TiCDC does not clean up messages on external storage services. Data consumers need to manage external storage services on their own. diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md index d90cdd30fe071..0e935697d43e5 100644 --- a/tidb-lightning/tidb-lightning-overview.md +++ b/tidb-lightning/tidb-lightning-overview.md @@ -19,6 +19,7 @@ TiDB Lightning can read data from the following sources: - Local - [Amazon S3](/external-storage-uri.md#amazon-s3-uri-format) - [Google Cloud Storage](/external-storage-uri.md#gcs-uri-format) +- [Azure Blob Storage](/external-storage-uri.md#azure-blob-storage-uri-format) ## TiDB Lightning architecture From 3b8c2f75a20b0712e682e2c407f703955862898c Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Fri, 27 Oct 2023 10:21:15 +0800 Subject: [PATCH 08/15] Update pd-scheduling-best-practices.md --- best-practices/pd-scheduling-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md index 8f9253c8e1543..d8add90c1688d 100644 --- a/best-practices/pd-scheduling-best-practices.md +++ b/best-practices/pd-scheduling-best-practices.md @@ -229,7 +229,7 
@@ This scenario requires examining the generation and execution of operators throu If operators are successfully generated but the scheduling process is slow, possible reasons are: -- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger value.s Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`. +- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger values. Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`. - Other scheduling tasks are running concurrently and racing for resources in the system. You can refer to the solution in [Leaders/regions are not evenly distributed](#leadersregions-are-not-evenly-distributed). - When you take a single node offline, a number of region leaders to be processed (around 1/3 under the configuration of 3 replicas) are distributed on the node to remove. Therefore, the speed is limited by the speed at which snapshots are generated by this single node. You can speed it up by manually adding an `evict-leader-scheduler` to migrate leaders. From 0f990fe16d77e0b845b3ebff0f7a0758b7487eab Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Fri, 27 Oct 2023 11:54:39 +0800 Subject: [PATCH 09/15] *: refine wording to make it consistent --- dumpling-overview.md | 2 +- error-codes.md | 2 +- external-storage-uri.md | 6 ++++-- sql-statements/sql-statement-backup.md | 2 +- 4 files changed, 7 insertions(+), 5 deletions(-) diff --git a/dumpling-overview.md b/dumpling-overview.md index f1b2dbf087788..6697372356c89 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -231,7 +231,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey} export AWS_SECRET_ACCESS_KEY=${SecretKey} ``` -Dumpling also supports reading credential files from `~/.aws/credentials`. 
For more information about URI parameter descriptions, see [external storage URI](/external-storage-uri.md). +Dumpling also supports reading credential files from `~/.aws/credentials`. For more information about URI parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md). ```shell ./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}" diff --git a/error-codes.md b/error-codes.md index a16516d54d09c..735156891f207 100644 --- a/error-codes.md +++ b/error-codes.md @@ -372,7 +372,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the * Error Number: 8158 - The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [External storage](/external-storage-uri.md). + The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [URI Formats of External Storage Services](/external-storage-uri.md). * Error Number: 8159 diff --git a/external-storage-uri.md b/external-storage-uri.md index 45165100cb9d3..e8a5a5ce81b29 100644 --- a/external-storage-uri.md +++ b/external-storage-uri.md @@ -1,11 +1,13 @@ --- title: URI Formats of External Storage Services -summary: Describes the storage URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage. +summary: Learn about the storage URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage. --- ## URI Formats of External Storage Services -This document describes the URI formats of the storage services. The basic format of the URI is as follows: +This document describes the URI formats of external storage services, including Amazon S3, GCS, and Azure Blob Storage. 
+ +The basic format of the URI is as follows: ```shell [scheme]://[host]/[path]?[parameters] diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md index 5b112ca8b51ac..098b98d2fd716 100644 --- a/sql-statements/sql-statement-backup.md +++ b/sql-statements/sql-statement-backup.md @@ -112,7 +112,7 @@ BR supports backing up data to S3 or GCS: BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}'; ``` -The URL syntax is further explained in [external storage URI](/external-storage-uri.md). +The URI syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: From e9657d4493c2a46277a05de7e2e5123280300fb3 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 27 Oct 2023 12:10:30 +0800 Subject: [PATCH 10/15] Update TOC-tidb-cloud.md --- TOC-tidb-cloud.md | 1 - 1 file changed, 1 deletion(-) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index e49b7ecfaefb7..17c96b8622416 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -620,7 +620,6 @@ - [TiDB Global Sort](/tidb-global-sort.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) - [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md) - - [URI Formats of External Storage Services](/external-storage-uri.md) - [Support](/tidb-cloud/tidb-cloud-support.md) - [Glossary](/tidb-cloud/tidb-cloud-glossary.md) - FAQs From c57277c04bb30d8db7128c83164559233c396dea Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 27 Oct 2023 12:52:24 +0800 Subject: [PATCH 11/15] Apply suggestions from code review --- sql-statements/sql-statement-backup.md | 12 +++++++++++- sql-statements/sql-statement-restore.md | 12 +++++++++++- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git 
a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md index 098b98d2fd716..9bbfb4d0c0544 100644 --- a/sql-statements/sql-statement-backup.md +++ b/sql-statements/sql-statement-backup.md @@ -112,7 +112,17 @@ BR supports backing up data to S3 or GCS: BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}'; ``` -The URI syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). + + +The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). + + + + + +The URL syntax is further explained in [external storage URI](https://docs.pingcap.com/tidb/stable/external-storage-uri). + + When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md index 6b94416591939..6ca8ecf919319 100644 --- a/sql-statements/sql-statement-restore.md +++ b/sql-statements/sql-statement-restore.md @@ -103,7 +103,17 @@ BR supports restoring data from S3 or GCS: RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/'; ``` -The URI syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). + + +The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). + + + + + +The URL syntax is further explained in [external storage URI](https://docs.pingcap.com/tidb/stable/external-storage-uri). 
+ + When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: From bda7fc42b79c572b3dedc8c7deafbac989364257 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 27 Oct 2023 12:55:02 +0800 Subject: [PATCH 12/15] Update system-variables.md Co-authored-by: Grace Cai --- system-variables.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system-variables.md b/system-variables.md index 292f4822b199b..dc0dd605dcefa 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1588,7 +1588,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](https://docs.pingcap.com/tidb/stable/external-storage-uri). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/dev/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. 
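The `tidb_cloud_storage_uri` value discussed in the patch above follows the same `[scheme]://[host]/[path]?[parameters]` shape as every other external storage URI in this series. As a rough sketch of how such a URI decomposes (the bucket, path, and role ARN below are hypothetical values, not taken from the docs), POSIX shell parameter expansion is enough to split it:

```shell
# Split an external storage URI into scheme, host/path, and parameters.
# All concrete values here are hypothetical examples.
uri='s3://sort-bucket/global-sort?role-arn=arn:aws:iam::888888888888:role/my-role'

scheme="${uri%%://*}"        # strip everything from "://" onward -> "s3"
rest="${uri#*://}"           # drop the "scheme://" prefix
hostpath="${rest%%\?*}"      # part before "?" -> "sort-bucket/global-sort"
params="${rest#*\?}"         # part after "?"  -> the query parameters

printf '%s\n%s\n%s\n' "$scheme" "$hostpath" "$params"
```

The same split applies unchanged to the `s3://`, `gcs://`, and `azure://` examples elsewhere in the series.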
From 9adb59c0cfa77fb351588db9e6da10fa3c132351 Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Fri, 27 Oct 2023 12:59:33 +0800 Subject: [PATCH 13/15] fix links in tidb --- sql-statements/sql-statement-backup.md | 2 +- sql-statements/sql-statement-restore.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md index 9bbfb4d0c0544..030f4689c16b5 100644 --- a/sql-statements/sql-statement-backup.md +++ b/sql-statements/sql-statement-backup.md @@ -114,7 +114,7 @@ BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?access-key={YOUR_ -The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). +The URL syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md index 6ca8ecf919319..1b817a911c019 100644 --- a/sql-statements/sql-statement-restore.md +++ b/sql-statements/sql-statement-restore.md @@ -105,7 +105,7 @@ RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/'; -The URL syntax is further explained in [external storage URI](/br/backup-and-restore-storages.md#uri-format). +The URL syntax is further explained in [URI Formats of External Storage Services](/external-storage-uri.md). 
From 3523006b613f3c102ddb237febd037879d5094da Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Fri, 27 Oct 2023 16:50:24 +0800 Subject: [PATCH 14/15] add two examples for each storage service --- external-storage-uri.md | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/external-storage-uri.md b/external-storage-uri.md index e8a5a5ce81b29..2fea54d0b2113 100644 --- a/external-storage-uri.md +++ b/external-storage-uri.md @@ -32,10 +32,16 @@ The basic format of the URI is as follows: - `role-arn`: When you need to access Amazon S3 data from a third party using a specified [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html), you can specify the corresponding [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) of the IAM role with the `role-arn` URL query parameter, such as `arn:aws:iam::888888888888:role/my-role`. For more information about using an IAM role to access Amazon S3 data from a third party, see [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html). - `external-id`: When you access Amazon S3 data from a third party, you might need to specify a correct [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) to assume [the IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In this case, you can use this `external-id` URL query parameter to specify the external ID and make sure that you can assume the IAM role. An external ID is an arbitrary string provided by the third party together with the IAM role ARN to access the Amazon S3 data. 
Providing an external ID is optional when assuming an IAM role, which means if the third party does not require an external ID for the IAM role, you can assume the IAM role and access the corresponding Amazon S3 data without providing this parameter. -Example: +The following is an example of an Amazon S3 URI for TiDB Lightning and BR. In this example, you need to specify a specific file path `testfolder`. ```shell -s3://external/backup-20220915?access-key=${access-key}&secret-access-key=${secret-access-key} +s3://external/testfolder?access-key=${access-key}&secret-access-key=${secret-access-key} ``` + +The following is an example of an Amazon S3 URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. + +```shell +s3://external/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key} ``` ## GCS URI format @@ -48,10 +54,16 @@ s3://external/backup-20220915?access-key=${access-key}&secret-access-key=${secre - `storage-class`: Specifies the storage class of the uploaded objects (for example, `STANDARD` or `COLDLINE`) - `predefined-acl`: Specifies the predefined ACL of the uploaded objects (for example, `private` or `project-private`) -Example: +The following is an example of a GCS URI for TiDB Lightning and BR. In this example, you need to specify a specific file path `testfolder`. ```shell -gcs://external/backup-20220915?credentials-file=${credentials-file-path} +gcs://external/testfolder?credentials-file=${credentials-file-path} ``` +The following is an example of a GCS URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`.
+ +```shell +gcs://external/test.csv?credentials-file=${credentials-file-path} ``` ## Azure Blob Storage URI format @@ -67,8 +79,14 @@ gcs://external/backup-20220915?credentials-file=${credentials-file-path} - `encryption-scope`: Specifies the [encryption scope](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-scope-manage?tabs=powershell#upload-a-blob-with-an-encryption-scope) for server-side encryption. - `encryption-key`: Specifies the [encryption key](https://learn.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys) for server-side encryption, which uses the AES256 encryption algorithm. -Example: +The following is an example of an Azure Blob Storage URI for TiDB Lightning and BR. In this example, you need to specify a specific file path `testfolder`. ```shell -azure://external/backup-20220915?account-name=${account-name}&account-key=${account-key} +azure://external/testfolder?account-name=${account-name}&account-key=${account-key} ``` + +The following is an example of an Azure Blob Storage URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. + +```shell +azure://external/test.csv?account-name=${account-name}&account-key=${account-key} +``` \ No newline at end of file From 9b3e01611920b0d28734282a5981e79461575482 Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Fri, 27 Oct 2023 18:07:01 +0800 Subject: [PATCH 15/15] Update format --- external-storage-uri.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/external-storage-uri.md b/external-storage-uri.md index 2fea54d0b2113..b9ddf28d1c007 100644 --- a/external-storage-uri.md +++ b/external-storage-uri.md @@ -38,7 +38,7 @@ The following is an example of an Amazon S3 URI for TiDB Lightning and BR. 
In th s3://external/testfolder?access-key=${access-key}&secret-access-key=${secret-access-key} ``` -The following is an example of an Amazon S3 URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. +The following is an example of an Amazon S3 URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. ```shell s3://external/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key} @@ -60,7 +60,7 @@ The following is an example of a GCS URI for TiDB Lightning and BR. In this exam gcs://external/testfolder?credentials-file=${credentials-file-path} ``` -The following is an example of a GCS URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. +The following is an example of a GCS URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. ```shell gcs://external/test.csv?credentials-file=${credentials-file-path} @@ -85,7 +85,7 @@ The following is an example of an Azure Blob Storage URI for TiDB Lightning and azure://external/testfolder?account-name=${account-name}&account-key=${account-key} ``` -The following is an example of an Azure Blob Storage URI for [IMPORT INTO](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. +The following is an example of an Azure Blob Storage URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. ```shell azure://external/test.csv?account-name=${account-name}&account-key=${account-key}
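All of the service-specific examples in this patch series instantiate one template, `[scheme]://[host]/[path]?[parameters]`. The shell sketch below composes a URI from those parts; the helper name and every argument value are invented for illustration only, are not part of any TiDB tool, and real credentials should never be hard-coded this way:

```shell
# Compose an external storage URI of the form [scheme]://[host]/[path]?[parameters].
# Hypothetical helper and values; do not embed real credentials like this.
build_storage_uri() {
  scheme="$1"; host="$2"; path="$3"; shift 3
  uri="${scheme}://${host}/${path}"
  sep='?'
  for param in "$@"; do   # append each key=value pair: '?' before the first, '&' after
    uri="${uri}${sep}${param}"
    sep='&'
  done
  printf '%s\n' "$uri"
}

build_storage_uri gcs external test.csv 'credentials-file=/path/to/creds.json'
# -> gcs://external/test.csv?credentials-file=/path/to/creds.json
```

The printed string has the same shape as the `gcs://external/test.csv?credentials-file=${credentials-file-path}` example added in the final patches.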