From 7bbe54f0520ccc927edfdff6f179e217a603a01e Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Tue, 14 May 2024 16:28:23 +0800
Subject: [PATCH] Update ticdc-sink-to-cloud-storage.md

---
 ticdc/ticdc-sink-to-cloud-storage.md | 59 ++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/ticdc/ticdc-sink-to-cloud-storage.md b/ticdc/ticdc-sink-to-cloud-storage.md
index 15ca22c11cf9a..a595ab9c9bbd0 100644
--- a/ticdc/ticdc-sink-to-cloud-storage.md
+++ b/ticdc/ticdc-sink-to-cloud-storage.md
@@ -59,24 +59,83 @@ For `[query_parameters]` in the URI, the following parameters can be configured:
 
 ### Configure sink URI for external storage
 
+When you store data in a cloud storage system, you need to set different authentication parameters depending on the cloud service provider. This section describes the authentication methods for Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage, and how to configure accounts for accessing the corresponding storage services.
+
+
+
 The following is an example configuration for Amazon S3:
 
 ```shell
 --sink-uri="s3://bucket/prefix?protocol=canal-json"
 ```
 
+Before replicating data, you need to set appropriate access privileges for the directory in Amazon S3:
+
+- The minimum privileges required by TiCDC: `s3:ListBucket`, `s3:PutObject`, and `s3:GetObject`.
+- If the changefeed parameter `sink.cloud-storage-config.flush-concurrency` is greater than 1, parallel uploading of single files is enabled. In this case, you need to additionally add the [ListParts](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html) privileges:
+    - `s3:AbortMultipartUpload`
+    - `s3:ListMultipartUploadParts`
+    - `s3:ListBucketMultipartUploads`
+
+If you have not created a directory for storing the replicated data, refer to [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 bucket in the specified region. If you need to use a folder, refer to [Organizing objects in the Amazon S3 console by using folders](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html) to create a folder in the bucket.
+
+You can configure an account to access Amazon S3 in either of the following two ways:
+
+- Method 1: Specify the access key
+
+    If you specify an access key and a secret access key, authentication is performed according to them. In addition to specifying the key in the URI, the following methods are supported:
+
+    - Read the `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
+    - Read the `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
+    - Read the shared credentials file, with the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
+    - Read the shared credentials file, with the path `~/.aws/credentials`
+
+- Method 2: Access based on the IAM role
+
+    Associate an [IAM role configured to access S3](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) with the EC2 instance running the TiCDC server. After successful setup, TiCDC can directly access the corresponding directory in S3 without additional setup.
+
+
+
 The following is an example configuration for GCS:
 
 ```shell
 --sink-uri="gcs://bucket/prefix?protocol=canal-json"
 ```
 
+You can configure the account used to access GCS by specifying an access key. If the `credentials-file` parameter is specified, authentication is performed according to the specified `credentials-file`. In addition to specifying the key file in the URI, the following methods are supported:
+
+- Read the contents of the file located in the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
+- Read the contents of the file `~/.config/gcloud/application_default_credentials.json`
+- Get credentials from the metadata server when running in GCE or GAE
+
+
+
 The following is an example configuration for Azure Blob Storage:
 
 ```shell
 --sink-uri="azure://bucket/prefix?protocol=canal-json"
 ```
 
+You can configure the account used to access Azure Blob Storage in the following ways:
+
+- Method 1: Specify a shared access signature
+
+    If you configure `account-name` and `sas-token` in the URI, the storage account name and shared access signature token specified by these parameters are used. Because the shared access signature token contains the `&` character, you need to encode it as `%26` before adding it to the URI. Alternatively, you can percent-encode the entire `sas-token`.
+
+- Method 2: Specify the access key
+
+    If you configure `account-name` and `account-key` in the URI, the storage account name and key specified by these parameters are used. In addition to specifying the key in the URI, reading the key from the `$AZURE_STORAGE_KEY` environment variable is also supported.
+
+- Method 3: Use Azure AD for authentication
+
+    Configure the environment variables `$AZURE_CLIENT_ID`, `$AZURE_TENANT_ID`, and `$AZURE_CLIENT_SECRET`.
+
+
+
 > **Tip:**
 >
 > For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).
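
The percent-encoding step that the Azure Blob Storage section describes for `sas-token` can be sketched in Python as follows. This snippet is not part of the patch. The helper name `build_azure_sink_uri`, the account name, and the sample token are hypothetical; only the `azure://` scheme and the `protocol`, `account-name`, and `sas-token` parameter names come from the text above. It takes the "encode the entire `sas-token`" route, so every `&` in the token becomes `%26`.

```python
from urllib.parse import quote


def build_azure_sink_uri(
    container: str,
    prefix: str,
    account_name: str,
    sas_token: str,
    protocol: str = "canal-json",
) -> str:
    """Build a sink URI string with a fully percent-encoded SAS token.

    Hypothetical helper for illustration; TiCDC itself only needs the
    resulting string passed to --sink-uri.
    """
    # safe="" makes quote() encode every reserved character, including
    # '&', '=', '/', and '%', so the token cannot be mistaken for
    # additional query parameters of the sink URI.
    encoded_token = quote(sas_token, safe="")
    encoded_account = quote(account_name, safe="")
    return (
        f"azure://{container}/{prefix}"
        f"?protocol={protocol}"
        f"&account-name={encoded_account}"
        f"&sas-token={encoded_token}"
    )


if __name__ == "__main__":
    # Made-up token for demonstration only; a real token comes from Azure.
    token = "sv=2022&sig=a/b"
    print(build_azure_sink_uri("bucket", "prefix", "myaccount", token))
```

Encoding the whole token, rather than replacing only `&`, avoids having to reason about which other characters in a given token are reserved in a URI query string.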