Update ticdc-sink-to-cloud-storage.md
hfxsd committed May 14, 2024
1 parent 51ebd65 commit 7bbe54f
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions ticdc/ticdc-sink-to-cloud-storage.md
@@ -59,24 +59,83 @@ For `[query_parameters]` in the URI, the following parameters can be configured:
### Configure sink URI for external storage

When you store data into a cloud storage system, you need to set different authentication parameters depending on the cloud service provider. This section describes the authentication methods for storage services when you use Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage, and how to configure accounts for accessing the corresponding storage services.

<SimpleTab groupId="storage">
<div label="Amazon S3" value="amazon">

The following is an example configuration for Amazon S3:

```shell
--sink-uri="s3://bucket/prefix?protocol=canal-json"
```

Before replicating data, you need to set appropriate access privileges for the directory in Amazon S3:

- The minimum privileges required by TiCDC: `s3:ListBucket`, `s3:PutObject`, and `s3:GetObject`.
- If the `sink.cloud-storage-config.flush-concurrency` parameter of the changefeed is greater than 1, parallel uploading of single files is enabled. In this case, you also need to add the privileges related to [ListParts](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html):
    - `s3:AbortMultipartUpload`
    - `s3:ListMultipartUploadParts`
    - `s3:ListBucketMultipartUploads`
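
The privileges listed above could be collected into a single IAM policy document. The following is a minimal sketch, not an official TiCDC policy: it assumes the bucket `bucket` and prefix `prefix` from the example sink URI, and it grants the multipart privileges unconditionally, which is only needed when `flush-concurrency` is greater than 1. The variable names and the temporary file are illustrative.

```shell
# Hypothetical bucket and prefix, matching the example sink URI above.
BUCKET="bucket"
PREFIX="prefix"
POLICY_FILE="$(mktemp)"

# Bucket-level actions apply to the bucket ARN;
# object-level actions apply to the objects under the prefix.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:ListBucketMultipartUploads"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::${BUCKET}/${PREFIX}/*"
    }
  ]
}
EOF

echo "Policy written to $POLICY_FILE"
```

You can then attach the generated document to the IAM user or role that TiCDC uses, for example through the AWS console.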

If you have not created a directory for storing the replicated data, refer to [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 bucket in the specified region. If you need to use a folder, refer to [Organizing objects in the Amazon S3 console by using folders](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html) to create a folder in the bucket.

You can configure an account to access Amazon S3 in either of the following two ways:

- Method 1: Specify the access key

    If you specify an access key and a secret access key, TiCDC uses them for authentication. In addition to specifying the keys in the URI, the following methods are supported:

    - Read the `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
    - Read the `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
    - Read the shared credentials file, with the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
    - Read the shared credentials file, with the path `~/.aws/credentials`

- Method 2: Access based on the IAM role

    Associate an [IAM role configured to access S3](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) with the EC2 instance running the TiCDC server. After the setup, TiCDC can directly access the corresponding directories in S3 without additional configuration.
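
When debugging Method 1, it can help to check which of the listed credential sources are actually present. The following sketch only reports what it finds in the current shell; it does not imply any precedence among the sources. The function name `report_s3_credential_sources` is hypothetical, and the check is only meaningful when run in the same environment as the TiCDC server process (an assumption about where the variables are read).

```shell
# Print one line for each credential source from the list above
# that is present in the current environment.
report_s3_credential_sources() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY"
  fi
  if [ -n "${AWS_ACCESS_KEY:-}" ] && [ -n "${AWS_SECRET_KEY:-}" ]; then
    echo "env: AWS_ACCESS_KEY / AWS_SECRET_KEY"
  fi
  if [ -n "${AWS_SHARED_CREDENTIALS_FILE:-}" ] && [ -f "${AWS_SHARED_CREDENTIALS_FILE}" ]; then
    echo "file: ${AWS_SHARED_CREDENTIALS_FILE}"
  fi
  if [ -f "${HOME}/.aws/credentials" ]; then
    echo "file: ~/.aws/credentials"
  fi
}

report_s3_credential_sources
```

If the function prints nothing and no keys are given in the URI, the IAM role of Method 2 is the remaining option described in this section.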

</div>
<div label="GCS" value="gcs">

The following is an example configuration for GCS:

```shell
--sink-uri="gcs://bucket/prefix?protocol=canal-json"
```

You can configure the account used to access GCS by specifying an access key. If the `credentials-file` parameter is specified, TiCDC authenticates using the specified file. In addition to specifying the key file in the URI, the following methods are supported:

- Read the contents of the file located at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable.
- Read the contents of the file `~/.config/gcloud/application_default_credentials.json`.
- Get credentials from a metadata server when running in GCE or GAE.
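
As a rough illustration of the two most explicit options, the sketch below builds a sink URI that carries the `credentials-file` parameter and, alternatively, sets the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable. The key file path is a placeholder, and the variable names `CREDENTIALS_FILE` and `SINK_URI` are illustrative.

```shell
# Hypothetical path to a service account key file.
CREDENTIALS_FILE="/path/to/credentials.json"

# Option A: pass the key file through the credentials-file URI parameter.
SINK_URI="gcs://bucket/prefix?protocol=canal-json&credentials-file=${CREDENTIALS_FILE}"
echo "--sink-uri=\"${SINK_URI}\""

# Option B: keep the URI unchanged and point the environment variable at the key file.
export GOOGLE_APPLICATION_CREDENTIALS="${CREDENTIALS_FILE}"
```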

</div>
<div label="Azure Blob Storage" value="azure">

The following is an example configuration for Azure Blob Storage:

```shell
--sink-uri="azure://bucket/prefix?protocol=canal-json"
```

You can configure the accounts that access Azure Blob Storage in the following ways:

- Method 1: Specify a shared access signature

    If you configure `account-name` and `sas-token` in the URI, the storage account name and shared access signature token specified by these parameters are used. Because the shared access signature token contains the `&` character, you need to encode it as `%26` before adding the token to the URI. Alternatively, you can percent-encode the entire `sas-token`.

- Method 2: Specify the access key

    If you configure `account-name` and `account-key` in the URI, the storage account name and key specified by these parameters are used. In addition to specifying the key in the URI, reading the key from the `$AZURE_STORAGE_KEY` environment variable is also supported.

- Method 3: Use Azure AD to access the storage

    Configure the environment variables `$AZURE_CLIENT_ID`, `$AZURE_TENANT_ID`, and `$AZURE_CLIENT_SECRET`.
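
For Method 1, percent-encoding the whole token can be done with a small helper. The following is a minimal sketch that assumes the token contains only ASCII characters; the function name `urlencode`, the account name `myaccount`, and the token value are made up for illustration.

```shell
# Percent-encode every character except the RFC 3986 unreserved ones.
# Assumes ASCII input. The body runs in a subshell, so its variables stay local.
urlencode() (
  rest="$1"
  out=""
  while [ -n "$rest" ]; do
    tail="${rest#?}"        # everything after the first character
    c="${rest%"$tail"}"     # the first character
    rest="$tail"
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
)

# Placeholder values, not a real account or token.
ACCOUNT_NAME="myaccount"
SAS_TOKEN='sv=2022-11-02&sig=a/b+c='

ENCODED_TOKEN="$(urlencode "$SAS_TOKEN")"
SINK_URI="azure://bucket/prefix?protocol=canal-json&account-name=${ACCOUNT_NAME}&sas-token=${ENCODED_TOKEN}"
echo "--sink-uri=\"${SINK_URI}\""
```

Encoding the whole token rather than only `&` avoids having to reason about which other characters, such as `=`, `/`, or `+`, might be misread as part of the query string.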

</div>
</SimpleTab>

> **Tip:**
>
> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).