ticdc: enhance storage sink uri config #17490

Merged 5 commits on May 15, 2024 (changes shown from 3 commits).

59 changes: 59 additions & 0 deletions ticdc/ticdc-sink-to-cloud-storage.md

### Configure sink URI for external storage

When you store data in a cloud storage system, you need to set different authentication parameters depending on the cloud service provider. This section describes the authentication methods for Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage, and how to configure the accounts used to access the corresponding storage services.

<SimpleTab groupId="storage">
<div label="Amazon S3" value="amazon">

The following is an example configuration for Amazon S3:

```shell
--sink-uri="s3://bucket/prefix?protocol=canal-json"
```

Before replicating data, you need to set appropriate access privileges for the directory in Amazon S3:

- The minimum privileges required by TiCDC: `s3:ListBucket`, `s3:PutObject`, and `s3:GetObject`.
- If the changefeed parameter `sink.cloud-storage-config.flush-concurrency` is greater than 1, parallel uploading of a single file is enabled. In this case, you also need to add the following privileges related to [ListParts](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html) (see the sketch after this list):
    - `s3:AbortMultipartUpload`
    - `s3:ListMultipartUploadParts`
    - `s3:ListBucketMultipartUploads`
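
For illustration, the following is a minimal sketch of how you might enable parallel uploading by setting `flush-concurrency` in a changefeed configuration file and passing it to `cdc cli`. The server address, bucket, and prefix are placeholders.

```shell
# Write a changefeed configuration that enables parallel uploading of a single file.
cat > changefeed.toml <<'EOF'
[sink.cloud-storage-config]
flush-concurrency = 4
EOF

# Create the changefeed with this configuration file (the server address is a placeholder).
cdc cli changefeed create \
    --server=http://127.0.0.1:8300 \
    --sink-uri="s3://bucket/prefix?protocol=canal-json" \
    --config=changefeed.toml
```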

If you have not created a replication data storage directory, refer to [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 bucket in the specified region. If you need to use a folder, refer to [Organizing objects in the Amazon S3 console by using folders](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html) to create one in the bucket.
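
For reference, the following sketch shows one way to create the bucket and folder with the AWS CLI. The bucket name, region, and prefix are placeholders.

```shell
# Create an S3 bucket in the specified region (the bucket name and region are placeholders).
aws s3 mb s3://bucket --region us-west-2

# Optionally create a "folder", which is a zero-byte object whose key ends with "/".
aws s3api put-object --bucket bucket --key prefix/
```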

You can configure an account to access Amazon S3 in either of the following ways:

- Method 1: Specify the access key

    If you specify an access key and a secret access key, authentication is performed according to them. In addition to specifying the keys in the URI, the following methods are supported (see the example after this list):

    - Read the `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
    - Read the `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
    - Read the shared credentials file at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
    - Read the shared credentials file at `~/.aws/credentials`

- Method 2: Access based on an IAM role

    Associate an [IAM role that has permission to access Amazon S3](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) with the EC2 instance that runs the TiCDC server. After the association succeeds, TiCDC can directly access the corresponding data directory in Amazon S3 without additional settings.
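
The following is a minimal sketch of Method 1 that reads the access key from environment variables instead of the URI. The credential values and server address are placeholders.

```shell
# Export the credentials that TiCDC reads when no key is specified in the URI
# (the values are placeholders).
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"

# Create the changefeed (the server address is a placeholder).
cdc cli changefeed create \
    --server=http://127.0.0.1:8300 \
    --sink-uri="s3://bucket/prefix?protocol=canal-json"
```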

</div>
<div label="GCS" value="gcs">

The following is an example configuration for GCS:

```shell
--sink-uri="gcs://bucket/prefix?protocol=canal-json"
```

You can configure the account used to access GCS by specifying an access key. Authentication is performed according to the specified `credentials-file`. In addition to specifying the key file in the URI, the following methods are supported (see the example after this list):

- Read the contents of the file located at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
- Read the contents of the file located at `~/.config/gcloud/application_default_credentials.json`
- Get credentials from the metadata server when running in GCE or GAE
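
The following is a minimal sketch that points TiCDC at a service account key file through the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable. The key file path and server address are placeholders.

```shell
# Point TiCDC at a service account key file (the path is a placeholder).
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Create the changefeed (the server address is a placeholder).
cdc cli changefeed create \
    --server=http://127.0.0.1:8300 \
    --sink-uri="gcs://bucket/prefix?protocol=canal-json"
```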

</div>
<div label="Azure Blob Storage" value="azure">

The following is an example configuration for Azure Blob Storage:

```shell
--sink-uri="azure://bucket/prefix?protocol=canal-json"
```

You can configure the account used to access Azure Blob Storage in the following ways (see the example after this list):

- Method 1: Specify a shared access signature

    If you configure `account-name` and `sas-token` in the URI, the storage account name and shared access signature token specified by these parameters are used. Because the shared access signature token contains the `&` character, you need to encode it as `%26` before adding it to the URI. You can also percent-encode the entire `sas-token`.

- Method 2: Specify the access key

    If you configure `account-name` and `account-key` in the URI, the storage account name and key specified by these parameters are used. In addition to specifying the key in the URI, reading the key from the `$AZURE_STORAGE_KEY` environment variable is also supported.

- Method 3: Use Azure AD

    Set the `$AZURE_CLIENT_ID`, `$AZURE_TENANT_ID`, and `$AZURE_CLIENT_SECRET` environment variables.
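
The following is a minimal sketch of Method 1 and Method 3. The account name, SAS token, Azure AD values, and server address are placeholders; any `&` characters inside the SAS token must be written as `%26` in the URI.

```shell
# Method 1: pass the storage account name and the percent-encoded SAS token in the URI.
cdc cli changefeed create \
    --server=http://127.0.0.1:8300 \
    --sink-uri="azure://bucket/prefix?protocol=canal-json&account-name=<account-name>&sas-token=<encoded-sas-token>"

# Method 3: authenticate through Azure AD by setting the service principal environment variables.
export AZURE_CLIENT_ID="<client-id>"
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_SECRET="<client-secret>"
```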

</div>
</SimpleTab>

> **Tip:**
>
> For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).