[Docs][Connector-V2][SelectDB-Cloud]Reconstruct the SelectDB-Cloud connector document #5130

Merged 2 commits on Aug 14, 2023.

228 changes: 129 additions & 99 deletions in `docs/en/connector-v2/sink/SelectDB-Cloud.md`

> SelectDB Cloud sink connector

## Support Those Engines

> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>

## Key Features

- [x] [exactly-once](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

## Description

Used to send data to SelectDB Cloud. Supports both streaming and batch mode.
Internally, the SelectDB Cloud sink connector caches data in batches, uploads each batch, and then commits a `COPY INTO` SQL statement to load the data into the table.

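The mode is chosen in the job's `env` block. As a sketch, a streaming variant of the `env` block used in the task example could look like the following. It assumes SeaTunnel's standard `job.mode = "STREAMING"` setting, and the interval is only an illustrative value. With exactly-once delivery, data is typically committed when a checkpoint completes, so `checkpoint.interval` is assumed here to bound how often newly written data becomes visible in the table.

```hocon
env {
  parallelism = 1
  # Streaming mode instead of the BATCH mode used in the task example
  job.mode = "STREAMING"
  # Checkpoint every 10 seconds (assumed to drive how often data is committed)
  checkpoint.interval = 10000
}
```
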
## Supported DataSource Info

:::tip

Version Supported

* Supported `SelectDB Cloud` version: `>= 2.2.x`

:::

## Sink Options

| Name               | Type    | Required | Default                 | Description                                                                                                                                                                                     |
|--------------------|---------|----------|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| load-url           | String  | Yes      | -                       | `SelectDB Cloud` warehouse HTTP address, in the format `warehouse_ip:http_port`                                                                                                                 |
| jdbc-url           | String  | Yes      | -                       | `SelectDB Cloud` warehouse JDBC address, in the format `warehouse_ip:mysql_port`                                                                                                                |
| cluster-name       | String  | Yes      | -                       | `SelectDB Cloud` cluster name                                                                                                                                                                   |
| username           | String  | Yes      | -                       | `SelectDB Cloud` username                                                                                                                                                                       |
| password           | String  | Yes      | -                       | `SelectDB Cloud` user password                                                                                                                                                                  |
| table.identifier   | String  | Yes      | -                       | Name of the `SelectDB Cloud` table, in the format `database.table`                                                                                                                              |
| sink.enable-delete | Boolean | No       | false                   | Whether to enable deletion. Requires the batch delete feature to be enabled on the table (for example `ALTER TABLE example_db.my_table ENABLE FEATURE "BATCH_DELETE";`) and only supports the Unique model. |
| sink.max-retries   | Int     | No       | 3                       | Maximum number of retries in the commit phase if writing records to the database fails                                                                                                          |
| sink.buffer-size   | Int     | No       | 10 * 1024 * 1024 (10MB) | Maximum size of the cache, in bytes, before it is flushed to object storage. Changing it is not recommended.                                                                                   |
| sink.buffer-count  | Int     | No       | 10000                   | Maximum number of records cached before they are flushed to object storage. Changing it is not recommended.                                                                                    |
| selectdb.config    | Map     | Yes      | -                       | Write property configuration, including the import data format (`file.type`, either `csv` or `json`) and, for CSV, `file.column_separator` and `file.line_delimiter`                            |

## Data Type Mapping

| SelectDB Cloud Data Type | SeaTunnel Data Type                     |
|--------------------------|-----------------------------------------|
| BOOLEAN                  | BOOLEAN                                 |
| TINYINT                  | TINYINT                                 |
| SMALLINT                 | SMALLINT<br/>TINYINT                    |
| INT                      | INT<br/>SMALLINT<br/>TINYINT            |
| BIGINT                   | BIGINT<br/>INT<br/>SMALLINT<br/>TINYINT |
| LARGEINT                 | BIGINT<br/>INT<br/>SMALLINT<br/>TINYINT |
| FLOAT                    | FLOAT                                   |
| DOUBLE                   | DOUBLE<br/>FLOAT                        |
| DECIMAL                  | DECIMAL<br/>DOUBLE<br/>FLOAT            |
| DATE                     | DATE                                    |
| DATETIME                 | TIMESTAMP                               |
| CHAR                     | STRING                                  |
| VARCHAR                  | STRING                                  |
| STRING                   | STRING                                  |
| ARRAY                    | ARRAY                                   |
| MAP                      | MAP                                     |
| JSON                     | STRING                                  |
| HLL                      | Not supported yet                       |
| BITMAP                   | Not supported yet                       |
| QUANTILE_STATE           | Not supported yet                       |
| STRUCT                   | Not supported yet                       |

#### Supported import data formats

The supported formats are CSV and JSON, selected with `file.type` in `selectdb.config`.

## Task Example

### Simple

> The following example writes multiple data types to SelectDB Cloud. The corresponding table must be created downstream beforehand.

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
  checkpoint.interval = 10000
}

source {
  FakeSource {
    row.num = 10
    map.size = 10
    array.size = 10
    bytes.length = 10
    string.length = 10
    schema = {
      fields {
        c_map = "map<string, array<int>>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(16, 1)"
        c_null = "null"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
  }
}

sink {
  SelectDBCloud {
    load-url = "warehouse_ip:http_port"
    jdbc-url = "warehouse_ip:mysql_port"
    cluster-name = "Cluster"
    table.identifier = "test.test"
    username = "admin"
    password = "******"
    selectdb.config {
      file.type = "json"
    }
  }
}
```

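Because the connector lists `cdc` among its key features, a job whose upstream source emits delete events can propagate them by turning on `sink.enable-delete`. The following is a sketch of such a sink, not a tested configuration: it assumes the target table uses the Unique model and has batch delete enabled, as required by `sink.enable-delete`, it reuses the placeholder connection values from the other examples, and it writes the buffer and retry options out at their documented defaults only to show where they go.

```hocon
sink {
  SelectDBCloud {
    load-url = "warehouse_ip:http_port"
    jdbc-url = "warehouse_ip:mysql_port"
    cluster-name = "Cluster"
    # Placeholder table; assumed to use the Unique model with BATCH_DELETE enabled
    table.identifier = "test.test"
    username = "admin"
    password = "******"
    # Apply delete events from the upstream source to the table
    sink.enable-delete = true
    # Documented defaults, written out explicitly (10 * 1024 * 1024 bytes = 10MB)
    sink.buffer-size = 10485760
    sink.buffer-count = 10000
    sink.max-retries = 3
    selectdb.config {
      file.type = "json"
    }
  }
}
```
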
### Use JSON format to import data

```hocon
sink {
  SelectDBCloud {
    load-url = "warehouse_ip:http_port"
    jdbc-url = "warehouse_ip:mysql_port"
    cluster-name = "Cluster"
    table.identifier = "test.test"
    username = "admin"
    password = "******"
    selectdb.config {
      file.type = "json"
    }
  }
}
```

### Use CSV format to import data

```hocon
sink {
  SelectDBCloud {
    load-url = "warehouse_ip:http_port"
    jdbc-url = "warehouse_ip:mysql_port"
    cluster-name = "Cluster"
    table.identifier = "test.test"
    username = "admin"
    password = "******"
    selectdb.config {
      file.type = "csv"
      file.column_separator = ","
      file.line_delimiter = "\n"
    }
  }
}
```