Commit

Merge branch 'apache:dev' into dev-fix-sink-multi-table-path
hailin0 committed Jul 13, 2024
2 parents 043caee + 37f2ee2 commit 8bedbd4
Showing 76 changed files with 3,941 additions and 230 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/labeler/label-scope-conf.yml
@@ -36,7 +36,7 @@ spark:
- changed-files:
- any-glob-to-any-file:
- seatunnel-translation/seatunnel-translation-spark/**
-connector-v2:
+connectors-v2:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/**
transform-v2:
@@ -246,6 +246,12 @@ web3j:
- all:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/connector-web3j/**
+- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(web3j)/**'
+Milvus:
+- all:
+- changed-files:
+- any-glob-to-any-file: seatunnel-connectors-v2/connector-milvus/**
+- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(milvus)/**'
Zeta Rest API:
- changed-files:
- any-glob-to-any-file: seatunnel-engine/**/server/rest/**
20 changes: 10 additions & 10 deletions README.md
@@ -57,7 +57,7 @@ SeaTunnel addresses common data integration challenges:

- **Real-Time Monitoring**: Offers detailed insights during synchronization.

-- **Two Job Development Methods**: Supports coding and visual job management with the [SeaTunnel web project](https://github.com/apache/seatunnel-web).
+- **Two Job Development Methods**: Supports coding and visual job management with the [SeaTunnel Web Project](https://github.com/apache/seatunnel-web).

## SeaTunnel Workflow

@@ -75,7 +75,7 @@ For a list of connectors and their health status, visit the [Connector Status](d

## Getting Started

-Download SeaTunnel from the [official website](https://seatunnel.apache.org/download).
+Download SeaTunnel from the [Official Website](https://seatunnel.apache.org/download).

Choose your runtime execution engine:
- [SeaTunnel Zeta Engine](https://seatunnel.apache.org/docs/start-v2/locally/quick-start-seatunnel-engine/)
@@ -84,19 +84,19 @@ Choose your runtime execution engine:

## Use Cases

-Explore real-world use cases of SeaTunnel, such as Weibo, Tencent Cloud, Sina, Sogou, and Yonghui Superstores. More use cases can be found on the [SeaTunnel blog](https://seatunnel.apache.org/blog).
+Explore real-world use cases of SeaTunnel, such as Weibo, Tencent Cloud, Sina, Sogou, and Yonghui Superstores. More use cases can be found on the [SeaTunnel Blog](https://seatunnel.apache.org/blog).

## Code of Conduct

-Participate in this project following the Contributor Covenant [Code of Conduct](https://www.apache.org/foundation/policies/conduct).
+Participate in this project in accordance with the Contributor Covenant [Code of Conduct](https://www.apache.org/foundation/policies/conduct).

## Contributors

-We appreciate all developers for their contributions. See the [list of contributors](https://github.com/apache/seatunnel/graphs/contributors).
+We appreciate all developers for their contributions. See the [List Of Contributors](https://github.com/apache/seatunnel/graphs/contributors).

## How to Compile

-Refer to this [document](docs/en/contribution/setup.md) for compilation instructions.
+Refer to this [Setup](docs/en/contribution/setup.md) for compilation instructions.

## Contact Us

@@ -117,7 +117,7 @@ For more information, please refer to [SeaTunnel Web](https://github.com/apache/

## Our Users

-Companies and organizations worldwide use SeaTunnel for research, production, and commercial products. Visit our [user page](https://seatunnel.apache.org/user) for more information.
+Companies and organizations worldwide use SeaTunnel for research, production, and commercial products. Visit our [Users](https://seatunnel.apache.org/user) for more information.

## License

@@ -127,23 +127,23 @@ Companies and organizations worldwide use SeaTunnel for research, production, an

### 1. How do I install SeaTunnel?

-Follow the [installation guide](https://seatunnel.apache.org/docs/2.3.3/start-v2/locally/deployment/) on our website to get started.
+Follow the [Installation Guide](https://seatunnel.apache.org/docs/2.3.3/start-v2/locally/deployment/) on our website to get started.

### 2. How can I contribute to SeaTunnel?

We welcome contributions! Please refer to our [Contribution Guidelines](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/coding-guide.md) for details.

### 3. How do I report issues or request features?

-You can report issues or request features on our [GitHub repository](https://github.com/apache/seatunnel/issues).
+You can report issues or request features on our [GitHub Repository](https://github.com/apache/seatunnel/issues).

### 4. Can I use SeaTunnel for commercial purposes?

Yes, SeaTunnel is available under the Apache 2.0 License, allowing commercial use.

### 5. Where can I find documentation and tutorials?

-Our [official documentation](https://seatunnel.apache.org/docs) includes detailed guides and tutorials to help you get started.
+Our [Official Documentation](https://seatunnel.apache.org/docs) includes detailed guides and tutorials to help you get started.

### 7. Is there a community or support channel?

1 change: 1 addition & 0 deletions config/plugin_config
@@ -85,4 +85,5 @@ connector-paimon
connector-rocketmq
connector-tdengine
connector-web3j
+connector-milvus
--end--
26 changes: 13 additions & 13 deletions docs/en/about.md
@@ -9,7 +9,7 @@ SeaTunnel is a very easy-to-use, ultra-high-performance, distributed data integr
synchronization of massive data. It can synchronize tens of billions of data stably and efficiently every day, and has
been used in production by nearly 100 companies.

-## Why we need SeaTunnel
+## Why We Need SeaTunnel

SeaTunnel focuses on data integration and data synchronization, and is mainly designed to solve common problems in the field of data integration:

@@ -18,45 +18,45 @@ SeaTunnel focuses on data integration and data synchronization, and is mainly de
- High resource demand: Existing data integration and data synchronization tools often require vast computing resources or JDBC connection resources to complete real-time synchronization of massive small tables. This has increased the burden on enterprises.
- Lack of quality and monitoring: Data integration and synchronization processes often experience loss or duplication of data. The synchronization process lacks monitoring, and it is impossible to intuitively understand the real situation of the data during the task process.
- Complex technology stack: The technology components used by enterprises are different, and users need to develop corresponding synchronization programs for different components to complete data integration.
-- Difficulty in management and maintenance: Limited to different underlying technology components (Flink/Spark), offline synchronization and real-time synchronization often have be developed and managed separately, which increases the difficulty of management and maintainance.
+- Difficulty in management and maintenance: Limited to different underlying technology components (Flink/Spark), offline synchronization and real-time synchronization often have be developed and managed separately, which increases the difficulty of management and maintenance.

-## Features of SeaTunnel
+## Features Of SeaTunnel

-- Rich and extensible Connector: SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as SeaTunnel Engine, Flink, and Spark, that are currently supported.
-- Connector plug-in: The plug-in design allows users to easily develop their own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel supports more than 100 Connectors, and the number is surging. Here is the list of [currently-supported connectors](Connector-v2-release-state.md)
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as SeaTunnel Engine(Zeta), Flink, and Spark.
+- Connector plugin: The plugin design allows users to easily develop their own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel supports more than 100 Connectors, and the number is surging. Here is the list of [Currently Supported Connectors](Connector-v2-release-state.md)
- Batch-stream integration: Connectors developed based on the SeaTunnel Connector API are perfectly compatible with offline synchronization, real-time synchronization, full-synchronization, incremental synchronization and other scenarios. They greatly reduce the difficulty of managing data integration tasks.
- Supports a distributed snapshot algorithm to ensure data consistency.
-- Multi-engine support: SeaTunnel uses the SeaTunnel Engine for data synchronization by default. SeaTunnel also supports the use of Flink or Spark as the execution engine of the Connector to adapt to the existing technical components of the enterprise. SeaTunnel supports multiple versions of Spark and Flink.
+- Multi-engine support: SeaTunnel uses the SeaTunnel Engine(Zeta) for data synchronization by default. SeaTunnel also supports the use of Flink or Spark as the execution engine of the Connector to adapt to the enterprise's existing technical components. SeaTunnel supports multiple versions of Spark and Flink.
- JDBC multiplexing, database log multi-table parsing: SeaTunnel supports multi-table or whole database synchronization, which solves the problem of over-JDBC connections; and supports multi-table or whole database log reading and parsing, which solves the need for CDC multi-table synchronization scenarios to deal with problems with repeated reading and parsing of logs.
- High throughput and low latency: SeaTunnel supports parallel reading and writing, providing stable and reliable data synchronization capabilities with high throughput and low latency.
- Perfect real-time monitoring: SeaTunnel supports detailed monitoring information of each step in the data synchronization process, allowing users to easily understand the number of data, data size, QPS and other information read and written by the synchronization task.
- Two job development methods are supported: coding and canvas design. The SeaTunnel web project https://github.com/apache/seatunnel-web provides visual management of jobs, scheduling, running and monitoring capabilities.

-## SeaTunnel work flowchart
+## SeaTunnel Work Flowchart

-![SeaTunnel work flowchart](../images/architecture_diagram.png)
+![SeaTunnel Work Flowchart](../images/architecture_diagram.png)

The runtime process of SeaTunnel is shown in the figure above.

The user configures the job information and selects the execution engine to submit the job.

-The Source Connector is responsible for parallel reading the data and sending the data to the downstream Transform or directly to the Sink, and the Sink writes the data to the destination. It is worth noting that Source, Transform and Sink can be easily developed and extended by yourself.
+The Source Connector is responsible for parallel reading and sending the data to the downstream Transform or directly to the Sink, and the Sink writes the data to the destination. It is worth noting that Source, Transform and Sink can be easily developed and extended by yourself.

SeaTunnel is an EL(T) data integration platform. Therefore, in SeaTunnel, Transform can only be used to perform some simple transformations on data, such as converting the data of a column to uppercase or lowercase, changing the column name, or splitting a column into multiple columns.

The default engine use by SeaTunnel is [SeaTunnel Engine](seatunnel-engine/about.md). If you choose to use the Flink or Spark engine, SeaTunnel will package the Connector into a Flink or Spark program and submit it to Flink or Spark to run.

## Connector

-- **Source Connectors** SeaTunnel supports reading data from various relational, graph, NoSQL, document, and memory databases; distributed file systems such as HDFS; and a variety of cloud storage solutions, such as S3 and OSS. We also support data reading of many common SaaS services. You can access the detailed list [here](connector-v2/source). If you want, You can develop your own source connector and easily integrate it into SeaTunnel.
+- **Source Connectors** SeaTunnel supports reading data from various relational, graph, NoSQL, document, and memory databases; distributed file systems such as HDFS; and a variety of cloud storage solutions, such as S3 and OSS. We also support data reading of many common SaaS services. You can access the detailed list [Here](connector-v2/source). If you want, You can develop your own source connector and easily integrate it into SeaTunnel.

- **Transform Connector** If the schema is different between source and Sink, You can use the Transform Connector to change the schema read from source and make it the same as the Sink schema.

-- **Sink Connector** SeaTunnel supports writing data to various relational, graph, NoSQL, document, and memory databases; distributed file systems such as HDFS; and a variety of cloud storage solutions, such as S3 and OSS. We also support writing data to many common SaaS services. You can access the detailed list [here](connector-v2/sink). If you want, you can develop your own Sink connector and easily integrate it into SeaTunnel.
+- **Sink Connector** SeaTunnel supports writing data to various relational, graph, NoSQL, document, and memory databases; distributed file systems such as HDFS; and a variety of cloud storage solutions, such as S3 and OSS. We also support writing data to many common SaaS services. You can access the detailed list [Here](connector-v2/sink). If you want, you can develop your own Sink connector and easily integrate it into SeaTunnel.

-## Who uses SeaTunnel
+## Who Uses SeaTunnel

-SeaTunnel has lots of users. You can find more information about them in [users](https://seatunnel.apache.org/user).
+SeaTunnel has lots of users. You can find more information about them in [Users](https://seatunnel.apache.org/user).

## Landscapes

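The about.md excerpt above describes the Source → Transform → Sink flow that every SeaTunnel job follows. As an illustration only (not part of this commit), a minimal job configuration for that flow might look roughly like the HOCON sketch below, assuming the built-in FakeSource and Console connectors from the quick-start docs; exact option names can vary between SeaTunnel versions.

```hocon
# Minimal SeaTunnel job config sketch (illustrative; option names assumed from the quick-start docs)
env {
  parallelism = 1          # number of parallel readers/writers
  job.mode = "BATCH"       # or "STREAMING" for continuous synchronization
}

source {
  FakeSource {
    result_table_name = "fake"   # name that downstream steps refer to
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  # Optional EL(T)-style light transforms (e.g. renaming or splitting columns) would go here
}

sink {
  Console {
    source_table_name = "fake"   # consume the source's result table
  }
}
```

Submitted to SeaTunnel Engine (Zeta), Flink, or Spark, the same config runs unchanged, which is the multi-engine point the document makes.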
2 changes: 1 addition & 1 deletion docs/en/command/connector-check.md
@@ -1,4 +1,4 @@
-# Connector check command usage
+# Connector Check Command Usage

## Command Entrypoint

2 changes: 1 addition & 1 deletion docs/en/command/usage.mdx
@@ -1,7 +1,7 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

-# Command usage
+# Command Usage

## Command Entrypoint

14 changes: 7 additions & 7 deletions docs/en/concept/JobEnvConfig.md
@@ -1,23 +1,23 @@
# Job Env Config

-This document describes env configuration information, the common parameters can be used in all engines. In order to better distinguish between engine parameters, the additional parameters of other engine need to carry a prefix.
+This document describes env configuration information. The common parameters can be used in all engines. In order to better distinguish between engine parameters, the additional parameters of other engine need to carry a prefix.
In flink engine, we use `flink.` as the prefix. In the spark engine, we do not use any prefixes to modify parameters, because the official spark parameters themselves start with `spark.`

## Common Parameter

-The following configuration parameters are common to all engines
+The following configuration parameters are common to all engines.

### job.name

This parameter configures the task name.

### jars

-Third-party packages can be loaded via `jars`, like `jars="file://local/jar1.jar;file://local/jar2.jar"`
+Third-party packages can be loaded via `jars`, like `jars="file://local/jar1.jar;file://local/jar2.jar"`.

### job.mode

-You can configure whether the task is in batch mode or stream mode through `job.mode`, like `job.mode = "BATCH"` or `job.mode = "STREAMING"`
+You can configure whether the task is in batch or stream mode through `job.mode`, like `job.mode = "BATCH"` or `job.mode = "STREAMING"`

### checkpoint.interval

@@ -47,11 +47,11 @@ you can set it to `CLIENT`. Please use `CLUSTER` mode as much as possible, becau

Specify the method of encryption, if you didn't have the requirement for encrypting or decrypting config files, this option can be ignored.

-For more details, you can refer to the documentation [config-encryption-decryption](../connector-v2/Config-Encryption-Decryption.md)
+For more details, you can refer to the documentation [Config Encryption Decryption](../connector-v2/Config-Encryption-Decryption.md)

## Flink Engine Parameter

-Here are some SeaTunnel parameter names corresponding to the names in Flink, not all of them, please refer to the official [flink documentation](https://flink.apache.org/) for more.
+Here are some SeaTunnel parameter names corresponding to the names in Flink, not all of them. Please refer to the official [Flink Documentation](https://flink.apache.org/).

| Flink Configuration Name | SeaTunnel Configuration Name |
|---------------------------------|---------------------------------------|
@@ -62,4 +62,4 @@ Here are some SeaTunnel parameter names corresponding to the names in Flink, not

## Spark Engine Parameter

-Because spark configuration items have not been modified, they are not listed here, please refer to the official [spark documentation](https://spark.apache.org/).
+Because Spark configuration items have not been modified, they are not listed here, please refer to the official [Spark Documentation](https://spark.apache.org/).
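To tie together the JobEnvConfig.md parameters shown above, here is a hedged sketch (not part of this commit) of an env block using only the common parameters that file documents — `job.name`, `jars`, `job.mode`, `checkpoint.interval` — plus the `flink.` prefix convention it mentions; the concrete values are assumptions for illustration.

```hocon
# Env block sketch based on the common parameters in JobEnvConfig.md (values are illustrative assumptions)
env {
  job.name = "mysql-to-console-sync"                      # task name
  job.mode = "STREAMING"                                  # or "BATCH"
  jars = "file://local/jar1.jar;file://local/jar2.jar"    # extra third-party packages
  checkpoint.interval = 10000                             # checkpoint period, in milliseconds (assumed unit)
  # Engine-specific options carry a prefix when running on Flink, e.g.:
  # flink.<flink-option-name> = <value>
  # Spark options keep their native spark.* names and need no extra prefix.
}
```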

