Commit
1. Add AWS-related dependencies
2. Add a Dockerfile

lklhdu committed Nov 16, 2022
1 parent dc63ff3 commit 8b2532e
Showing 5 changed files with 45 additions and 11 deletions.
12 changes: 12 additions & 0 deletions Dockerfile
@@ -0,0 +1,12 @@
FROM openjdk:8u332-jdk
RUN apt-get update \
    && apt-get install -y netcat vim net-tools telnet

# RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone

COPY arctic_benchmark_ingestion.tar.gz /usr/lib/benchmark-ingestion/arctic_benchmark_ingestion.tar.gz
RUN cd /usr/lib/benchmark-ingestion && tar -zxvf arctic_benchmark_ingestion.tar.gz && rm -rf arctic_benchmark_ingestion.tar.gz
WORKDIR /usr/lib/benchmark-ingestion
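Assuming the ingestion tarball has been placed next to the Dockerfile under the name the COPY line expects, a build-and-run sequence might look like the following sketch; the image tag is hypothetical:

```shell
# Hypothetical tag; assumes arctic_benchmark_ingestion.tar.gz is in the build context.
IMAGE=benchmark-ingestion:dev
docker build -t "$IMAGE" .
# Publish 8081 so a Flink Web UI started inside the container is reachable.
docker run --rm -p 8081:8081 -it "$IMAGE" bash
```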
17 changes: 11 additions & 6 deletions README.md
@@ -1,21 +1,21 @@
# Overview
-Welcome to lakehouse-benchmark-ingestion.
-lakehouse-benchmark-ingestion is the data ingestion tool of lakehouse-benchmark, NetEase's open-source data lake benchmark suite. Built on Flink CDC, it synchronizes database data to a data lake in real time.
+Welcome to lakehouse-benchmark-ingestion. lakehouse-benchmark-ingestion is the data ingestion tool of lakehouse-benchmark, NetEase's open-source data lake benchmark suite. Built on Flink CDC, it synchronizes database data to a data lake in real time.

## Quick start
1. Clone the project: `git clone https://github.com/NetEase/lakehouse-benchmark-ingestion.git`
2. Edit resource/ingestion-conf.yaml and fill in the configuration options
3. Build the project with `mvn clean install -DskipTests`
-4. Enter the target directory and start the ingestion tool with `java -cp eduard-1.0-SNAPSHOT.jar com.netease.arctic.benchmark.ingestion.MainRunner -confDir [confDir] -sinkType [arctic/iceberg/hudi] -sinkDatabase [dbName]`
+4. Enter the target directory and unpack the archive with `tar -zxvf lakehouse_benchmark_ingestion.tar.gz`, which yields lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar and a conf directory
+5. Start the ingestion tool with `java -cp lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar com.netease.arctic.benchmark.ingestion.MainRunner -confDir [confDir] -sinkType [arctic/iceberg/hudi] -sinkDatabase [dbName]`
6. Open the Flink Web UI at `localhost:8081` to monitor data synchronization
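Taken together, the steps above can be sketched as one shell session; the sink type and database name below are example values, not defaults:

```shell
SINK_TYPE=iceberg        # one of arctic/iceberg/hudi
SINK_DB=benchmark_db     # example target database name

git clone https://github.com/NetEase/lakehouse-benchmark-ingestion.git
cd lakehouse-benchmark-ingestion
# Fill in resource/ingestion-conf.yaml before building.
mvn clean install -DskipTests
cd target
tar -zxvf lakehouse_benchmark_ingestion.tar.gz
java -cp lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar \
  com.netease.arctic.benchmark.ingestion.MainRunner \
  -confDir "$(pwd)/conf" -sinkType "$SINK_TYPE" -sinkDatabase "$SINK_DB"
```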

## Supported parameters
### Command-line arguments

| Option | Required | Default | Description |
|--------------|------|--------|------------------------------------------|
-| confDir || none | Absolute path of the directory containing ingestion-conf.yaml |
-| sinkType || (none) | Type of the target data lake format; Arctic/Iceberg/Hudi are supported |
+| confDir || (none) | Absolute path of the directory containing ingestion-conf.yaml |
+| sinkType || (none) | Type of the target data lake Format; Arctic/Iceberg/Hudi are supported |
| sinkDatabase || (none) | Name of the target database |
| restPort || 8081 | Port of the Flink Web UI |
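For instance, the launch command from the quick start also accepts restPort when the default Web UI port is taken; the paths and names below are placeholders:

```shell
REST_PORT=8082   # overrides the default 8081
java -cp lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar \
  com.netease.arctic.benchmark.ingestion.MainRunner \
  -confDir /usr/lib/benchmark-ingestion/conf \
  -sinkType hudi -sinkDatabase benchmark_db \
  -restPort "$REST_PORT"
```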

@@ -24,7 +24,7 @@

| Option | Required | Default | Description |
|--------------------------|------|---------|---------------------------------------------------------------|
-| source.type || none | Type of the source database; currently only MySQL is supported |
+| source.type || (none) | Type of the source database; currently only MySQL is supported |
| source.username || (none) | Username for the source database |
| source.password || (none) | Password for the source database |
| source.hostname || (none) | Host of the source database |
@@ -68,3 +68,8 @@
1. Arctic
2. Iceberg
3. Hudi
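A minimal ingestion-conf.yaml exercising the source.* options from the configuration table above could be generated like this; every value is a placeholder, and the table is truncated in this view, so further options exist that this sketch omits:

```shell
mkdir -p conf
# Placeholder credentials; the key names follow the source.* rows of the table.
cat > conf/ingestion-conf.yaml <<'EOF'
source.type: mysql
source.username: benchmark_user
source.password: benchmark_pass
source.hostname: 127.0.0.1
EOF
```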

## Notes
* The arctic-flink-runtime-1.14 dependency used by this project must be compiled from the Arctic source: download the code of the [Arctic project](https://github.com/NetEase/arctic), switch to the 0.3.x branch, and build it from source.
* The hudi-flink1.14-bundle_2.12 dependency used by this project must be compiled from the Hudi source; see the "Build with different Flink versions" section of the [Hudi project](https://github.com/apache/hudi) documentation for details.
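As a sketch of the Hudi step, building the bundle from source generally follows Hudi's "Build with different Flink versions" instructions; the branch and profile flags below are assumptions and should be checked against the Hudi docs for your release:

```shell
HUDI_BRANCH=master   # pick the release branch you actually need
git clone https://github.com/apache/hudi.git
cd hudi
git checkout "$HUDI_BRANCH"
# Assumed profile flags for Flink 1.14 / Scala 2.12; verify in Hudi's build docs.
mvn clean package -DskipTests -Dflink1.14 -Dscala-2.12
```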
2 changes: 1 addition & 1 deletion assembly.xml
@@ -8,7 +8,7 @@
<fileSet>
<directory>${project.build.directory}</directory>
<includes>
-<include>eduard-1.0-SNAPSHOT.jar</include>
+<include>lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar</include>
<!--
<include>metadata.properties</include>
-->
22 changes: 19 additions & 3 deletions pom.xml
@@ -5,9 +5,9 @@
<modelVersion>4.0.0</modelVersion>

<groupId>com.netease</groupId>
-<artifactId>eduard</artifactId>
+<artifactId>lakehouse-benchmark-ingestion</artifactId>
<version>1.0-SNAPSHOT</version>
-<name>eduard</name>
+<name>lakehouse-benchmark-ingestion</name>
<packaging>jar</packaging>

<properties>
@@ -16,7 +16,7 @@
<maven-checkstyle-plugin.version>3.1.2</maven-checkstyle-plugin.version>
<maven-shade-plugin.version>3.2.1</maven-shade-plugin.version>
<maven-assembly-plugin.version>3.3.0</maven-assembly-plugin.version>
-<package.final.name>arctic_benchmark_ingestion</package.final.name>
+<package.final.name>lakehouse_benchmark_ingestion</package.final.name>

<flink.version>1.14.5</flink.version>
<scala.binary.version>2.12</scala.binary.version>
@@ -116,6 +116,22 @@
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!--aws dependencies-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.199</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-cloud-storage</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!--hudi dependencies-->
<dependency>
<groupId>org.apache.hudi</groupId>
@@ -136,7 +136,8 @@ private static void createSinkCatalog(String sinkType, Map<String, String> props
((StreamTableEnvironmentImpl) tableEnv).executeInternal(operation);
}

-private static Configuration loadConfiguration(final String configDir, Map<String, String> props) {
+private static Configuration loadConfiguration(final String configDir,
+    Map<String, String> props) {

if (configDir == null) {
throw new IllegalArgumentException(
