Skip to content

Commit

Permalink
Adds documentation for HiveServer2 support
Browse files Browse the repository at this point in the history
  • Loading branch information
linghengqian committed Nov 19, 2024
1 parent 4f87279 commit 9637ec8
Show file tree
Hide file tree
Showing 22 changed files with 816 additions and 304 deletions.
1 change: 1 addition & 0 deletions RELEASE-NOTES.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
1. Proxy: Add query parameters and check for mysql kill processId - [#33274](https://github.com/apache/shardingsphere/pull/33274)
1. Agent: Simplify the use of Agent's Docker Image - [#33356](https://github.com/apache/shardingsphere/pull/33356)
1. Build: Avoid using `-proc:full` when compiling ShardingSphere with OpenJDK23 - [#33681](https://github.com/apache/shardingsphere/pull/33681)
1. Doc: Adds documentation for HiveServer2 support - [#33717](https://github.com/apache/shardingsphere/pull/33717)

### Bug Fixes

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -289,86 +289,9 @@ Caused by: java.io.UnsupportedEncodingException: Codepage Cp1252 is not supporte

ClickHouse 不支持 ShardingSphere 集成级别的本地事务,XA 事务和 Seata AT 模式事务,更多讨论位于 https://github.com/ClickHouse/clickhouse-docs/issues/2300 。

7. 当需要通过 ShardingSphere JDBC 使用 Hive 方言时,受 https://issues.apache.org/jira/browse/HIVE-28445 影响,
用户不应该使用 `classifier` 为 `standalone` 的 `org.apache.hive:hive-jdbc:4.0.1`,以避免依赖冲突。
可能的配置例子如下,

```xml
<project>
<dependencies>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-infra-database-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-parser-sql-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-api</artifactId>
<version>3.3.6</version>
</dependency>
</dependencies>
</project>
```

这会导致大量的依赖冲突。
如果用户不希望手动解决潜在的数千行的依赖冲突,可以使用 HiveServer2 JDBC Driver 的 `Thin JAR` 的第三方构建。
可能的配置例子如下,

```xml
<project>
<dependencies>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-infra-database-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-parser-sql-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>io.github.linghengqian</groupId>
<artifactId>hive-server2-jdbc-driver-thin</artifactId>
<version>1.5.0</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.woodstox</groupId>
<artifactId>woodstox-core</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
```

受 https://github.com/grpc/grpc-java/issues/10601 影响,用户如果在项目中引入了 `org.apache.hive:hive-jdbc`,
7. 受 https://github.com/grpc/grpc-java/issues/10601 影响,用户如果在项目中引入了 `org.apache.hive:hive-jdbc`,
则需要在项目的 classpath 的 `META-INF/native-image/io.grpc/grpc-netty-shaded` 文件夹下创建包含如下内容的文件 `native-image.properties`,

```properties
Args=--initialize-at-run-time=\
io.grpc.netty.shaded.io.netty.channel.ChannelHandlerMask,\
Expand Down Expand Up @@ -400,55 +323,6 @@ Args=--initialize-at-run-time=\
io.grpc.netty.shaded.io.netty.util.AttributeKey
```

为了能够使用 `delete` 等 DML SQL 语句,当连接到 HiveServer2 时,
用户应当考虑在 ShardingSphere JDBC 中仅使用支持 ACID 的表。`apache/hive` 提供了多种事务解决方案。

第1种选择是使用 ACID 表,可能的建表流程如下。
由于其过时的基于目录的表格式,用户可能不得不在 DML 语句执行前后进行等待,以让 HiveServer2 完成低效的 DML 操作。

```sql
set metastore.compactor.initiator.on=true;
set metastore.compactor.cleaner.on=true;
set metastore.compactor.worker.threads=5;
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
CREATE TABLE IF NOT EXISTS t_order
(
order_id BIGINT,
order_type INT,
user_id INT NOT NULL,
address_id BIGINT NOT NULL,
status VARCHAR(50),
PRIMARY KEY (order_id) disable novalidate
) CLUSTERED BY (order_id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
```

第2种选择是使用 Iceberg 表,可能的建表流程如下。
Apache Iceberg 表格式有望在未来几年取代传统的 Hive 表格式,
参考 https://blog.cloudera.com/from-hive-tables-to-iceberg-tables-hassle-free/ 。

```sql
set iceberg.mr.schema.auto.conversion=true;
CREATE TABLE IF NOT EXISTS t_order
(
order_id BIGINT,
order_type INT,
user_id INT NOT NULL,
address_id BIGINT NOT NULL,
status VARCHAR(50),
PRIMARY KEY (order_id) disable novalidate
) STORED BY ICEBERG STORED AS ORC TBLPROPERTIES ('format-version' = '2');
```

由于 HiveServer2 JDBC Driver 未实现 `java.sql.DatabaseMetaData#getURL()`,
ShardingSphere 做了模糊处理,因此用户暂时仅可通过 HikariCP 连接 HiveServer2。

HiveServer2 不支持 ShardingSphere 集成级别的本地事务,XA 事务和 Seata AT 模式事务,更多讨论位于 https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions 。

8. 由于 https://github.com/oracle/graal/issues/7979 的影响,
对应 `com.oracle.database.jdbc:ojdbc8` Maven 模块的 Oracle JDBC Driver 无法在 GraalVM Native Image 下使用。

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -302,88 +302,10 @@ Possible configuration examples are as follows,
ClickHouse does not support local transactions, XA transactions, and Seata AT mode transactions at the ShardingSphere integration level.
More discussion is at https://github.com/ClickHouse/clickhouse-docs/issues/2300 .

7. When using the Hive dialect through ShardingSphere JDBC, affected by https://issues.apache.org/jira/browse/HIVE-28445 ,
users should not use `org.apache.hive:hive-jdbc:4.0.1` with `classifier` as `standalone` to avoid dependency conflicts.
Possible configuration examples are as follows,

```xml
<project>
<dependencies>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-infra-database-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-parser-sql-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-api</artifactId>
<version>3.3.6</version>
</dependency>
</dependencies>
</project>
```

This can lead to a large number of dependency conflicts.
If the user does not want to manually resolve potentially thousands of lines of dependency conflicts,
a third-party build of the HiveServer2 JDBC Driver `Thin JAR` can be used.
An example of a possible configuration is as follows,

```xml
<project>
<dependencies>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-infra-database-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-parser-sql-hive</artifactId>
<version>${shardingsphere.version}</version>
</dependency>
<dependency>
<groupId>io.github.linghengqian</groupId>
<artifactId>hive-server2-jdbc-driver-thin</artifactId>
<version>1.5.0</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.woodstox</groupId>
<artifactId>woodstox-core</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
```

Affected by https://github.com/grpc/grpc-java/issues/10601 , should users incorporate `org.apache.hive:hive-service` into their project,
7. Affected by https://github.com/grpc/grpc-java/issues/10601 , should users incorporate `org.apache.hive:hive-jdbc` into their project,
it is imperative to create a file named `native-image.properties` within the directory `META-INF/native-image/io.grpc/grpc-netty-shaded` of the classpath,
containing the following content,

```properties
Args=--initialize-at-run-time=\
io.grpc.netty.shaded.io.netty.channel.ChannelHandlerMask,\
Expand Down Expand Up @@ -415,57 +337,6 @@ Args=--initialize-at-run-time=\
io.grpc.netty.shaded.io.netty.util.AttributeKey
```

In order to be able to use DML SQL statements such as `delete`, when connecting to HiveServer2,
users should consider using only ACID-supported tables in ShardingSphere JDBC. `apache/hive` provides a variety of transaction solutions.

The first option is to use ACID tables, and the possible table creation process is as follows.
Due to its outdated catalog-based table format,
users may have to wait before and after DML statement execution to let HiveServer2 complete the inefficient DML operations.

```sql
set metastore.compactor.initiator.on=true;
set metastore.compactor.cleaner.on=true;
set metastore.compactor.worker.threads=5;
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
CREATE TABLE IF NOT EXISTS t_order
(
order_id BIGINT,
order_type INT,
user_id INT NOT NULL,
address_id BIGINT NOT NULL,
status VARCHAR(50),
PRIMARY KEY (order_id) disable novalidate
) CLUSTERED BY (order_id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
```

The second option is to use Iceberg table. The possible table creation process is as follows.
Apache Iceberg table format is poised to replace the traditional Hive table format in the coming years,
see https://blog.cloudera.com/from-hive-tables-to-iceberg-tables-hassle-free/ .

```sql
set iceberg.mr.schema.auto.conversion=true;
CREATE TABLE IF NOT EXISTS t_order
(
order_id BIGINT,
order_type INT,
user_id INT NOT NULL,
address_id BIGINT NOT NULL,
status VARCHAR(50),
PRIMARY KEY (order_id) disable novalidate
) STORED BY ICEBERG STORED AS ORC TBLPROPERTIES ('format-version' = '2');
```

Since HiveServer2 JDBC Driver does not implement `java.sql.DatabaseMetaData#getURL()`,
ShardingSphere has done some obfuscation, so users can only connect to HiveServer2 through HikariCP for now.

HiveServer2 does not support local transactions, XA transactions, and Seata AT mode transactions at the ShardingSphere integration level.
More discussion is available at https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions .

8. Due to https://github.com/oracle/graal/issues/7979 ,
the Oracle JDBC Driver corresponding to the `com.oracle.database.jdbc:ojdbc8` Maven module cannot be used under GraalVM Native Image.

Expand Down
Loading

0 comments on commit 9637ec8

Please sign in to comment.