
feat: support read and write from hive datasource #100

Merged: 3 commits into vesoft-inc:master, Aug 19, 2024

Conversation

awang12345 (Contributor)

What type of PR is this?

  • feature

What problem(s) does this PR solve?

Issue(s) number:

Description:

Add a Hive data source for reading and writing.

How do you solve it?

hive: {
  # algo's data source from Hive
  read: {
    # [Optional] required when Spark and Hive are deployed on different clusters
    metaStoreUris: "thrift://hive-metastore-server-01:9083"
    # Spark SQL used to read the source data
    sql: "select column_1,column_2,column_3 from database_01.table_01"
    # [Optional] column of the SQL result mapped to the graph source vid
    srcId: "column_1"
    # [Optional] column of the SQL result mapped to the graph destination vid
    dstId: "column_2"
    # [Optional] column of the SQL result mapped to the edge weight
    weight: "column_3"
  }

  # algo result sink into Hive
  write: {
    # [Optional] required when Spark and Hive are deployed on different clusters
    metaStoreUris: "thrift://hive-metastore-server-02:9083"
    # Hive table the result is saved to
    dbTableName: "database_02.table_02"
    # [Optional] Spark DataFrame save mode: one of Append, Overwrite, ErrorIfExists, Ignore. Default is Overwrite
    saveMode: "Overwrite"
    # [Optional] whether to auto-create the Hive table. Default is true
    autoCreateTable: true
    # [Optional] mapping from algorithm result columns to Hive table column names.
    # Defaults to the column names of the algorithm result DataFrame
    resultTableColumnMapping: {
      # Note: different algorithms output different fields; the pagerank algorithm is used as an example here:
      _id: "column_1"
      pagerank: "pagerank_value"
    }
  }
}
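
For illustration, here is a minimal sketch of how the hive.read.* settings above could map to Spark code; the object and method names are assumptions for this example, not the PR's actual implementation:

import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of the read path: connect to the configured metastore and run the SQL.
object HiveReadSketch {
  def read(metaStoreUris: String, sql: String): DataFrame = {
    val spark = SparkSession.builder()
      .appName("hive-datasource-read")
      // Point Spark at the (possibly remote) Hive metastore.
      .config("hive.metastore.uris", metaStoreUris)
      .enableHiveSupport()
      .getOrCreate()
    // srcId/dstId/weight then select columns from this DataFrame to build the edge list.
    spark.sql(sql)
  }
}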

Special notes for your reviewer, e.g. impact of this fix, design document, etc.:

The scenario where Spark and Hive run on different clusters could not be validated for lack of a test environment; all other cases have been verified.

val autoCreateTable: Boolean = getOrElse(config, "hive.write.autoCreateTable", true)
// Hive metastore address
val writeMetaStoreUris: String = getOrElse(config, "hive.write.metaStoreUris", "")
// Mapping between result fields and table columns, e.g. map _id in the algorithm result to user_id
Contributor:
please update the comment to English~
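
The getOrElse helper in the snippet above presumably reads typed values with defaults from the job's HOCON config. A minimal sketch of such a helper, assuming Typesafe Config; the PR's real implementation may differ:

import com.typesafe.config.Config

object ConfigSketch {
  // Return the value at `path` if present, otherwise the given default.
  def getOrElse[T](config: Config, path: String, default: T): T =
    if (config.hasPath(path)) config.getAnyRef(path).asInstanceOf[T]
    else default
}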

data.repartition(partitionNum)
}

data.show(3)
Contributor:
no need to show.

}

println(s"Save to hive:${config.dbTableName}, saveMode:${saveMode}")
_data.show(3)
Contributor:
ditto

awang12345 (author):
done
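
For context, a minimal sketch of the write path this PR adds (rename result columns per resultTableColumnMapping, then save with the configured mode); the object name and function signature are assumptions for illustration:

import org.apache.spark.sql.{DataFrame, SaveMode}

object HiveWriteSketch {
  def writeToHive(data: DataFrame,
                  dbTableName: String,
                  saveMode: String,
                  columnMapping: Map[String, String]): Unit = {
    // Rename algorithm-result columns to the target Hive column names.
    val renamed = columnMapping.foldLeft(data) {
      case (df, (resultCol, tableCol)) => df.withColumnRenamed(resultCol, tableCol)
    }
    renamed.write
      .mode(SaveMode.valueOf(saveMode)) // Append / Overwrite / ErrorIfExists / Ignore
      .saveAsTable(dbTableName)
  }
}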

@Nicole00 (Contributor) left a comment:
LGTM

@Nicole00 merged commit 4accdfe into vesoft-inc:master on Aug 19, 2024.
2 checks passed