
Issue with Glue to Hive through Amazon S3 Objects #72

Open
@jainanuj07

Description

I am trying to migrate the Glue Data Catalog to the Hive metastore of an EMR cluster. I am using a MySQL 8 RDS instance as my Hive metastore. Since MySQL 8 connections are not supported in Glue, I am migrating via S3 objects instead.

The migration from Glue to S3 objects succeeds. But when I run the Spark job for hive_metastore_migration.py in to-metastore mode (loading from S3 into the metastore), it fails with:

java.sql.BatchUpdateException: Duplicate entry 'events_db' for key 'DBS.UNIQUE_DATABASE'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
at com.mysql.cj.util.Util.getInstance(Util.java:167)
at com.mysql.cj.util.Util.getInstance(Util.java:174)
at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:853)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:796)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:672)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
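
If I am reading the stack trace correctly, the failing step is just a Spark JDBC write of the exported catalog into the metastore tables, roughly equivalent to the sketch below (simplified and inferred from the JdbcUtils.savePartition frames, not the actual script code), so any database that already exists in the target metastore trips the DBS.UNIQUE_DATABASE key:

# Simplified sketch of the failing step, inferred from the stack trace.
# 'databases_df' stands for the DataFrame of exported databases;
# the JDBC URL, user and password are placeholders, not my real values.
databases_df.write.jdbc(
    url="jdbc:mysql://<rds-endpoint>:3306/hive_metastore",
    table="DBS",
    mode="append",  # plain batched INSERTs, so a pre-existing 'events_db' row violates the unique key
    properties={"user": "<user>", "password": "<password>"},
)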

I am running the Spark job on my EMR cluster like this:
spark-submit --deploy-mode cluster hive_metastore_migration.py --mode 'to-metastore' --jdbc-url <> --jdbc-username <> --jdbc-password <> --input_path 's3://'
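
Before re-running, a quick check like the one below (placeholder connection details, standard Hive metastore schema) should show whether events_db is already present in the target metastore's DBS table:

# Hypothetical check of the target metastore's DBS table;
# the JDBC URL, user and password are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("check-dbs").getOrCreate()

dbs = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://<rds-endpoint>:3306/hive_metastore")
    .option("dbtable", "DBS")  # databases table of the standard Hive metastore schema
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

dbs.filter("NAME = 'events_db'").show()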

Is there any workaround to resolve this problem?
