I am trying to migrate the Glue Data Catalog to the Hive metastore of an EMR cluster. I am using a MySQL 8 RDS instance as my Hive metastore. Since Glue connections do not support MySQL 8, I am migrating via S3 objects.
I am able to migrate from Glue to S3 objects successfully, but when I run the Spark job for the metastore migration from S3 in to-metastore mode, it fails with:
java.sql.BatchUpdateException: Duplicate entry 'events_db' for key 'DBS.UNIQUE_DATABASE'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
at com.mysql.cj.util.Util.getInstance(Util.java:167)
at com.mysql.cj.util.Util.getInstance(Util.java:174)
at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:853)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:796)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:672)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
I am running the Spark job on my EMR cluster like this:
spark-submit --deploy-mode cluster hive_metastore_migration.py --mode 'to-metastore' --jdbc-url <> --jdbc-username <> --jdbc-password <> --input_path 's3://'
Is there any workaround to resolve this problem?
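Judging by the MySQL error, a previous (possibly partial) run has already inserted a row for events_db into the metastore's DBS table, so re-running the to-metastore job violates the UNIQUE_DATABASE constraint on that table. A minimal sketch of a pre-flight cleanup, assuming the standard Hive metastore schema and the pymysql package (the endpoint and credentials below are hypothetical, not from this thread):

```python
# Sketch: remove a database row left over from a previous partial run so
# the migration job can re-insert it. Assumes the standard Hive metastore
# schema in MySQL; adjust host/credentials to your RDS instance.
import pymysql

conn = pymysql.connect(
    host="my-metastore.xxxxxx.rds.amazonaws.com",  # hypothetical endpoint
    user="hive",
    password="...",
    database="hive",  # name of the metastore database
)
try:
    with conn.cursor() as cur:
        # List databases already present in the metastore.
        cur.execute("SELECT DB_ID, NAME FROM DBS")
        for db_id, name in cur.fetchall():
            print(db_id, name)
        # If events_db is a leftover with no tables attached, deleting its
        # row lets the migration insert it cleanly. Verify it is empty first.
        cur.execute(
            "SELECT COUNT(*) FROM TBLS WHERE DB_ID = "
            "(SELECT DB_ID FROM DBS WHERE NAME = %s)",
            ("events_db",),
        )
        (table_count,) = cur.fetchone()
        if table_count == 0:
            cur.execute("DELETE FROM DBS WHERE NAME = %s", ("events_db",))
    conn.commit()
finally:
    conn.close()
```

If the metastore should contain only the migrated data, an alternative is to start from a freshly initialized metastore schema before re-running the job, which avoids per-row cleanup entirely.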
I followed the steps mentioned, but whenever I run the Glue job it fails with the error below:
Invalid input: Exception in thread "main" java.io.FileNotFoundException: File file:/tmp/g-c8df90907da0771653b3edb778243f5a38e73e14-2295083367131776400/Scripts/hive_metastore_migration.py does not exist
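That FileNotFoundException suggests the Glue job is resolving hive_metastore_migration.py against a local path instead of the S3 location the script was uploaded to. A minimal sketch of defining the job with an S3 script location via boto3 (the bucket, role, and job names below are hypothetical, not from this thread):

```python
# Sketch: create the Glue job with ScriptLocation pointing at the copy of
# hive_metastore_migration.py uploaded to S3, so the job does not look for
# the script on the local filesystem. All names here are hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed region

glue.create_job(
    Name="hive-metastore-migration",      # hypothetical job name
    Role="GlueMigrationRole",             # hypothetical IAM role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/hive_metastore_migration.py",
        "PythonVersion": "3",
    },
)

glue.start_job_run(
    JobName="hive-metastore-migration",
    Arguments={"--mode": "from-metastore"},  # arguments vary by migration direction
)
```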