NPE while reading Multiple Partitions #101

Open
adiu19 opened this issue Nov 23, 2020 · 6 comments

adiu19 commented Nov 23, 2020

Hi guys, we have integrated the spark-acid library into our production pipeline and recently started facing an issue while reading data from a large number of partitions. Below is the stack trace:

```
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:820)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:440)
    at com.qubole.spark.hiveacid.rdd.HiveAcidRDD$.getJobConf(HiveAcidRDD.scala:457)
    at com.qubole.spark.hiveacid.reader.hive.HiveAcidPartitionComputer$$anonfun$2.apply(HiveAcidPartitionComputer.scala:73)
    at com.qubole.spark.hiveacid.reader.hive.HiveAcidPartitionComputer$$anonfun$2.apply(HiveAcidPartitionComputer.scala:69)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

All our tables are Hive ACID tables; the partitions are nested two levels deep by date and are created dynamically. The read works perfectly fine if we execute it in smaller chunks. Has anyone else faced this issue?
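
For context, a minimal sketch of the kind of read involved, assuming the standard spark-acid datasource usage; the table name `mydb.events` and the `dt` partition column are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: "mydb.events" and the "dt" partition column are hypothetical.
val spark = SparkSession.builder()
  .appName("spark-acid-read")
  .enableHiveSupport()
  .getOrCreate()

// Read a Hive ACID table through the HiveAcid datasource and scan a
// date range that spans many dynamically created partitions.
val df = spark.read
  .format("HiveAcid")
  .option("table", "mydb.events")
  .load()
  .filter("dt >= '2020-10-01' AND dt <= '2020-11-23'")

println(df.count())
```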

amoghmargoor self-assigned this Nov 27, 2020
amoghmargoor (Collaborator) commented Nov 27, 2020

Hi @adiu19, this seems like a bug in broadcasting the JobConf. Can you set the config spark.hiveAcid.parallel.partitioning.threshold to a very high value so that this code path is bypassed? It should be set higher than your number of partitions.
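
For readers hitting the same issue, a sketch of applying that workaround; the value 100000 is an arbitrary placeholder, pick anything above your partition count:

```scala
import org.apache.spark.sql.SparkSession

// Workaround sketch: raise the threshold above the number of partitions so
// the parallel partition-computation path (which broadcasts the JobConf and
// triggers the NPE above) is bypassed. 100000 is an arbitrary placeholder.
val spark = SparkSession.builder()
  .appName("spark-acid-read")
  .config("spark.hiveAcid.parallel.partitioning.threshold", "100000")
  .enableHiveSupport()
  .getOrCreate()
```

The same setting can be passed on the command line with `--conf spark.hiveAcid.parallel.partitioning.threshold=100000`.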

adiu19 (Author) commented Dec 3, 2020

@amoghmargoor thanks a lot, this worked. Are we planning to address this bug in an upcoming release?

amoghmargoor (Collaborator) commented Dec 5, 2020

@adiu19 yeah, I will take a look at it for the next release. Thanks for reporting. I may need some help reproducing it if I cannot reproduce it on our end.

maheshk114 (Contributor) commented

@amoghmargoor looks similar to the Kryo serialization issue.

amoghmargoor (Collaborator) commented

@maheshk114 in that case I believe it should have failed without the flag being disabled too. But anyway, that is a good point to consider. @adiu19 can you check whether you were using Kryo on your end?
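
A quick way to check, assuming a SparkSession named `spark` is in scope: Kryo is only used when spark.serializer points at KryoSerializer, since Spark's default is JavaSerializer:

```scala
// Prints org.apache.spark.serializer.KryoSerializer when Kryo is enabled;
// if the key is unset, Spark falls back to JavaSerializer.
val serializer = spark.sparkContext.getConf
  .get("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
println(s"spark.serializer = $serializer")
```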

adiu19 (Author) commented Dec 30, 2020

@amoghmargoor: we aren't using Kryo on our side.
