NPE while reading Multiple Partitions #101
Hi @adiu19, this seems like a bug in broadcasting the JobConf. Can you set this config to a really high value so that it bypasses that code path?
@amoghmargoor thanks a lot, this worked. Are we planning to address this bug in an upcoming release?
@adiu19 yeah, will take a look at it for the next release. Thanks for reporting. I may need some help with reproducing it if I cannot on our end.
@amoghmargoor looks similar to the Kryo serialization issue.
@maheshk114 in that case I believe it should have failed even without the flag being disabled, but it's a good point to consider anyway. @adiu19 can you check whether you were using Kryo on your end?
@amoghmargoor: we aren't using Kryo on our side.
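For anyone else checking the same thing: a quick way to confirm which serializer a session is running with is to read back `spark.serializer` (Spark defaults to `JavaSerializer` when it is unset). This is a minimal sketch against a plain SparkSession; it needs a running Spark environment.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// "spark.serializer" is unset unless explicitly configured; the default
// behavior corresponds to JavaSerializer.
val serializer = spark.conf.get(
  "spark.serializer",
  "org.apache.spark.serializer.JavaSerializer")

// Kryo would show up as org.apache.spark.serializer.KryoSerializer.
println(s"Serializer in use: $serializer")
```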
Hi guys, we have integrated the spark-acid library into our production pipeline and recently started facing an issue while reading data from a large number of partitions. Below is the stack trace:
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:820)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:440)
    at com.qubole.spark.hiveacid.rdd.HiveAcidRDD$.getJobConf(HiveAcidRDD.scala:457)
    at com.qubole.spark.hiveacid.reader.hive.HiveAcidPartitionComputer$$anonfun$2.apply(HiveAcidPartitionComputer.scala:73)
    at com.qubole.spark.hiveacid.reader.hive.HiveAcidPartitionComputer$$anonfun$2.apply(HiveAcidPartitionComputer.scala:69)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
All our tables are Hive ACID tables, and our partitions are created dynamically with two-level nesting based on date. The read works perfectly fine if we execute it in smaller chunks. Has anyone else faced this issue?
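To make the "smaller chunks" workaround concrete, here is a hedged sketch of reading an ACID table through the spark-acid `HiveAcid` data source one partition range at a time, so each Spark job computes splits for far fewer partitions. The table name, partition column (`dt`), and output path are hypothetical; this needs a Spark build with spark-acid and Hive support on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()

// spark-acid exposes Hive ACID tables through the "HiveAcid" format.
// "mydb.events" is an example table name.
val df = spark.read
  .format("HiveAcid")
  .option("table", "mydb.events")
  .load()

// Instead of scanning every partition in one job, restrict each read to a
// narrow date range ("dt" is an assumed partition column) and process the
// ranges sequentially.
val weeks = Seq(
  ("2020-01-01", "2020-01-08"),
  ("2020-01-08", "2020-01-15"))

for ((start, end) <- weeks) {
  df.filter(col("dt") >= start && col("dt") < end)
    .write
    .mode("append")
    .parquet("/tmp/events_out") // example output path
}
```

Reading this way keeps the number of partitions touched per job small, which was what made the reporter's reads succeed.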