
Unable to retrieve Spark History #371

Open
mateo41 opened this issue Apr 16, 2018 · 1 comment
mateo41 commented Apr 16, 2018

Hi folks,

I'm running a Spark 2.2.0 cluster on AWS EMR. I've been able to get the web application running, and I've changed the Fetcher configuration to read from the correct application logs directory in HDFS. However, dr-elephant does not read the Spark application logs, because it expects them to be compressed; our application logs are not compressed. The issue seems to be in the SparkUtils trait, in the pathAndCodecforEventLog method. I'm not sure what the exact issue is, but before I start adding logging statements, I'd like to ask you first.

Here is the stack trace that I'm seeing.

04-16-2018 22:59:54 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
at org.apache.spark.deploy.history.SparkFSFetcher.doAsPrivilegedAction(SparkFSFetcher.scala:78)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:74)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:34)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:29)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:247)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:399)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:686)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:652)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:877)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:887)
at org.apache.spark.deploy.history.SparkFSFetcher.doFetchData(SparkFSFetcher.scala:89)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:78)
... 15 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:623)
... 27 more
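The trace shows the fetcher looking only for an .lz4 file, so it seems to assume every event log carries a codec suffix. Conceptually, the lookup would need to try the known codec suffixes and fall back to the plain, uncompressed path. Here is a standalone Java sketch of that idea (EventLogResolver and resolveEventLogPath are hypothetical names for illustration, not dr-elephant's actual API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: resolve an event log path by trying each known codec
// suffix first, then falling back to the uncompressed base path. The existence
// check is injected as a predicate so the logic is easy to test without HDFS.
public class EventLogResolver {
    public static String resolveEventLogPath(String basePath,
                                             List<String> codecSuffixes,
                                             Predicate<String> exists) {
        for (String suffix : codecSuffixes) {
            String candidate = basePath + "." + suffix;
            if (exists.test(candidate)) {
                return candidate;  // a compressed event log was found
            }
        }
        // Fall back to the uncompressed event log -- the case failing above.
        return basePath;
    }

    public static void main(String[] args) {
        List<String> suffixes = List.of("lz4", "snappy", "gz");
        // Simulate a directory that holds only the uncompressed log.
        Predicate<String> exists =
            p -> p.equals("/var/log/spark/apps/application_1504037151523_312527");
        String resolved = resolveEventLogPath(
            "/var/log/spark/apps/application_1504037151523_312527",
            suffixes, exists);
        System.out.println(resolved);  // prints the uncompressed path unchanged
    }
}
```

In a real fetcher the predicate would be backed by FileSystem.getFileStatus (or exists) against HDFS, but the fallback ordering is the point here.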

Thanks,
Matt


mateo41 commented Apr 20, 2018

Hi folks,

I was able to solve the issue, but I had to make some modifications to SparkUtils.scala. You can see the changes I made in PR #372. Hmm, it appears there is already another PR that does this: #357. Perhaps the approach taken in that PR is better.

I'm not a Scala programmer, but I'd appreciate any feedback. It's also possible that I could have solved my problem by configuring dr-elephant differently. If that's the case, please let me know.

Thanks,
Matt
