
Unable to retrieve Spark History #371

Open
mateo41 opened this issue Apr 16, 2018 · 1 comment
mateo41 commented Apr 16, 2018

Hi folks,

I'm running a Spark 2.2.0 cluster on AWS EMR. I've been able to get the web application running, and I've changed the Fetcher configuration to read from the correct application logs directory in HDFS. However, dr-elephant does not read the Spark application logs, because it expects them to be compressed; our application logs are not compressed. The issue seems to be in the SparkUtils trait, in the pathAndCodecforEventLog method. I'm not sure what the exact issue is, but before I start adding logging statements, I'd like to ask you first.

Here is the stack trace that I'm seeing.

04-16-2018 22:59:54 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
at org.apache.spark.deploy.history.SparkFSFetcher.doAsPrivilegedAction(SparkFSFetcher.scala:78)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:74)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:34)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:29)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:247)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:399)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:686)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:652)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:877)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:887)
at org.apache.spark.deploy.history.SparkFSFetcher.doFetchData(SparkFSFetcher.scala:89)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:78)
... 15 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:623)
... 27 more
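The trace shows the fetcher looking only for an .lz4 file, so it seems to assume every event log carries a codec suffix. Conceptually, the lookup would need to try the known codec suffixes and fall back to the plain, uncompressed path. Here is a standalone Java sketch of that idea (EventLogResolver and resolveEventLogPath are hypothetical names for illustration, not dr-elephant's actual API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: resolve an event log path by trying each known codec
// suffix first, then falling back to the uncompressed base path. The existence
// check is injected as a predicate so the logic is easy to test without HDFS.
public class EventLogResolver {
    public static String resolveEventLogPath(String basePath,
                                             List<String> codecSuffixes,
                                             Predicate<String> exists) {
        for (String suffix : codecSuffixes) {
            String candidate = basePath + "." + suffix;
            if (exists.test(candidate)) {
                return candidate;  // a compressed event log was found
            }
        }
        // Fall back to the uncompressed event log -- the case failing above.
        return basePath;
    }

    public static void main(String[] args) {
        List<String> suffixes = List.of("lz4", "snappy", "gz");
        // Simulate a directory that holds only the uncompressed log.
        Predicate<String> exists =
            p -> p.equals("/var/log/spark/apps/application_1504037151523_312527");
        String resolved = resolveEventLogPath(
            "/var/log/spark/apps/application_1504037151523_312527",
            suffixes, exists);
        System.out.println(resolved);  // prints the uncompressed path unchanged
    }
}
```

In a real fetcher the predicate would be backed by FileSystem.getFileStatus (or exists) against HDFS, but the fallback ordering is the point here.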

Thanks,
Matt


mateo41 commented Apr 20, 2018

Hi folks,

I was able to solve the issue, but I had to make some modifications to SparkUtils.scala. You can see the changes I made in PR #372. Hmm, it appears there is already another PR that does this: #357. Perhaps the approach taken in that PR is better.

I'm not a Scala programmer, but I'd appreciate any feedback. It's also possible that I could have solved my problem by configuring dr-elephant differently. If that's the case, please let me know.

Thanks,
Matt
