You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm running a Spark 2.2.0 cluster on AWS EMR. I've been able to get the web application running and I've changed the Fetcher configuration to read from the correct application logs directory in hdfs. However, dr-elephant does not read in the spark application logs, because it expects them to be compressed. Our application logs are not compressed. The issue seems to be in the SparkUtils train and in the pathAndCodecforEventLog method. I'm not sure what exact issue is, but before I start adding logging statements, I'd like to ask you guys first.
Here is the stack trace that I'm seeing.
04-16-2018 22:59:54 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
at org.apache.spark.deploy.history.SparkFSFetcher.doAsPrivilegedAction(SparkFSFetcher.scala:78)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:74)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:34)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:29)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:247)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:399)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:686)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:652)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:877)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:887)
at org.apache.spark.deploy.history.SparkFSFetcher.doFetchData(SparkFSFetcher.scala:89)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:78)
... 15 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:623)
... 27 more
Thanks,
Matt
The text was updated successfully, but these errors were encountered:
I was able to solve the issue, but I had to make some modifications to SparkUtils.scala. You can see the changes I made in this PR fixes it. #372. Hmmm, it appears that there is already another PR request to do this. #357. Perhaps the approach taken in that PR is better.
I'm not a scala programmer, but I'd appreciate any feedback. It's also possible that I might have been able to solve my problem by configuring dr-elephant differently. If that's the case, please let me know.
Hi folks,
I'm running a Spark 2.2.0 cluster on AWS EMR. I've been able to get the web application running and I've changed the Fetcher configuration to read from the correct application logs directory in hdfs. However, dr-elephant does not read in the spark application logs, because it expects them to be compressed. Our application logs are not compressed. The issue seems to be in the SparkUtils train and in the pathAndCodecforEventLog method. I'm not sure what exact issue is, but before I start adding logging statements, I'd like to ask you guys first.
Here is the stack trace that I'm seeing.
04-16-2018 22:59:54 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
at org.apache.spark.deploy.history.SparkFSFetcher.doAsPrivilegedAction(SparkFSFetcher.scala:78)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:74)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:34)
at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:29)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:247)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:399)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:686)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:652)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:877)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:887)
at org.apache.spark.deploy.history.SparkFSFetcher.doFetchData(SparkFSFetcher.scala:89)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:78)
... 15 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /var/log/spark/apps/application_1504037151523_312527.lz4
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:623)
... 27 more
Thanks,
Matt
The text was updated successfully, but these errors were encountered: