Description
I have been struggling to set up this framework on my EC2 server. I tried my best to follow the instructions of both this repo and faromero's forked repo, but I get the error message below every time I run sudo ./../driver/bin/spark-submit ml_kmeans.py --master lambda://test:
19/04/05 05:11:40 ERROR ShuffleBlockFetcherIterator: Error occurred while fetching local blocks
java.io.FileNotFoundException: No such file or directory: s3://mc-cse597cc/tmp/executor-driver-4775351731/30/shuffle_0_0_0.index
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1636)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:684)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:772)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getRemoteBlockData(S3ShuffleBlockResolver.scala:240)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getBlockData(S3ShuffleBlockResolver.scala:263)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:318)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/04/05 05:11:40 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 8, localhost, executor driver): FetchFailed(BlockManagerId(driver, 172.31.123.183, 33995, None), shuffleId=0, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: No such file or directory: s3://mc-cse597cc/tmp/executor-driver-4775351731/30/shuffle_0_0_0.index
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:85)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: s3://mc-cse597cc/tmp/executor-driver-4775351731/30/shuffle_0_0_0.index
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1636)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:684)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:772)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getRemoteBlockData(S3ShuffleBlockResolver.scala:240)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getBlockData(S3ShuffleBlockResolver.scala:263)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:318)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
... 9 more
)
19/04/05 05:11:40 INFO DAGScheduler: Marking ResultStage 8 (countByValue at KMeans.scala:399) as failed due to a fetch failure from ShuffleMapStage 7 (countByValue at KMeans.scala:399)
19/04/05 05:11:40 INFO DAGScheduler: ResultStage 8 (countByValue at KMeans.scala:399) failed in 0.069 s due to org.apache.spark.shuffle.FetchFailedException: No such file or directory: s3://mc-cse597cc/tmp/executor-driver-4775351731/30/shuffle_0_0_0.index
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:85)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: s3://mc-cse597cc/tmp/executor-driver-4775351731/30/shuffle_0_0_0.index
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1636)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:684)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:772)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getRemoteBlockData(S3ShuffleBlockResolver.scala:240)
at org.apache.spark.shuffle.S3ShuffleBlockResolver.getBlockData(S3ShuffleBlockResolver.scala:263)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:318)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
... 9 more
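For reference, this is the kind of boto3 check one can run after a failed job to see whether any shuffle files are ever written under tmp/ (a minimal sketch; bucket name and prefix are taken from the error above):

import boto3

# List everything under tmp/ in the shuffle bucket; the shuffle_*.index and
# shuffle_*.data files the resolver is looking for should show up here if the
# writer side ever uploads them.
s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mc-cse597cc", Prefix="tmp/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])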
My config file:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 2
spark.shuffle.s3.enabled true
spark.lambda.concurrent.requests.max 100
spark.hadoop.fs.s3n.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.AbstractFileSystem.s3.impl org.apache.hadoop.fs.s3a.S3A
spark.hadoop.fs.AbstractFileSystem.s3n.impl org.apache.hadoop.fs.s3a.S3A
spark.hadoop.fs.AbstractFileSystem.s3a.impl org.apache.hadoop.fs.s3a.S3A
spark.hadoop.qubole.aws.use.v4.signature true
spark.hadoop.fs.s3a.fast.upload true
spark.lambda.function.name spark-lambda
spark.lambda.spark.software.version 149
spark.hadoop.fs.s3a.endpoint s3.us-east-1.amazonaws.com
spark.hadoop.fs.s3n.awsAccessKeyId KEY
spark.hadoop.fs.s3n.awsSecretAccessKey SECRET
spark.shuffle.s3.bucket s3://mc-cse597cc
spark.lambda.s3.bucket s3://mc-cse597cc
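As a sanity check of the s3a settings above (this does not exercise the Lambda executors at all), a minimal PySpark snippet like the following can round-trip data to the bucket from the same EC2 box; KEY/SECRET are the same placeholders as in the config, and "sanity-check" is just an illustrative prefix:

from pyspark.sql import SparkSession

# Sanity-check sketch: verify that the s3a endpoint and credentials from the
# config can write to and read back from the bucket. This does not use the
# lambda:// scheduler.
spark = (SparkSession.builder
         .appName("s3a-sanity-check")
         .config("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
         .config("spark.hadoop.fs.s3a.access.key", "KEY")      # placeholder
         .config("spark.hadoop.fs.s3a.secret.key", "SECRET")   # placeholder
         .getOrCreate())

# Write a small RDD and read it back; the prefix is arbitrary.
spark.sparkContext.parallelize(range(10)).saveAsTextFile("s3a://mc-cse597cc/sanity-check")
print(spark.sparkContext.textFile("s3a://mc-cse597cc/sanity-check").count())
spark.stop()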
The region in ~/.aws/config is us-east-1. The VPC subnets are configured following the forked repo's instructions.
My Lambda function has been tested and can read from and write to S3. The spark-submit command run on EC2 is able to write to S3 (it creates a tmp/ folder in the bucket), but it never causes the Lambda to run at all, and CloudWatch shows no logs for my Lambda. However, I am able to invoke the Lambda from EC2 with something like aws lambda invoke --function-name spark-lambda ~/test.txt. I guess I have configured Spark-on-Lambda wrong, even though I have been following the instructions.
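The manual test above corresponds roughly to this boto3 sketch (assuming the same IAM credentials and region the CLI uses), which is enough to confirm that the spark-lambda function itself is reachable from the EC2 instance:

import boto3

# Invoke the same function the executor layer is supposed to use and print the
# status code and response; a 200 here only means the Lambda itself runs, not
# that Spark is able to drive it.
client = boto3.client("lambda", region_name="us-east-1")
resp = client.invoke(FunctionName="spark-lambda", Payload=b"{}")
print(resp["StatusCode"])
print(resp["Payload"].read().decode())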
I am now trying to dive into the source code. Is there any clue as to what this error message means?