
Spark eventlog directory points to GCS even if default_fs is set to hdfs #35

Open
dennishuo opened this issue Jun 26, 2015 · 0 comments
@dennishuo
Contributor

Right now spark.eventLog.dir gets set to a GCS path regardless of what DEFAULT_FS is set to for the deployment. This means that if a deployment intentionally disables GCS accessibility, e.g. by removing external IP addresses, then even an HDFS-only setup doesn't work for Spark.

The temporary workaround is to manually edit spark.eventLog.dir in the master's /home/hadoop/spark-install/conf/spark-defaults.conf to something like hdfs:///spark-eventlog-base and to run hadoop fs -mkdir -p hdfs:///spark-eventlog-base, or to set spark.eventLog.enabled to false.
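For reference, a sketch of that workaround as shell commands run on the master (assuming the spark-defaults.conf path above and that the hadoop user can write to HDFS; the sed/echo approach is just one way to apply the edit):

```sh
# Create the HDFS directory that will hold Spark event logs.
hadoop fs -mkdir -p hdfs:///spark-eventlog-base

# Point spark.eventLog.dir at HDFS instead of GCS; drop any existing
# spark.eventLog.dir line first so the old GCS value doesn't win.
CONF=/home/hadoop/spark-install/conf/spark-defaults.conf
sed -i '/^spark\.eventLog\.dir/d' "$CONF"
echo "spark.eventLog.dir hdfs:///spark-eventlog-base" >> "$CONF"

# Alternatively, disable event logging entirely:
# echo "spark.eventLog.enabled false" >> "$CONF"
```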

We can fix this by automatically deriving the right path from the default filesystem. Unfortunately, Spark doesn't appear to correctly pick up fs.default.name for schemeless paths, possibly because of classloading ordering issues that cause the path to be resolved before core-site.xml has been loaded; a schemeless setting ends up failing with something like:

java.lang.IllegalArgumentException: Log directory file:/spark-eventlog-base/dhuo-noip-m does not exist.
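One possible shape for the fix, sketched as a deployment-time shell snippet (the `hdfs getconf` call, variable names, and directory name are assumptions, not the actual bdutil change): derive an explicit, fully-qualified URI from the cluster's configured default filesystem and write it into spark-defaults.conf, so Spark never has to resolve a schemeless path itself.

```sh
# Ask Hadoop for the configured default filesystem
# (e.g. hdfs://<master>:8020 or gs://<bucket>), then build a
# fully-qualified event log URI from it.
DEFAULT_FS_URI=$(hdfs getconf -confKey fs.defaultFS)
SPARK_EVENTLOG_DIR="${DEFAULT_FS_URI%/}/spark-eventlog-base"

# Make sure the directory exists before Spark tries to use it.
hadoop fs -mkdir -p "$SPARK_EVENTLOG_DIR"

# Write the explicit URI into spark-defaults.conf so no schemeless
# resolution happens inside Spark.
echo "spark.eventLog.dir $SPARK_EVENTLOG_DIR" \
    >> /home/hadoop/spark-install/conf/spark-defaults.conf
```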
@dennishuo dennishuo self-assigned this Jun 26, 2015