David's bdutil fork

Hadoop/YARN/Spark, etc. are stuck on Java 7 (JRE 1.7). If your project needs Java 8 (1.8), you can use this repo.

Instructions

Clone this repo
Start the cluster using extensions/spark/spark_env.sh, e.g.,

./bdutil --bucket your-bucket -n 1 -P your-project --env_var_files extensions/spark/spark_env.sh --zone us-central1-a deploy
(Optional) If you see repeated messages about being unable to ssh, you may need to add the all-ssh tag. If your project has an SSH whitelist (which is pretty reasonable), you may need to tag your master and workers with all-ssh so the bdutil script can ssh in and install Spark, etc. The easiest way is to go to https://console.developers.google.com/, find your master and workers, and add all-ssh to each one. You'll know you need this if the bdutil script keeps saying it's waiting for the master and workers to start up.
Submit Spark jobs as you normally would (with spark-submit on the master).
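
For the optional tagging step, a command-line alternative to the web console might look like the sketch below. It only prints one gcloud command per instance so you can review them before running anything. The instance names (hadoop-m, hadoop-w-0) are an assumption based on bdutil's default "hadoop" prefix, and the worker count and zone are taken from the example deploy command above; adjust all three to match your cluster.

```shell
#!/bin/sh
# Sketch: print the gcloud commands that add the all-ssh tag to a bdutil cluster.
# Assumptions (not from this repo): instances are named <prefix>-m and
# <prefix>-w-<index>, with "hadoop" as the prefix.
PREFIX="hadoop"
ZONE="us-central1-a"   # zone used in the example deploy command
NUM_WORKERS=1          # matches "-n 1" in the example deploy command

# Print one instance name per line: the master, then each worker.
cluster_instances() {
  echo "${PREFIX}-m"
  i=0
  while [ "$i" -lt "$NUM_WORKERS" ]; do
    echo "${PREFIX}-w-${i}"
    i=$((i + 1))
  done
}

# Print (do not run) the tagging command for every instance.
tag_commands() {
  for name in $(cluster_instances); do
    echo "gcloud compute instances add-tags ${name} --tags all-ssh --zone ${ZONE}"
  done
}

tag_commands
```

If the printed commands look right, they can be run by piping the script's output to sh. Printing first is a deliberate choice here, since a wrong instance name or zone would otherwise only show up as a gcloud error partway through.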

Tools for creating Hadoop and Spark clusters on Google Compute Engine. See http://cloud.google.com/hadoop for more information.
