Spark optimisation training and workshop
This builds all Docker images needed for the setup.
cd docker; ./build.sh
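The contents of build.sh aren't shown here; as a hypothetical sketch (the `spark-workshop` image prefix and the one-Dockerfile-per-subdirectory layout are assumptions, not necessarily this repo's actual structure), a build script like this typically loops over the Dockerfiles under docker/:

```shell
# Hypothetical sketch only -- the real build.sh may differ.
# Builds one tagged image per subdirectory that contains a Dockerfile.
build_all() {
  for df in "$1"/*/Dockerfile; do
    [ -e "$df" ] || continue          # the glob matched nothing
    dir=$(dirname "$df")
    echo "building spark-workshop/$(basename "$dir") from $dir"
    # docker build -t "spark-workshop/$(basename "$dir")" "$dir"
  done
}

build_all docker
```

The actual `docker build` call is left commented out so the sketch only reports what it would build.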
This script creates the directories required by the setup.
./init_env.sh
# this directory will be shared between the Spark and Jupyter services
mkdir ./shared-vol
# download the data; pass --with-csv to also download and unzip the CSV version (100 GB)
cd shared-vol
../collect_data.sh
# start the Docker Compose application
SHARED_DIR=$(pwd)/shared-vol docker-compose -f docker/docker-compose.yml up
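The `pwd` in front of shared-vol matters because Docker bind mounts need an absolute host path; a minimal sketch of resolving the variable first (the `-d` detached flag is standard Compose and shown here only as an option):

```shell
# Resolve the shared directory to an absolute path; Docker bind mounts
# require absolute paths on the host side.
SHARED_DIR="$(pwd)/shared-vol"
echo "$SHARED_DIR"
# SHARED_DIR="$SHARED_DIR" docker-compose -f docker/docker-compose.yml up -d
```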
Sparklint doesn't fetch new logs automatically. To process new logs, either add them manually through the UI or restart the Sparklint Docker container:
cd docker; docker-compose restart sparklint
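To check that the restart worked, standard Compose subcommands such as `ps` and `logs` can be used (guarded here so the snippet is a no-op on a machine without docker-compose installed):

```shell
# Inspect the Sparklint service after a restart; `ps` and `logs --tail`
# are standard docker-compose subcommands.
if command -v docker-compose >/dev/null 2>&1; then
  docker-compose -f docker/docker-compose.yml ps sparklint || true
  docker-compose -f docker/docker-compose.yml logs --tail=20 sparklint || true
  checked=yes
else
  checked=no
  echo "docker-compose not found; skipping"
fi
```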
This removes all stopped containers and deletes the images together with their intermediate layers.
cd docker; ./cleanup.sh
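For reference, roughly equivalent cleanup can be done with plain docker commands (a hedged sketch; the repo's cleanup.sh may do more or less than this), using the standard `container prune` and `image prune` subcommands:

```shell
# Remove all stopped containers and dangling/intermediate images.
# -f skips the interactive confirmation prompt.
if command -v docker >/dev/null 2>&1; then
  docker container prune -f || true
  docker image prune -f || true
  pruned=yes
else
  pruned=no
  echo "docker not found; skipping"
fi
```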