Quickstart

Notebook

Step 1: Set Up Configuration

cp bin/env_template.yaml bin/env.yaml

Fill in bin/env.yaml with the values for your own environment.
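Before moving on, it can help to confirm that no template field was left blank. The following is a minimal sketch, not part of this repository: it assumes env.yaml uses flat `key: value` lines (the real layout of env_template.yaml is not shown here), the helper name `unfilled_keys` is hypothetical, and the keys in `sample` are made up for illustration.

```python
def unfilled_keys(yaml_text):
    """Return top-level keys whose value is empty.

    Assumes flat `key: value` lines; nested mappings and values
    containing '#' are not handled by this simple check.
    """
    missing = []
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        # Skip blank lines, indented (nested) lines, and non key/value lines.
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if value.strip() in ("", '""', "''"):
            missing.append(key.strip())
    return missing


# Hypothetical keys, for illustration only.
sample = """\
project_id: my-project
cluster_name:
region: ""
"""
print(unfilled_keys(sample))
```

To check the real file, pass its contents instead, for example `unfilled_keys(open("bin/env.yaml").read())`.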

Step 2: Create a Kubernetes Cluster on GCP

source bin/setup.sh

Step 3: Create a Jupyter Notebook

A Jupyter Notebook service will be created on the Kubernetes cluster.

Step 4: Check Spark Integration

Check Spark information by running the following code in a notebook cell:

start()
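The output of `start()` is not reproduced here. As a complementary check, the sketch below collects a few standard PySpark attributes (`version`, `sparkContext.master`, `sparkContext.appName`, `sparkContext.uiWebUrl`) from a session. The helper name `spark_summary` is hypothetical, and it assumes you have a `SparkSession` object available in the notebook; how `start()` exposes that session is not specified in this document.

```python
def spark_summary(spark):
    """Collect basic facts about a SparkSession into a dict.

    `spark` is expected to be a pyspark.sql.SparkSession; only standard
    attributes of the session and its SparkContext are read.
    """
    sc = spark.sparkContext
    return {
        "version": spark.version,   # Spark version string
        "master": sc.master,        # cluster manager URL the driver connects to
        "app_name": sc.appName,     # application name shown in the Spark UI
        "ui_url": sc.uiWebUrl,      # address of the Spark web UI for this app
    }


# Usage in a notebook cell (assumes pyspark is installed and configured):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   spark_summary(spark)
```

The `ui_url` entry should point at the same Spark UI that the notebook cell output links to.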

Step 5: Check Spark UI

Open the Spark UI by clicking the link in the notebook cell output.

Docker Images

  • all-spark-notebook

    • Based on jupyter/all-spark-notebook:spark-3.5.0
    • Includes the Google Cloud SDK and the GCS connector
    • Includes a PySpark startup script
    • Includes a notebook save hook that saves notebooks to GCS
  • spark-history-server

    • Based on apache/spark:3.5.0
    • Includes the GCS connector