Opinionated deployment of a PANGEO-style JupyterHub with Terraform
A cloud-based JupyterHub close to your data is a great way to run interactive computations, especially when paired with Dask for parallel compute. However, setting one up on your cloud provider of choice in an automated fashion with reasonable defaults can be a chore. This project aims to automate as much of that as possible.
This project's goal is to help you set up and maintain this kind of environment in a completely automated fashion, including all the necessary cloud infrastructure. We do this by leveraging open source projects like Terraform, Helm, and zero-to-jupyterhub.
Currently, there is only code for AWS here. However, we hope other cloud providers will be represented here soon enough.
You'll need the following tools installed:
- Terraform. If you are on macOS, you can install it with:
    brew install terraform
- kubectl. If you are on macOS, you can install it with:
    brew install kubectl
- AWS CLI.
You need to have the aws CLI configured to run correctly from your local machine. Terraform reads its credentials from the same source. The documentation on configuring the AWS CLI should help.
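Before going further, it can help to confirm that the prerequisites above are actually in place. The following is a minimal sketch, not part of this project: the helper name missing_tools is made up for illustration, and the script assumes a POSIX shell. It lists any of the three tools that are not on your PATH and, if all are present, asks AWS which identity your configured credentials resolve to.

```shell
# Print the name of every listed tool that is not found on PATH.
# "missing_tools" is a hypothetical helper, not something this project ships.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools terraform kubectl aws)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  # Shows the account and user/role behind the configured credentials.
  aws sts get-caller-identity || echo "AWS CLI is installed but credentials are not working"
fi
```

If the last command prints an account ID and ARN, Terraform should be able to use the same credentials.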
The Terraform deployment needs several variables set before it can start. Copy the file aws/your-cluster.tfvars.template to a new file named aws/<your-cluster>.tfvars, and modify the placeholders there as appropriate.
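As a rough sketch of that copy step, assuming you run it from the repository root and pick my-cluster as the cluster name (a placeholder chosen for this example, not a name the project requires):

```shell
# "my-cluster" is a placeholder; substitute your own cluster name.
CLUSTER_NAME="my-cluster"
TEMPLATE="aws/your-cluster.tfvars.template"
TARGET="aws/${CLUSTER_NAME}.tfvars"

if [ ! -f "$TEMPLATE" ]; then
  echo "Template $TEMPLATE not found; run this from the repository root"
elif [ -e "$TARGET" ]; then
  # Avoid clobbering a tfvars file you have already edited.
  echo "$TARGET already exists; leaving it untouched"
else
  cp "$TEMPLATE" "$TARGET"
  echo "Created $TARGET; edit the placeholders before running terraform"
fi
```

The existence check is only a convenience so that re-running the snippet does not overwrite your edits.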
Once this is all done, you should:
a. cd aws
b. Run terraform init to set up the appropriate plugins
c. Run terraform apply -var-file=<your-cluster>.tfvars, referring to the tfvars file you made in the previous step
d. Type yes when prompted
e. Wait. This could take a while!
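The steps above can be collected into a small shell function. This is only a sketch: deploy_cluster is a hypothetical name, and it assumes you call it from the repository root with the cluster name you used for your tfvars file. Terraform will still prompt you to type yes before it creates anything.

```shell
# Run the deployment steps for one cluster.
# "deploy_cluster" is a hypothetical wrapper, not part of this project.
# Usage: deploy_cluster <your-cluster>
deploy_cluster() {
  cluster_name="$1"
  (
    # Subshell, so the caller's working directory is left unchanged.
    cd aws || exit 1
    # Download the provider plugins the configuration needs.
    terraform init || exit 1
    # Create the infrastructure using the variables from the tfvars file.
    terraform apply -var-file="${cluster_name}.tfvars"
  )
}
```

For example, deploy_cluster my-cluster would apply aws/my-cluster.tfvars.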
Your cluster is now set up! There are no hubs on it yet though. You should make a copy of the hubploy template repo, and go from there.