diff --git a/README.md b/README.md
index 5af6edf..7edec22 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
 # Dandihub
 
-This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub. Based on [AWS Data on EKS JupyterHub](https://github.com/awslabs/data-on-eks/tree/main/ai-ml/jupyterhub).
+This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub.
+Based on [AWS Data on EKS JupyterHub](https://github.com/awslabs/data-on-eks/tree/main/ai-ml/jupyterhub).
+For more information, see our [Architecture documentation](doc/architecture.md).
 
 ## Table of Contents
 
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..3e607d3
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,41 @@
+# Deployment Architecture
+
+A deployment of this project creates a core set of system components that are intended to always be active, and dynamically creates a set of virtual servers for each active user.
+
+The core components are responsible for managing the dynamic components.
+
+All images in this doc can be viewed and edited via [draw.io](https://app.diagrams.net/#G1AutjQn7oE7zq2Coj9ujn9g8HhteyLY_d).
+
+![autoscaling-kabi](static/autoscaling-kabi.png)
+
+## Autoscaling
+
+![scaling-up](static/scale-up.png)
+
+Here we see the actions that occur when a user requests a new JupyterLab server.
+
+1. The user selects one of the server profile options.
+2. The Hub begins the process by creating a [Kubernetes Pod](https://kubernetes.io/docs/concepts/workloads/pods/).
+3. If there is room to schedule the Pod on an existing [Kubernetes Node](https://kubernetes.io/docs/concepts/architecture/nodes/), it is started there.
+   Otherwise, a new [Karpenter NodeClaim](https://karpenter.sh/docs/concepts/nodeclaims/) is
+   created.
+4. Karpenter ensures that there is a Node to fulfill each NodeClaim.
+   If there are not enough Nodes, Karpenter creates a new one (in our case, an AWS EC2 instance).
+5. Once the Node is ready, the Pod starts up.
+
+![scaling-down](static/scale-down.png)
+
+Here we see the actions that occur when a user stops using their server.
+
+1. The user either deletes their server, or is idle for more than the timeout (default 1 hour).
+2. The Hub receives the request from the user, or from the culler, and deletes the Pod.
+3. If the Node is now empty (drained), Karpenter deletes the Node and its EC2 instance.
+
+## Traffic
+
+![traffic-routing](static/traffic-routing.png)
+
+All traffic is routed through DNS (AWS Route 53) to a load balancer, which distributes the traffic
+either to the Hub server or to a specific JupyterLab Pod.
+
+
diff --git a/doc/static/autoscaling-kabi.png b/doc/static/autoscaling-kabi.png
new file mode 100644
index 0000000..2347c8e
Binary files /dev/null and b/doc/static/autoscaling-kabi.png differ
diff --git a/doc/static/scale-down.png b/doc/static/scale-down.png
new file mode 100644
index 0000000..6042cbf
Binary files /dev/null and b/doc/static/scale-down.png differ
diff --git a/doc/static/scale-up.png b/doc/static/scale-up.png
new file mode 100644
index 0000000..105a020
Binary files /dev/null and b/doc/static/scale-up.png differ
diff --git a/doc/static/traffic-routing.png b/doc/static/traffic-routing.png
new file mode 100644
index 0000000..e522621
Binary files /dev/null and b/doc/static/traffic-routing.png differ
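
For context on the autoscaling flow described in the new architecture doc: Karpenter decides when to create and remove Nodes based on a NodePool resource. Below is a minimal sketch of such a NodePool (Karpenter `v1beta1` API); the name, requirements, and limits are illustrative assumptions, not the exact configuration this blueprint installs. The `WhenEmpty` consolidation policy corresponds to the scale-down step in which an empty (drained) Node and its EC2 instance are deleted.

```yaml
# Illustrative Karpenter NodePool (karpenter.sh/v1beta1) -- not the exact
# resource created by this blueprint. It tells Karpenter what kind of EC2
# capacity it may launch to satisfy NodeClaims, and when to remove Nodes.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: example-jupyterhub         # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]    # could also allow "spot"
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: example              # hypothetical EC2NodeClass
  limits:
    cpu: "1000"                    # cap on total CPU across Nodes from this pool
  disruption:
    consolidationPolicy: WhenEmpty # remove a Node once it is drained of Pods
    consolidateAfter: 30s
```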
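The 1-hour idle timeout mentioned in the scale-down flow comes from JupyterHub's idle culler. Assuming it is configured through the JupyterHub Helm chart (as the upstream data-on-eks blueprint does), the relevant values look roughly like the sketch below; the numbers are illustrative, not necessarily what this deployment ships.

```yaml
# Illustrative JupyterHub Helm chart values for the idle culler.
cull:
  enabled: true   # run the idle-culler service
  timeout: 3600   # seconds a user server may sit idle before it is stopped (1 hour)
  every: 600      # how often, in seconds, the culler checks for idle servers
```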