Add architecture document with diagrams #194

Open
wants to merge 1 commit into
base: main
4 changes: 3 additions & 1 deletion README.md
@@ -1,6 +1,8 @@
# Dandihub

This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub. Based on [AWS Data on EKS JupyterHub](https://github.com/awslabs/data-on-eks/tree/main/ai-ml/jupyterhub).
This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub.
Based on [AWS Data on EKS JupyterHub](https://github.com/awslabs/data-on-eks/tree/main/ai-ml/jupyterhub).
For more information, see our [Architecture documentation](doc/architecture.md).

## Table of Contents

41 changes: 41 additions & 0 deletions doc/architecture.md
@@ -0,0 +1,41 @@
# Deployment Architecture

A deployment of this project creates a core set of system components that are intended to always be active, and dynamically creates a set of virtual servers for each active user.

The core components are responsible for managing the dynamic components.

All images in this doc can be viewed and edited via [draw.io](https://app.diagrams.net/#G1AutjQn7oE7zq2Coj9ujn9g8HhteyLY_d).

![autoscaling-kabi](static/autoscaling-kabi.png)

## Autoscaling

![scaling-up](static/scale-up.png)

Here we see the actions that occur when a user requests a new JupyterLab server.

1. The user selects one of the server profile options.
2. The Hub begins the process by creating a [Kubernetes Pod](https://kubernetes.io/docs/concepts/workloads/pods/).
3. If there is room to schedule the Pod on an existing [Kubernetes Node](https://kubernetes.io/docs/concepts/architecture/nodes/), it is started there.
Otherwise, a new [Karpenter NodeClaim](https://karpenter.sh/docs/concepts/nodeclaims/) is
created.
4. Karpenter ensures that there is a Node to fulfill each NodeClaim.
If there are not enough Nodes, Karpenter creates a new one (in our case, an AWS EC2 instance).
5. Once the Node is ready, the Pod starts up.
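
The scale-up decision can be sketched as a toy model in Python. This is only an illustration of the "fit on an existing Node, otherwise provision a new one" logic, not how the Kubernetes scheduler or Karpenter are implemented. The `Node` class, the `schedule_pod` function, the CPU-only resource model, and the default size of a new Node are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Kubernetes Node (an EC2 instance here)."""
    name: str
    cpu_capacity: float
    pods: list = field(default_factory=list)  # list of (pod_name, cpu_request)

    def free_cpu(self) -> float:
        return self.cpu_capacity - sum(cpu for _, cpu in self.pods)


def schedule_pod(nodes, pod_name, cpu_request, new_node_cpu=4.0):
    """Place a Pod on an existing Node, or provision a new Node for it.

    Returns (node, created), where `created` is True if a new Node was needed.
    """
    # Step 3: reuse an existing Node if the Pod fits.
    for node in nodes:
        if node.free_cpu() >= cpu_request:
            node.pods.append((pod_name, cpu_request))
            return node, False

    # Steps 3-4: no room, so a new Node is provisioned. In the real system
    # this is where a NodeClaim would be created and fulfilled by Karpenter.
    if cpu_request > new_node_cpu:
        raise ValueError("Pod does not fit on the assumed new Node size")
    node = Node(name=f"node-{len(nodes) + 1}", cpu_capacity=new_node_cpu)
    nodes.append(node)

    # Step 5: once the Node is ready, the Pod starts on it.
    node.pods.append((pod_name, cpu_request))
    return node, True
```

For example, with a single 4-CPU Node, a first Pod requesting 3 CPUs lands on the existing Node, while a second Pod requesting 2 CPUs triggers the creation of a second Node.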

![scaling-down](static/scale-down.png)

Here we see the actions that occur when a user stops using their server.

1. The user either deletes their server or is idle for longer than the timeout (default: 1 hour).
2. The Hub receives the request from the user or from the culler and deletes the Pod.
3. If the Node is now empty (drained), Karpenter deletes the Node/EC2 instance.
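
The scale-down steps can be sketched the same way. This is a simplified model under stated assumptions: the `Server` class and the `cull_idle_servers` and `empty_nodes` functions are hypothetical names for this example, timestamps are plain seconds, and the real culler and Karpenter apply more conditions than shown here. Only the 1-hour default comes from the text above.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 3600  # default idle timeout of 1 hour, per the text


@dataclass
class Server:
    """Hypothetical record of a running user server and the Node hosting it."""
    user: str
    node: str
    last_activity: float  # seconds, on the same clock as `now`


def cull_idle_servers(servers, now, timeout=IDLE_TIMEOUT_SECONDS):
    """Split servers into (kept, culled) based on idle time.

    A server is culled when it has been idle for MORE than the timeout.
    """
    kept = [s for s in servers if now - s.last_activity <= timeout]
    culled = [s for s in servers if now - s.last_activity > timeout]
    return kept, culled


def empty_nodes(all_nodes, servers):
    """Return the Nodes that no longer host any server and could be removed."""
    occupied = {s.node for s in servers}
    return sorted(n for n in all_nodes if n not in occupied)
```

A caller would first cull idle servers, then pass the remaining servers to `empty_nodes` to find the Nodes that are candidates for deletion.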

## Traffic

![traffic-routing](static/traffic-routing.png)

All traffic is routed through DNS (AWS Route 53) to a load balancer, which distributes the traffic
either to the Hub server or to a specific JupyterLab Pod.
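
The routing decision can be illustrated with a small Python sketch. It assumes JupyterHub's default URL scheme, in which each user's server lives under `/user/<name>/` and everything else is handled by the Hub. The `route_request` function and the backend labels it returns are hypothetical, and the sketch does not say which component in this deployment performs the path matching.

```python
def route_request(path, running_servers):
    """Return a label for the backend that should handle `path`.

    `running_servers` is the set of user names with an active JupyterLab Pod.
    """
    parts = path.strip("/").split("/")

    # Requests under /user/<name>/ go to that user's Pod, if it is running.
    if len(parts) >= 2 and parts[0] == "user" and parts[1] in running_servers:
        return f"jupyterlab-pod:{parts[1]}"

    # Everything else, including requests for servers that are not running,
    # falls through to the Hub.
    return "hub"
```

With `running_servers = {"alice"}`, a request for `/user/alice/lab` is sent to Alice's Pod, while `/hub/home` and `/user/bob/lab` are both handled by the Hub.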


Binary file added doc/static/autoscaling-kabi.png
Binary file added doc/static/scale-down.png
Binary file added doc/static/scale-up.png
Binary file added doc/static/traffic-routing.png