The IaC has been developed with AWS CDK and Python. Git Flow is used as the repository's branching strategy.
The architecture can be deployed to different environments and with different parameters. The JSON parameter files are contained in the /infrastructure/parameters folder and are selected at runtime based on the ENVIRONMENT environment variable.
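For instance, a minimal sketch of how app.py could select the parameter file (the helper name and exact path handling are assumptions, not the project's confirmed code):

```python
import json
import os


def load_parameters() -> dict:
    """Load the parameter file matching the ENVIRONMENT variable (default: dev)."""
    environment = os.environ.get("ENVIRONMENT", "dev")
    with open(f"infrastructure/parameters/{environment}.json") as f:
        return json.load(f)


params = load_parameters()
```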
The repository is split in the following way:
root
│ README.md
│ app.py - The entrypoint script
│
└───infrastructure
│ │ infrastructure_stack.py - The architecture's main stack
│ │
│ └───parameters - The folder containing the environment's parameters
│ │ │ dev.json
│ │ │ prod.json
│ │
│ └───stacks - The folder containing the stacks code
│ │ │ vpc_stack.py
│ │ │ eks_stack.py
│ │ │ alb_ingress_stack.py
│ │ │ metrics_server_stack.py
│ │ │ pipeline_stack.py
│ │
│ └───utils - A folder containing utility scripts
└───images
│ │
│ └───web_server - The folder containing the files needed for the docker image build
│ │ Dockerfile
│ │ index.html - The custom index page to be served by the web server
│
└───helm
│
└───ccekswebserver - The folder containing the helm charts needed to spin up the web server
- Install Python (https://www.python.org/downloads/)
- Install node.js (https://nodejs.org/en/download/)
- Install the npm dependency of CDK
npm i -g aws-cdk@1.137.0
- Install the development dependencies for Python
pip3 install -r requirements-dev.txt
- Create a Python virtual environment if it doesn't exist
python3 -m venv ./.venv
- Activate the Python virtual environment to avoid polluting the global environment
- Windows
.\.venv\Scripts\activate
- Unix
source .venv/bin/activate
- Install Python's dependencies
pip3 install -r requirements.txt
- Set the environment variable ENVIRONMENT to dev or prod - if it isn't set, dev is used as the default
- If this is the first time the infrastructure is deployed on the AWS account, it's important to bootstrap it first with
cdk bootstrap
- Deploy the infrastructure with
cdk deploy
The layout of the VPC is easily customizable through parameters. The stack that deploys the VPC always creates public and private subnets. The number of availability zones across which subnets are created is driven by the az_number parameter. In a production environment, this parameter is set to 3 so that the various resources can be deployed with high availability.
Furthermore, to better manage the costs of the network infrastructure, the nats_number parameter controls the number of NAT gateways to deploy. In development environments a single NAT gateway can be used to reduce costs (a NAT gateway adds roughly $30 per month to the AWS bill). In production, instead, it is recommended to set nats_number to 3 (one per NATted subnet), so that the application does not fully lose internet access if an AZ fails.
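A sketch of how vpc_stack.py might wire these parameters into the CDK v1 ec2.Vpc construct (construct IDs and the surrounding stack shape are illustrative, not the project's confirmed code):

```python
from aws_cdk import aws_ec2 as ec2
from aws_cdk import core


class VpcStack(core.Stack):
    def __init__(self, scope: core.Construct, construct_id: str, params: dict, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # max_azs spreads the subnets over az_number availability zones;
        # nat_gateways controls how many NAT gateways are created (1 in dev, 3 in prod).
        self.vpc = ec2.Vpc(
            self,
            "Vpc",
            max_azs=params["az_number"],
            nat_gateways=params["nats_number"],
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC),
                ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE),
            ],
        )
```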
EKS has been chosen as the compute resource to host the web server, Prometheus, and Grafana. EKS nodes are spawned in the NATted subnets so they can reach AWS services such as ECR as well as the internet.
The EKS cluster runs Kubernetes version 1.21.
To keep EKS costs down, part of the nodes can be created as Spot instances. Via the eks.spot_instance_count and eks.on_demand_instance_count parameters, you can decide the strategy used by the autoscalers when choosing the type of node to spawn. Two worker groups have been created, each backed by its own autoscaling group. Additionally, a scaling rule scales out the Spot nodes when 75% CPU utilization is reached. To balance the load across the web server pods in the different AZs, an Application Load Balancer has been created.
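A hedged sketch of how eks_stack.py might define the two worker groups and the CPU rule (instance type, Spot price, and the params access path are assumptions):

```python
from aws_cdk import aws_ec2 as ec2

# `cluster` is the eks.Cluster built elsewhere in eks_stack.py
# (version=eks.KubernetesVersion.V1_21).

# On-demand worker group
on_demand_asg = cluster.add_auto_scaling_group_capacity(
    "OnDemandNodes",
    instance_type=ec2.InstanceType("t3.medium"),
    min_capacity=params["eks"]["on_demand_instance_count"],
)

# Spot worker group: setting spot_price makes the ASG request Spot instances
spot_asg = cluster.add_auto_scaling_group_capacity(
    "SpotNodes",
    instance_type=ec2.InstanceType("t3.medium"),
    spot_price="0.05",
    min_capacity=params["eks"]["spot_instance_count"],
)

# Target-tracking rule: scale out the Spot group when average CPU reaches 75%
spot_asg.scale_on_cpu_utilization("SpotCpuScaling", target_utilization_percent=75)
```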
CDK allowed the Kubernetes resources to be deployed in a simple and controlled way: a failure while applying Kubernetes manifests or Helm charts results in a CloudFormation rollback, avoiding broken deployments.
To enable the Horizontal Pod Autoscaler (HPA) to do its job, the first resource deployed on the Kubernetes cluster is the Metrics Server manifest.
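One way metrics_server_stack.py might apply that manifest through the cluster, so a failed apply rolls back with CloudFormation (the local manifest path and the use of PyYAML are assumptions):

```python
import yaml  # PyYAML, assumed to be among the Python dependencies

# Load the Metrics Server components manifest from a local file and apply it
# via the cluster; each YAML document becomes a tracked Kubernetes resource.
with open("infrastructure/stacks/manifests/metrics-server.yaml") as f:
    documents = [doc for doc in yaml.safe_load_all(f) if doc]

cluster.add_manifest("MetricsServer", *documents)
```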
The application resources are deployed via Helm charts. Prometheus and Grafana are pulled from their official Helm repositories, while the web server is based on a custom chart created specifically for this project (in the /helm/ccekswebserver folder). Grafana's chart values have been modified so it can reach the Prometheus metrics and download a dashboard from the internet.
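A sketch of how these charts might be registered in CDK (namespaces and values are illustrative, and the custom chart upload assumes a CDK version that supports chart_asset):

```python
from aws_cdk import aws_s3_assets as s3_assets

# Prometheus from its official chart repository
cluster.add_helm_chart(
    "Prometheus",
    chart="prometheus",
    repository="https://prometheus-community.github.io/helm-charts",
    namespace="monitoring",
)

# Grafana, with values overridden so it can reach Prometheus (the datasource
# URL shown is an assumption about the in-cluster service name)
cluster.add_helm_chart(
    "Grafana",
    chart="grafana",
    repository="https://grafana.github.io/helm-charts",
    namespace="monitoring",
    values={
        "datasources": {
            "datasources.yaml": {
                "apiVersion": 1,
                "datasources": [{
                    "name": "Prometheus",
                    "type": "prometheus",
                    "url": "http://prometheus-server.monitoring.svc.cluster.local",
                }],
            }
        },
    },
)

# The custom web server chart is packaged from the local folder and uploaded
# as an asset at deploy time
cluster.add_helm_chart(
    "WebServer",
    chart_asset=s3_assets.Asset(self, "WebServerChart", path="helm/ccekswebserver"),
)
```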
To meet the CI/CD requirements, a simple pipeline was created with the CodePipeline service. It consists of two stages:
- Source: the code of this repository is automatically fetched from GitHub (*) at each commit. The integration between CodePipeline and GitHub requires an OAuth token saved in a Secrets Manager secret before the pipeline is created.
- Build and deploy: this stage runs in a CodeBuild container based on Amazon Linux 2, with the Node.js and Python runtimes installed. The former is needed to run CDK, the latter to interpret the IaC of this repository. A sketch of the pipeline definition follows this list.
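A hedged sketch of how pipeline_stack.py could define these two stages (the owner, repo, build image constant, and construct IDs are illustrative, not the project's confirmed values):

```python
from aws_cdk import aws_codebuild as codebuild
from aws_cdk import aws_codepipeline as codepipeline
from aws_cdk import aws_codepipeline_actions as actions
from aws_cdk import core

source_output = codepipeline.Artifact()

# Source stage: fetch the repository from GitHub on each commit, authenticating
# with the OAuth token stored in Secrets Manager.
source_action = actions.GitHubSourceAction(
    action_name="GitHub",
    owner="my-org",   # illustrative
    repo="my-repo",   # illustrative
    oauth_token=core.SecretValue.secrets_manager("simple_eks_secret_github_token"),
    output=source_output,
)

# Build-and-deploy stage: an Amazon Linux 2 CodeBuild image whose buildspec
# (not shown) installs CDK and runs `cdk deploy`.
deploy_project = codebuild.PipelineProject(
    self,
    "DeployProject",
    environment=codebuild.BuildEnvironment(
        build_image=codebuild.LinuxBuildImage.AMAZON_LINUX_2_3,
    ),
)

codepipeline.Pipeline(
    self,
    "Pipeline",
    stages=[
        codepipeline.StageProps(stage_name="Source", actions=[source_action]),
        codepipeline.StageProps(
            stage_name="BuildAndDeploy",
            actions=[
                actions.CodeBuildAction(
                    action_name="Deploy",
                    project=deploy_project,
                    input=source_output,
                )
            ],
        ),
    ],
)
```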
The advantage of using CDK is the strong synergy it maintains between the infrastructure and the code. In this project, for example, the web server Dockerfile, the Helm charts, and the definition of their infrastructure live in the same repository, completely avoiding disjoint deployments of the parts. CDK pushes the Docker images to ECR and the Helm charts to S3 as part of the
cdk deploy
command. This feature pays off even more when Lambda functions are present, since the CDK and back-end code can be kept in the same code base and use the same or different programming languages.
(*) To create the connection between CodePipeline and GitHub, you need a personal access token created on GitHub. This token is securely stored in the secret called simple_eks_secret_github_token.
The IaC can read this token from the /infrastructure/parameters/uncommitted/.env.json file.
As the path suggests, this file is not committed to git; you can use the example.env.json file as a base.
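A hypothetical helper showing how the token could be read from the uncommitted file (the fallback path and the JSON key name are assumptions):

```python
import json
import os

ENV_FILE = "infrastructure/parameters/uncommitted/.env.json"
EXAMPLE_FILE = "infrastructure/parameters/uncommitted/example.env.json"


def load_github_token() -> str:
    """Read the GitHub token, falling back to the example file if absent."""
    path = ENV_FILE if os.path.exists(ENV_FILE) else EXAMPLE_FILE
    with open(path) as f:
        return json.load(f)["github_token"]  # key name assumed
```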
It's also possible to deploy the architecture without the CI/CD stack, so that creating a GitHub token is not necessary. To enable or disable its creation, change the value of the ci_cd_enabled parameter.
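A minimal sketch of how this flag might gate the pipeline stack (stack names and call site are illustrative):

```python
# In app.py or infrastructure_stack.py: the pipeline stack is only
# instantiated when the parameter enables it, so a plain deployment
# needs no GitHub token.
if params.get("ci_cd_enabled", False):
    PipelineStack(app, "PipelineStack", params=params)
```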
There are many opportunities to optimize and improve the project. The most important ones I identified are listed below:
- Testing: the framework used (CDK) allows IaC testing to ensure that the created resources are correct in all their parts. To avoid incorrect deployments, a test phase can be added to the CI/CD pipeline to block the deployment process on failure. The directory in which to implement the tests is already present.
- At the moment, the implemented IaC exposes the web server over the HTTP protocol. It is very important, however, to discontinue its use in favor of HTTPS. To do this, one can generate an SSL certificate via AWS Certificate Manager; thanks to the TLS session termination feature of the load balancers, HTTPS can then be enabled easily with the created certificate (see the sketch after this list).
- Better scaling strategies can be implemented based on relevant metrics. For example, the bytes received by the load balancer can be used to scale out the nodes from 0 when users try to reach the web server's pages.
- Using Fargate instead of EC2 EKS nodes can be extremely beneficial to reduce the hassle of managing the cluster autoscaler.
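For the HTTPS improvement above, a hedged sketch using ACM and TLS termination at the ALB (the domain name, construct IDs, and the alb / web_target_group references are hypothetical):

```python
from aws_cdk import aws_certificatemanager as acm
from aws_cdk import aws_elasticloadbalancingv2 as elbv2

# Issue a DNS-validated certificate via ACM
certificate = acm.Certificate(
    self,
    "SiteCertificate",
    domain_name="webserver.example.com",  # hypothetical domain
    validation=acm.CertificateValidation.from_dns(),
)

# Terminate TLS at the existing ALB: the HTTPS listener uses the certificate
# and forwards plain HTTP to the web server target group.
alb.add_listener(
    "HttpsListener",
    port=443,
    certificates=[elbv2.ListenerCertificate.from_certificate_manager(certificate)],
    default_target_groups=[web_target_group],
)
```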