diff --git a/devops/README.md b/devops/README.md index 0c8faaae2..4bfa2e642 100644 --- a/devops/README.md +++ b/devops/README.md @@ -2,19 +2,17 @@ ## The goal -The course aims to offer in-depth knowledge of DevOps principles and essential AWS services necessary for efficient automation and infrastructure management. Participants will gain practical skills in setting up, deploying, and managing Kubernetes clusters on AWS, using tools like Kops and Terraform. +The course aims to offer in-depth knowledge of DevOps principles and essential AWS services necessary for efficient automation and infrastructure management. Participants will gain practical skills in setting up, deploying, and managing Kubernetes clusters on AWS, using tools like K3s and Terraform, Jenkins and monitoring tools. ## Prerequisite -- The application code should have unit tests and should be able to run (`npm test` or similar). -- Students should have a SonarQube account and will be able to run with their code (`sonar-scanner -Dsonar.projectKey=Your_Project_Key -Dsonar.sources=. -Dsonar.host.url=https://sonarcloud.io/`). -- Dockerfile with application. +- Basic knowledge of Cloud computing and networking +- Personal laptop ## Module 1: Configuration and Resources ### Part 1 (configuration) -- Developing the architecture of the infrastructure with private and public networks. - Install aws cli. - Installation and configuration of Terraform. - Configuring access to AWS via Terraform (API keys, IAM roles). @@ -46,31 +44,7 @@ The course aims to offer in-depth knowledge of DevOps principles and essential A ## Module 2: Cluster Configuration and Creation -### Part 1 (cluster configuration) - -#### Option 1 (kops config) - -- Installing Kops on your workstation. -- Creating an IAM user for Kops with the necessary permissions. -- Configuring access to AWS using IAM credentials. - -#### Option 2 (k3s config) - -- Installing K3s on your local machine. 
-- Creating an IAM user with the necessary permissions specifically for managing the K3s environment. -- Configuring AWS IAM credentials on the workstation. - -### Part 2 (create cluster) - -#### Option 1 (kops installation) - -- Prepare cluster configuration with kops command. -- Apply kops configuration. -- Validate the cluster. -- Create an account in Kubernetes for Jenkins. - -#### Option 2 (k3s installation) - +- Installing K3s on your EC2 instances. - Prepare the K3s cluster configuration. - Applying the K3s configuration using Terraform and K3s setup commands. - Validating the cluster to ensure it's correctly configured and operational. @@ -89,7 +63,12 @@ The course aims to offer in-depth knowledge of DevOps principles and essential A - Installing necessary plugins in Jenkins. (sonarqube, docker). - Set up necessary plugins in Jenkins for Kubernetes like Kubernetes plugin. Configure the plugin with endpoints and credentials for Kubernetes. -### Part 2 (Create Pipeline) +### Part 2 (Create HELM chart) + +- Create a Helm chart for [given application](./flask_app). The chart should contain templates for all necessary Kubernetes resources like Deployments as well as Health checks, liveness, readiness probes. +- Check that the application works as expected. + +### Part 3 (Create Pipeline) - Create pipeline, add steps: 1. Build application. @@ -97,34 +76,25 @@ The course aims to offer in-depth knowledge of DevOps principles and essential A 3. SonarQube check. 4. Build and push docker image to ECR. 5. Deploy docker image to Kubernetes cluster. -- Create a Helm chart for your application. The chart should contain templates for all necessary Kubernetes resources like Deployments as well as Health checks, liveness, readiness probes. -- Store your Helm chart in a source control system accessible from Jenkins. - In the deployment stage of your Jenkinsfile, add steps to deploy the application using Helm. - Check that the application works as expected. 
- After the deployment, you can add steps to verify that the application is running as expected. This could involve checking the status of the Kubernetes deployment, running integration tests, or hitting a health check endpoint. ## Module 4: Monitoring with Prometheus and Grafana - -### Part 1 (Installation and configuration Prometheus) - +### Prometheus - Using Helm to install Prometheus in Kubernetes. - Configuring Prometheus to collect metrics from the cluster. - Creating and configuring Service Monitor to track services in the cluster. - Configuring alert rules in Prometheus for monitoring critical events. - -### Part 2 (Installation and configuration Grafana) - +### Grafana - Deploying Grafana in Kubernetes using Helm. - Setting up secure access to Grafana via Ingress or LoadBalancer. - Configuring Grafana to connect to Prometheus as a data source. - Importing or creating dashboards to visualize metrics from Prometheus. - -### Part 3. (Testing monitoring) - +### Alerting Management - Conducting tests to verify the collection of metrics and their display in Grafana. - Simulating failures or high loads to test configured alerts. 
## Useful links -- [Kubernetes with Kops](https://blog.kubecost.com/blog/kubernetes-kops/) - [K3s AWS Terraform Cluster](https://garutilorenzo.github.io/k3s-aws-terraform-cluster/) diff --git a/devops/flask_app/README.md b/devops/flask_app/README.md new file mode 100644 index 000000000..8c162ee24 --- /dev/null +++ b/devops/flask_app/README.md @@ -0,0 +1,8 @@ +To run this application, use Docker source image with python3.9+ +Install requirements with ```pip install -r requirements.txt``` + +Run application with: +``` +export FLASK_APP=main.py +flask run --host=0.0.0.0 --port=8080 +``` diff --git a/devops/flask_app/main.py b/devops/flask_app/main.py new file mode 100644 index 000000000..2a9acf86d --- /dev/null +++ b/devops/flask_app/main.py @@ -0,0 +1,8 @@ +from flask import Flask + +app = Flask(__name__) + + +@app.route('/') +def hello(): + return 'Hello, World!' \ No newline at end of file diff --git a/devops/flask_app/requirements.txt b/devops/flask_app/requirements.txt new file mode 100644 index 000000000..2077213c3 --- /dev/null +++ b/devops/flask_app/requirements.txt @@ -0,0 +1 @@ +Flask \ No newline at end of file diff --git a/devops/modules/1_basic-configuration/task_1.md b/devops/modules/1_basic-configuration/task_1.md index c1046fe13..2969032e0 100644 --- a/devops/modules/1_basic-configuration/task_1.md +++ b/devops/modules/1_basic-configuration/task_1.md @@ -1,5 +1,5 @@ # Task 1: AWS Account Configuration - +![task_1 schema](../../visual_assets/task_1.png) ## Objective In this task, you will: @@ -7,9 +7,12 @@ In this task, you will: - Install and configure the required software on your local computer - Set up an AWS account with the necessary permissions and security configurations - Deploy S3 buckets for Terraform states +- Create a Github Actions workflow to deploy infrastructure in AWS + +Additional tasks: - Create a federation with your AWS account for Github Actions - Create an IAM role for Github Actions -- Create a Github Actions workflow to deploy 
infrastructure in AWS + ## Steps @@ -43,10 +46,11 @@ In this task, you will: 5. **Create a bucket for Terraform states** + - Locking Terraform state via DynamoDB is not required in this task, but it is recommended as a best practice (see the links below). - [Managing Terraform states Best Practices](https://spacelift.io/blog/terraform-s3-backend) - [Terraform backend S3](https://developer.hashicorp.com/terraform/language/backend/s3) -6. **Create an IAM role for Github Actions** +6. **Create an IAM role for Github Actions (Additional task)💫** - Create an IAM role `GithubActionsRole` with the same permissions as in step 2: - AmazonEC2FullAccess @@ -58,9 +62,9 @@ In this task, you will: - AmazonEventBridgeFullAccess - [Terraform resource](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) -7. **Configure an Identity Provider and Trust policies for Github Actions** +7. **Configure an Identity Provider and Trust policies for Github Actions (Additional task)💫** - - Update the `GithubActionsRole` IAM role with Trust policy following the next guides + - Update the `GithubActionsRole` IAM role with a Trust policy following the guides below - [IAM roles terms and concepts](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html#id_roles_terms-and-concepts) - [Github tutorial](https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services) - [AWS documentation on OIDC providers](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html#idp_oidc_Create_GitHub) @@ -68,7 +72,7 @@ In this task, you will: 8. 
**Create a Github Actions workflow for deployment via Terraform** - The workflow should have 3 jobs that run on pull request and push to the default branch: - - `terraform-check` with format checking [terraform fmt](https://developer.hashicorp.com/terraform/cli/commands/fmt) + - `terraform-check` with format checking using [terraform fmt](https://developer.hashicorp.com/terraform/cli/commands/fmt) - `terraform-plan` for planning deployments [terraform plan](https://developer.hashicorp.com/terraform/cli/commands/plan) - `terraform-apply` for deploying [terraform apply](https://developer.hashicorp.com/terraform/cli/commands/apply) - [terraform init](https://developer.hashicorp.com/terraform/cli/commands/init) @@ -77,20 +81,24 @@ In this task, you will: - [Configure AWS Credentials](https://github.com/aws-actions/configure-aws-credentials) ## Submission - -Ensure that the AWS CLI and Terraform installations are verified using `aws --version` and `terraform version`. + - Create a branch `task_1` from `main` branch in your repository. + - [Create a Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) (PR) from `task_1` branch to `main`. + - Provide the code for Terraform and GitHub Actions in the PR. + - Provide screenshots of `aws --version` and `terraform version` in the PR description. + - Provide a link to the Github Actions workflow run in the PR description. + - Provide the Terraform plan output with S3 bucket (and possibly additional resources) creation in the PR description. ## Evaluation Criteria (100 points for covering all criteria) 1. **MFA User configured (10 points)** - - Provide a screenshot of the non-root account secured by MFA (ensure sensitive information is not shared). + - Screenshot of the non-root account secured by MFA (ensure sensitive information is not shared) is presented -2. 
**Bucket and GithubActionsRole IAM role configured (30 points)** +2. **Bucket and GithubActionsRole IAM role configured (20 points)** - Terraform code is created and includes: - - A bucket for Terraform states - - IAM role with correct Identity-based and Trust policies + - Provider initialization + - Creation of S3 Bucket 3. **Github Actions workflow is created (30 points)** @@ -103,11 +111,12 @@ Ensure that the AWS CLI and Terraform installations are verified using `aws --ve 5. **Verification (10 points)** - - Terraform plan is executed successfully for `GithubActionsRole` - - Terraform plan is executed successfully for a terraform state bucket + - Terraform plan is executed successfully -6. **Additional Tasks (10 points)** +6. **Additional Tasks (20 points)💫** - **Documentation (5 points)** - - Document the infrastructure setup and usage in a README file. + - Document the infrastructure setup and usage in a README file. - **Submission (5 points)** - A GitHub Actions (GHA) pipeline is passing + - **Secure authorization (10 points)** + - IAM role with correct Identity-based and Trust policies used to connect GitHubActions to AWS. diff --git a/devops/modules/1_basic-configuration/task_2.md b/devops/modules/1_basic-configuration/task_2.md index 58e4e2803..c7261caee 100644 --- a/devops/modules/1_basic-configuration/task_2.md +++ b/devops/modules/1_basic-configuration/task_2.md @@ -1,4 +1,5 @@ # Task 2: Basic Infrastructure Configuration +![task_2 schema](../../visual_assets/task_2.png) ## Objective @@ -27,19 +28,20 @@ In this task, you will write Terraform code to configure the basic networking in - Execute `terraform plan` to ensure the configuration is correct. - Provide a resource map screenshot (VPC -> Your VPCs -> your_VPC_name -> Resource map). -4. **Submit Code** - - - Create a PR with the Terraform code in a new repository. - - (Optional) Set up a GitHub Actions (GHA) pipeline for the Terraform code. - -5. **Additional Tasks** +4. 
**Additional Tasks💫** - Implement security groups. - Create a bastion host for secure access to the private subnets. - - Organize NAT for private subnets, so instances in private subnet can connect with outside world: + - Organize NAT for private subnets, so instances in the private subnet can connect with the outside world: - Simpler way: create a NAT Gateway - - Cheaper way: configure a NAT instance in public subnet + - Cheaper way: configure a NAT instance in the public subnet - Document the infrastructure setup and usage in a README file. +## Submission + - Create a `task_2` branch from `main` in your repository. + - [Create a Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) (PR) with the Terraform code in your repository from `task_2` to `main`. + - Provide a resource map screenshot (VPC -> Your VPCs -> your_VPC_name -> Resource map) in the PR description. + - (Optional) Set up a GitHub Actions (GHA) pipeline for the Terraform code. + ## Evaluation Criteria (100 points for covering all criteria) 1. **Terraform Code Implementation (50 points)** @@ -51,7 +53,7 @@ In this task, you will write Terraform code to configure the basic networking in - Internet Gateway - Routing configuration: - Instances in all subnets can reach each other - - Instances in public subnets can reach addresses outside VPC and vice-versa + - Instances in public subnets can reach addresses outside the VPC and vice-versa 2. **Code Organization (10 points)** @@ -63,14 +65,14 @@ In this task, you will write Terraform code to configure the basic networking in - Terraform plan is executed successfully. - A resource map screenshot is provided (VPC -> Your VPCs -> your_VPC_name -> Resource map). -4. **Additional Tasks (30 points)** +4. 
**Additional Tasks (30 points)💫** - **Security Groups and Network ACLs (5 points)** - Implement security groups and network ACLs for the VPC and subnets. - **Bastion Host (5 points)** - Create a bastion host for secure access to the private subnets. - **NAT is implemented for private subnets (10 points)** - - Orginize NAT for private subnets with simpler or cheaper way - - Instances in private subnets should be able to reach addresses outside VPC + - Organize NAT for private subnets using either the simpler or the cheaper way + - Instances in private subnets should be able to reach addresses outside the VPC - **Documentation (5 points)** - Document the infrastructure setup and usage in a README file. - **Submission (5 points)** diff --git a/devops/modules/2_cluster-configuration/README.md b/devops/modules/2_cluster-configuration/README.md index 9a56010c4..04499aeef 100644 --- a/devops/modules/2_cluster-configuration/README.md +++ b/devops/modules/2_cluster-configuration/README.md @@ -6,16 +6,9 @@ In this module you need to configure a K8s cluster on top of the network infrast ## K8s deployment and configuration -There are multiple ways of deployment K8s cluster on AWS. In this course you're supposed to use either kOps (https://kops.sigs.k8s.io/) or k3s (https://k3s.io/). You need to get familiar with both project and decide which one is more suitable for you. +There are multiple ways to deploy a K8s cluster on AWS. In this course you're supposed to use k3s (https://k3s.io/). -Several things to keep in mind during cluster deployment: - -1. kOps will handle creation of most resources for you, with k3s you are in charge of underlying infrastructure management; -2. By default, kOps creates more AWS resources. This may lead to additional expenses and not fit into AWS Free Tier; -3. You'll need a domain name or a sub-domain for kOps-managed cluster; -4. 
Make sure you're using AWS EC2 instances type from Free Tier to avoid addition expenses (see https://aws.amazon.com/free for more details). - -Rule of thumb: use k3s if you don't want to spend on AWS resources; use kOps if you'd like to practise with a more real-life cluster. +Make sure you're using an AWS EC2 instance type from the Free Tier to avoid additional expenses (see https://aws.amazon.com/free for more details). **This task is considered as done if all the conditions below are met:** diff --git a/devops/modules/2_cluster-configuration/task_3.md b/devops/modules/2_cluster-configuration/task_3.md index 0e50b39fb..fee935587 100644 --- a/devops/modules/2_cluster-configuration/task_3.md +++ b/devops/modules/2_cluster-configuration/task_3.md @@ -1,20 +1,15 @@ # Task: K8s Cluster Configuration and Creation - +![task_3 schema](../../visual_assets/task_3.png) ## Objective -In this task, you will configure and deploy a Kubernetes (K8s) cluster on AWS using either kOps or k3s. You will also verify the cluster by running a simple workload. +In this task, you will configure and deploy a Kubernetes (K8s) cluster on AWS using k3s. You will also verify the cluster by running a simple workload. ## Steps 1. **Choose Deployment Method** - - Get familiar with both [kOps](https://kops.sigs.k8s.io/) and [k3s](https://k3s.io/). - - Decide which deployment method is more suitable for you based on the following considerations: - - kOps handles the creation of most resources for you, while k3s requires you to manage the underlying infrastructure. - - kOps may lead to additional expenses due to the creation of more AWS resources. - - kOps requires a domain name or sub-domain. - - Use AWS EC2 instances from the Free Tier to avoid additional expenses. - + - Get familiar with [k3s](https://k3s.io/). + - Use AWS EC2 instances from the [AWS Free Tier](https://aws.amazon.com/free/) to avoid additional expenses. 2. 
**Create or Extend Terraform Code** - Create or extend Terraform code to manage AWS resources required for the cluster creation. @@ -22,12 +17,14 @@ In this task, you will configure and deploy a Kubernetes (K8s) cluster on AWS us 3. **Deploy the Cluster** - - Deploy the K8s cluster using the chosen method (kOps or k3s). - - Ensure the cluster is accessible from your local computer. + - Deploy [Bastion host](https://www.geeksforgeeks.org/what-is-aws-bastion-host/) + - Deploy the K8s cluster using k3s. + - Ensure the cluster is accessible from your bastion host + - **Additional task** make it accessible from your local computer. 4. **Verify the Cluster** - - Run the `kubectl get nodes` command from your local computer to get information about the cluster. + - Run the `kubectl get nodes` command from your bastion host to get information about the cluster. - Provide a screenshot of the `kubectl get nodes` command output. 5. **Deploy a Simple Workload** @@ -38,16 +35,15 @@ In this task, you will configure and deploy a Kubernetes (K8s) cluster on AWS us ``` - Ensure the workload runs successfully on the cluster. -6. **Additional Tasks** - - Implement monitoring for the cluster using Prometheus and Grafana. - - Document the cluster setup and deployment process in a README file. +6. **Additional Tasks💫** + - Document the cluster setup process in a README file. ## Submission +- Create a `task_3` branch from `main` in your repository. - Provide a PR with the Terraform code for the K8s cluster and bastion host. -- Provide a screenshot of the `kubectl get nodes` command output. -- Ensure that the simple workload is deployed and running successfully on the cluster. -- Provide a PR with the monitoring setup. +- Provide a screenshot of the `kubectl get all --all-namespaces` command output. (pod named "nginx" should be present) +- Provide a screenshot of the `kubectl get nodes` command output. 2 nodes should be present. 
- Provide a README file documenting the cluster setup and deployment process. ## Evaluation Criteria (100 points for covering all criteria) @@ -57,20 +53,19 @@ In this task, you will configure and deploy a Kubernetes (K8s) cluster on AWS us - Terraform code is created or extended to manage AWS resources required for the cluster creation. - The code includes the creation of a bastion host. -2. **Cluster Deployment (60 points)** - - - A K8s cluster is deployed using either kOps or k3s. - - The deployment method is chosen based on the user's preference and understanding of the trade-offs. +2. **Cluster Verification (50 points)** -3. **Cluster Verification (10 points)** + - The cluster is verified by running the `kubectl get nodes` command from the bastion host. + - The K8s cluster consists of 2 nodes (can be checked in the screenshot). - - The cluster is verified by running the `kubectl get nodes` command from the local computer. - - A screenshot of the `kubectl get nodes` command output is provided. - -4. **Workload Deployment (10 points)** +3. **Workload Deployment (30 points)** - A simple workload is deployed on the cluster using `kubectl apply -f https://k8s.io/examples/pods/simple-pod.yaml`. - - The workload runs successfully on the cluster. + - A pod named "nginx" is present in the output of the `kubectl get all --all-namespaces` command + +4. **Additional Tasks (10 points)💫** + - **Documentation (5 points)** + - Document the cluster setup and deployment process in a README file. + - **Cluster accessibility (5 points)** + - The cluster is verified by running the `kubectl get nodes` command from the **local computer**. -5. **Additional Tasks (10 points)** - - Document the cluster setup and deployment process in a README file. 
diff --git a/devops/modules/3_ci-configuration/README.md b/devops/modules/3_ci-configuration/README.md index 1c42928fc..efb7083a2 100644 --- a/devops/modules/3_ci-configuration/README.md +++ b/devops/modules/3_ci-configuration/README.md @@ -6,14 +6,9 @@ In this module you need to install Jenkins CI server to your k8s cluster and con ## Task 1. Installation and configuration of a Jenkins server -There are multiple ways to install Jenkins on a K8s cluster. In this task you need to install Jenkins using a Helm chart. Note that Helm should be installed prior on your computer, where you have kubectl configured to access the cluster from the previous module. See https://helm.sh/ for more details on Helm installation. You may verify your Helm installation by deployment and removal of Nginx chart (https://artifacthub.io/packages/helm/bitnami/nginx). +There are multiple ways to install Jenkins on a K8s cluster. In this task you need to install Jenkins using a Helm chart. Note that Helm should be installed prior on your computer, where you have kubectl configured to access the cluster from the previous module. See [HELM Documentation](https://helm.sh/) for more details on Helm installation. You may verify your Helm installation by deployment and removal of [Nginx chart](https://artifacthub.io/packages/helm/bitnami/nginx). -When done with Helm, install Jenkins using the instruction from Jenkins documentation (https://www.jenkins.io/doc/book/installing/kubernetes/#install-jenkins-with-helm-v3). Make sure that Jenkins is installed into a separate namespace. - -Before proceeding with Jenkins installation, please make sure that your cluster meets the following requirements: - -1. The cluster has a solution for managing persistent volumes (PV) and persistent volume claims (PVC). For more information on PV and PVC see the official K8s documentation (https://kubernetes.io/docs/concepts/storage/volumes/). 
See this k3s documentation page for some possible solutions (https://docs.k3s.io/storage) -2. Since you are going to access your Jenkins server via the internet, your communications with the server should be encrypted using TLS. For that reason you'll need an ingress controller installed and configured to serve traffic via HTTPS. Note, that you'll also need a certificate and a private key. You may use one provided via Letsencrypt, ZeroSSL or any other services. If you don't have a domain name, you may also use self-signed certificate, but it's not the recommended solution. +When done with Helm, install Jenkins using the instruction from [Jenkins documentation](https://www.jenkins.io/doc/book/installing/kubernetes/#install-jenkins-with-helm-v3). Make sure that Jenkins is installed into a separate namespace. When Jenkins installation is done, you may verify it by creation of a really simple freestyle project, which should do nothing but writing "Hello world" into the log. @@ -24,27 +19,32 @@ When Jenkins installation is done, you may verify it by creation of a really sim - you are able to run a simple Jenkins project; - Jenkins configuration is stored on a persistent volume and not being lost when Jenkins' pod is terminated. -## Task 2. A pipeline configuration - -In this task you need to configure full CI/CD pipeline for your project (i.e. the pipeline should cover such software lifecycle phases as build, testing and deployment). +## Task 2. HELM chart creation -Before starting with the pipeline, you need to implement the following: +In this task you need to create HELM chart for [Given Application](../../flask_app/). Using [helm tool](https://helm.sh/docs/helm/helm_create/) create a HELM chart for your application and adjust it to fit your needs. -1. A Docker image(s) for your application. The image(s) should be stored in AWS ECR repository. Note, that your k8s nodes should be able to access this repository as well. 
For that reason you'll need to adjust or create a new instance profile, associated with your EC2 instances, to let them access the repository. +1. A Docker image(s) for your application. The image(s) should be stored in a [public image registry](https://hub.docker.com). Note that your k8s nodes should be able to access this registry as well. 2. A Helm chart for your application. The chart should handle configuration of all the components (like configmaps, deployments, etc.), which are required to run the application withing a K8s cluster. See Helm documentation for more details on this step. During the development of the chart, you may test it manually from your computer, where Helm is installed. -Artifacts from these steps (Dockerfile and Helm chart) should be stored in a git repository, which will be available to Jenkins. You may use your application's main repository as well. +**This task is considered as done if all the conditions below are met:** + +- your built artifacts are stored within git (Dockerfile, Helm chart) and the public image registry (Docker image); + +## Task 3. A pipeline configuration + +In this task you need to configure a full CI/CD pipeline for your project (i.e. the pipeline should cover such software lifecycle phases as build, testing and deployment). + +Artifacts from the previous task (Dockerfile and Helm chart) should be stored in a git repository, which will be available to Jenkins. You may use your application's main repository as well. The next step is to configure a Jenkins pipeline. The pipeline should be stored in a form of Jenkinsfile in your main git repository, and be triggered on each push event on this repo. The pipeline should contain the following steps: 1. Application's build; 2. Unit test's execution; 3. Security check with SonarQube; -4. Docker image building and pushing to ECR. This step is supposed be triggered manually; +4. Docker image building and pushing to the public registry. This step is supposed to be triggered manually; 5. 
Deployment of an application's build into your k8s cluster with Helm. This step should depend on the previous one. 6. (Optional) Application's verification (curl main page, send requests toward API if any, etc.) You may run a smoke test on this step as well. **This task is considered as done if all the conditions below are met:** -- your built artifacts are stored within git (Dockerfile, Helm chart) and ECR (Docker image); - your pipeline has all steps from the description above. diff --git a/devops/modules/3_ci-configuration/task_4.md b/devops/modules/3_ci-configuration/task_4.md index fc0d272e0..b2a6bfc44 100644 --- a/devops/modules/3_ci-configuration/task_4.md +++ b/devops/modules/3_ci-configuration/task_4.md @@ -1,11 +1,14 @@ # Task 4: Jenkins Installation and Configuration +![task_4 schema](../../visual_assets/task_4-6.png) ## Objective -In this task, you will install Jenkins CI server on your Kubernetes (K8s) cluster using Helm and configure it to be accessible via internet. -IMPORTANT! You better choose to use t3/t2.small VMs, since micro have not sufficient amount of RAM for running Jenkins. Be aware that small instances are not included in the free tier, so you'll be charged 0.05$/hour for them. -Best choise for saving - create 1 small instalnce in public network. Setup init script to install k3s and deploy all of the necessary HELM charts to startup jenkins. Destroy environment whenever you are not working with it. -Have a look at this [JCasC article](https://medium.com/globant/jenkins-jcasc-for-beginners-819dff6f8bc) to store jenkins configuration and jobs as s code. +To avoid unnecessary spending, we won't use the cluster we've just created in AWS. Instead, we'll use [Minikube](https://minikube.sigs.k8s.io/docs/start/?arch=%2Fmacos%2Fx86-64%2Fstable%2Fbinary+download), a k8s cluster that you can install on your local machine. It should be enough for learning purposes. 
Follow the documentation from that link to install minikube on your local PC, then proceed straight to [steps](#steps). + +If you're brave enough to keep using the cluster deployed in the cloud, pay attention to the resource consumption of your VM, and keep in mind the note below. + +IMPORTANT! (For cloud deployment only; skip this if you chose to use minikube.) Prefer t3/t2.small VMs, since micro instances do not have enough RAM to run Jenkins. Be aware that small instances are not included in the free tier, so you'll be charged 0.05$/hour for them. +The best way to save money: create 1 small instance in a public network. Set up an init script to install k3s, and deploy all of the necessary HELM charts to start up Jenkins. Destroy the environment whenever you are not working with it. ## Steps @@ -16,28 +19,31 @@ Have a look at this [JCasC article](https://medium.com/globant/jenkins-jcasc-for 2. **Prepare the Cluster** - - Ensure your cluster has a solution for managing persistent volumes (PV) and persistent volume claims (PVC). Refer to the [K8s documentation](https://kubernetes.io/docs/concepts/storage/volumes/) and [k3s documentation](https://docs.k3s.io/storage) for more details. + - Ensure your cluster has a solution for managing persistent volumes (PV) and persistent volume claims (PVC). Refer to the [K8s documentation](https://kubernetes.io/docs/concepts/storage/volumes/) and [k3s documentation](https://docs.k3s.io/storage) or [Minikube PVC](https://minikube.sigs.k8s.io/docs/handbook/persistent_volumes/) for more details. 3. **Install Jenkins** - Follow the instructions from the [Jenkins documentation](https://www.jenkins.io/doc/book/installing/kubernetes/#install-jenkins-with-helm-v3) to install Jenkins using Helm. Ensure Jenkins is installed in a separate namespace. 
[Debug init container](https://kubernetes.io/docs/tasks/debug/debug-application/debug-init-containers/#accessing-logs-from-init-containers) + - Ensure that Jenkins is accessible via web browser. [Setup reverse proxy](https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-reverse-proxy-on-ubuntu-22-04) if you are working in the environment behind the bastion host. 4. **Verify Jenkins Installation** - Create a simple freestyle project in Jenkins that writes "Hello world" into the log. -5. **Additional Tasks** - - Set up a GitHub Actions (GHA) pipeline to deploy Jenkins. +5. **Additional Tasks💫** + - Set up a GitHub Actions (GHA) pipeline to deploy Jenkins. (not applicable on minikube installation) - Configure authentication and security settings for Jenkins. + - Use JCasC to store your Hello World job. ## Submission -- Provide a PR with the Helm chart for Jenkins deployment in a new repository. -- Ensure that Jenkins is accessible via intenet. [Setup reverse proxy](https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-reverse-proxy-on-ubuntu-22-04) if you are working in the environment behind the bastion host. +- Create a `task_4` branch from `main` in your repository. +- Provide a PR with the Helm chart for Jenkins deployment. - Provide a screenshot of the Jenkins freestyle project log showing "Hello world". - Provide a PR with the GHA pipeline code for Jenkins deployment. -- Document the authentication and security configurations in a README file. +- Attach screenshot with ```kubectl get all --all-namespaces``` to the PR +- Provide a README file documenting the installation and configuration process. ## Evaluation Criteria (100 points for covering all criteria) @@ -49,21 +55,23 @@ Have a look at this [JCasC article](https://medium.com/globant/jenkins-jcasc-for - The cluster has a solution for managing persistent volumes (PV) and persistent volume claims (PVC). -3. **Jenkins Installation (50 points)** +3. 
**Jenkins Installation (40 points)** - Jenkins is installed using Helm in a separate namespace. - - Jenkins is available from the internet. + - Jenkins is accessible from a web browser. 4. **Jenkins Configuration (10 points)** - Jenkins configuration is stored on a persistent volume and is not lost when Jenkins' pod is terminated. -5. **Verification (10 points)** +5. **Verification (15 points)** - A simple Jenkins freestyle project is created and runs successfully, writing "Hello world" into the log. -6. **Additional Tasks (10 points)** +6. **Additional Tasks (15 points)💫** - **GitHub Actions (GHA) Pipeline (5 points)** - A GHA pipeline is set up to deploy Jenkins. - **Authentication and Security (5 points)** - Authentication and security settings are configured for Jenkins. + - **JCasC is used to describe a job in Jenkins (5 points)** + - The "Hello World" job is created via JCasC in the Helm chart values. diff --git a/devops/modules/3_ci-configuration/task_5.md b/devops/modules/3_ci-configuration/task_5.md index b1a2941f7..0be90c07f 100644 --- a/devops/modules/3_ci-configuration/task_5.md +++ b/devops/modules/3_ci-configuration/task_5.md @@ -1,60 +1,50 @@ # Task 5: Simple Application Deployment with Helm +![task_5 schema](../../visual_assets/task_4-6.png) ## Objective -In this task, you will create a Helm chart for a simple application and deploy it on your Kubernetes (K8s) cluster. +In this task, you will create a Docker image and Helm chart for a simple application and deploy it on your Kubernetes (K8s) cluster. ## Steps 1. **Create Helm Chart** - - Create a Helm chart for your application. + - Create a Helm chart for your [Application](https://github.com/rolling-scopes-school/tasks/tree/master/devops/flask_app/README.md). 2. **Deploy the Application** - - Deploy the WordPress application using the Helm chart. - - Ensure the application is accessible from the internet. + - Deploy the application using the Helm chart. 
+ - Ensure the application is accessible from a web browser. 3. **Store Artifacts in Git** - Store the WordPress application and Helm chart in a new git repository. + - Store the application and Helm chart in your git repository. -4. **Verify the Application** - - - Verify that the application is running and accessible. - -5. **Additional Tasks** - - Implement a CI/CD pipeline to automate the deployment of the WordPress. +4. **Additional Tasks💫** - Document the application setup and deployment process in a README file. ## Submission -- Provide a PR with the application and Helm chart in a new repository. -- Ensure that the application is accessible from the internet. -- Provide a PR with the CI/CD pipeline code for the application deployment. +- Create a `task_5` branch from `main` in your repository. +- Provide a PR with the application and Helm chart in your repository. +- Provide a browser screenshot showing the working application. - Provide a README file documenting the application setup and deployment process. ## Evaluation Criteria (100 points for covering all criteria) 1. **Helm Chart Creation (40 points)** - - A Helm chart for the WordPress application is created. + - A Helm chart for the application is created. -2. **Application Deployment (30 points)** +2. **Application Deployment (50 points)** - The application is deployed using the Helm chart. - - The application is accessible from the internet. - -3. **Repository Submission (5 points)** + - The application is accessible from a web browser. - - A new repository is created with the WordPress and Helm chart. - -4. **Verification (5 points)** +3. **Additional Tasks (10 points)💫** + - **Documentation (10 points)** + - The application setup and deployment processes are documented in a README file. - - The application is verified to be running and accessible. +## References -5. 
**Additional Tasks (20 points)** - - **CI/CD Pipeline (10 points)** - - A CI/CD pipeline is set up to automate the deployment of the application. - - **Documentation (10 points)** - - The application setup and deployment process are documented in a README file. +- [Create your HELM chart](https://helm.sh/docs/helm/helm_create/) diff --git a/devops/modules/3_ci-configuration/task_6.md b/devops/modules/3_ci-configuration/task_6.md index abf81430b..7e16b9cb6 100644 --- a/devops/modules/3_ci-configuration/task_6.md +++ b/devops/modules/3_ci-configuration/task_6.md @@ -1,4 +1,5 @@ # Task 6: Application Deployment via Jenkins Pipeline +![task_6 schema](../../visual_assets/task_4-6.png) ## Objective @@ -6,44 +7,33 @@ In this task, you will configure a Jenkins pipeline to deploy your application o ## Steps -1. **Create Docker Image and Store in ECR** - - Create a Docker image for your application. - - Store the Docker image in an AWS ECR repository. - - Ensure your K8s nodes can access the ECR repository by adjusting or creating a new instance profile for your EC2 instances. +1. **Configure Jenkins Pipeline** -2. **Create Helm Chart** - - - Create a Helm chart for your application. - - Test the Helm chart manually from your local machine. - -3. **Store Artifacts in Git** - - - Store the Dockerfile and Helm chart in a git repository accessible to Jenkins. - -4. **Configure Jenkins Pipeline** - - - Create a Jenkins pipeline and store it as a Jenkinsfile in your main git repository. + - Create a Jenkins pipeline and store it as a Jenkinsfile in your git repository. - Configure the pipeline to be triggered on each push event to the repository. -5. **Pipeline Steps** +2. **Pipeline Steps** - The pipeline should include the following steps: 1. Application build 2. Unit test execution 3. Security check with SonarQube - 4. Docker image building and pushing to ECR (manual trigger) - 5. Deployment to K8s cluster with Helm (dependent on the previous step) - 6. 
(Optional) Application verification (e.g., curl main page, send requests to API, smoke test) + 4. Docker image building and pushing to any registry + 5. Deployment to the K8s cluster with Helm (dependent on the previous step) + 6. (Optional) Application verification (e.g., curl the main page, send requests to the API, smoke test) -6. **Additional Tasks** +3. **Application verification** + - Ensure that the pipeline runs successfully and deploys the application to the K8s cluster. +4. **Additional Tasks💫** - Set up a notification system to alert on pipeline failures or successes. - Document the pipeline setup and deployment process in a README file. ## Submission +- Create a `task_6` branch from `main` in your repository. - Provide a PR with the application, Helm chart, and Jenkinsfile in a repository. -- Ensure that the pipeline runs successfully and deploys the application to the K8s cluster. +- Provide a screenshot of the successful Jenkins pipeline run. - Provide a README file documenting the pipeline setup and deployment process. ## Evaluation Criteria (100 points for covering all criteria) @@ -70,7 +60,7 @@ In this task, you will configure a Jenkins pipeline to deploy your application o - The pipeline runs successfully and deploys the application to the K8s cluster. -5. **Additional Tasks (30 points)** +5. **Additional Tasks (30 points)💫** - **Application Verification (10 points)** - Application verification is performed (e.g., curl main page, send requests to API, smoke test). - **Notification System (10 points)** diff --git a/devops/modules/4_monitoring-configuration/README.md b/devops/modules/4_monitoring-configuration/README.md index a8c393220..5103e6bb2 100644 --- a/devops/modules/4_monitoring-configuration/README.md +++ b/devops/modules/4_monitoring-configuration/README.md @@ -4,7 +4,7 @@ In this module you need to configure cluster's metrics collection with Prometheus, visualize it with Grafana, and configure and test some alerts with Alert Manager. -## Task 1. 
Install and configure Prometheus. +## Install and configure Prometheus. Monitoring is an essential part of any production setup. Prometheus is an open-source system which may collect metrics from applications, hosted in a K8s cluster as well as from the cluster itself. See Prometheus documentation (https://prometheus.io/docs/introduction/overview/) for more details. @@ -16,7 +16,7 @@ In this task you need to install Prometheus to your cluster using Helm chart by - Prometheus is not exposed outside (i.e. you can't access it directly from the internet); - Prometheus is collecting cluster-specific metrics. -## Task 2. Install and configure Grafana. +## Install and configure Grafana. While Prometheus shines in metrics collection, it's visualisation abilities are pretty limited. For that reason it's commonly being used with Grafana, which provides a lot of options for data representation. See Grafana official documentation (https://grafana.com/docs/) for more detail. A lot of examples of Grafana data visualisation could be find at official Grafana demo site (https://play.grafana.org/). @@ -28,9 +28,9 @@ In this task you need to install Grafana using Helm chart by Bitnami. When done - Grafana is available from the internet via an encrypted connection; - you have a dashboard with main K8s metrics visualised (see above for the examples of what is supposed to be included). -## Task 3. Alerting configuration and verification. +## Alerting configuration and verification. -It's easy to miss some important change in your service when relying only on visual observation of related metrics. To avoid such situations, it's crucial to have good alerting service. In this task you need to configure Alertmanager (https://prometheus.io/docs/alerting/latest/alertmanager/) - a Prometheus' component, which might be used to send alerts via various channels. +It's easy to miss some important change in your service when relying only on visual observation of related metrics. 
To avoid such situations, it's crucial to have a good alerting service. In this task you need to configure [Grafana Alerting](https://grafana.com/docs/grafana/latest/alerting/), a Grafana component that can be used to send alerts via various channels. Your task is to configure multiple alerts for such events as: @@ -38,9 +38,9 @@ Your task is to configure multiple alerts for such events as: - lack of CPU cores capacity on any node of your cluster; - no running CoreDNS pods (if applicable to your cluster). -Alertmanager should be configured to deliver alerts to your email address. +Alert manager should be configured to deliver alerts to your email address. -When finish with Alertmanager configuration, try to simulate any failure from the list above and check whether you receive alerts or not. +When you finish the Alert manager configuration, try to simulate any failure from the list above and check whether you receive alerts. **This task is considered as done if all the conditions below are met:** diff --git a/devops/modules/4_monitoring-configuration/task_7.md b/devops/modules/4_monitoring-configuration/task_7.md index 17f160772..51b94435f 100644 --- a/devops/modules/4_monitoring-configuration/task_7.md +++ b/devops/modules/4_monitoring-configuration/task_7.md @@ -1,11 +1,14 @@ # Task 7: Prometheus Deployment on K8s +![task_7 schema](../../visual_assets/task_7.png) ## Objective -In this task, you will install Prometheus on your Kubernetes (K8s) cluster using a Helm chart and configure it to collect essential cluster-specific metrics. +In this task, you'll set up a set of tools to monitor your cluster and application by installing Prometheus and Grafana in your cluster using Helm charts. The main goal is to collect essential metrics from the infrastructure and visualize them. ## Steps +### Prometheus + 1. **Install Prometheus** - Follow the instructions to install Prometheus using the Helm chart by Bitnami. 
Refer to the [Prometheus documentation](https://prometheus.io/docs/introduction/overview/) for more details. 2. **Install Exporters** @@ -15,31 +18,104 @@ In this task, you will install Prometheus on your Kubernetes (K8s) cluster using 4. **Verify Metrics Collection** - Ensure Prometheus is collecting essential cluster-specific metrics, such as nodes' memory usage. - Check the collected metrics via Prometheus web interface. +### Grafana +1. **Install Grafana** + - Follow the instructions to install Grafana using the Helm chart by Bitnami. Refer to the [Grafana documentation](https://grafana.com/docs/) for more details. +2. **Configure Grafana** + - Add a new data source pointing to the existing Prometheus installation. +3. **Create a Dashboard** + - Create a dashboard with basic metrics visualized, such as CPU and memory utilization, storage usage, etc. + +### Alertmanager + +1. **Configure SMTP for Grafana** + - Configure SMTP server + - For local setup you can consider any SMTP server + - To send emails in AWS consider using Amazon SES (Simple Email Service) + - Configure Grafana SMTP settings + - Local setup will need only host:port and probably skipVerify to bypass tls verification + - AWS SES will need host:port, authentication details and "from address" which must be verified in SES +2. **Configure Contact points** + - Add your email as a contact point. (this email should be also verified if you are using AWS SES) +3. **Configure Alert Rules** + - Configure alerts for the following events: + - High CPU utilization on any node of the cluster. + - Lack of RAM capacity on any node of the cluster. + - Ensure alerts are delivered to your email address. +4. **Verify Alerts** + - Simulate CPU and memory stress on a Kubernetes node using tools like `stress` or `sysbench`. + +### **Additional Tasks💫** + - Alert Rules, Contact Points, and SMTP settings are configured using YAML files or other code-based methods. 
+ - Document the process of installation and configuration of monitoring tools. + ## Submission +- Create a `task_7` branch from `main` in your repository. +- Provide a README file documenting the monitoring tools deployment and configuration. **Note:** Ensure that all personal data, such as email addresses and SMTP credentials, are hidden in screenshots, code and documentation. -- Provide a PR with automation of a Prometheus deployment in Kubernetes with IaC or CI/CD pipeline. -- Provide an output of `kubectl get pods` with running Prometheus. +#### Prometheus +- Provide a PR with automation of the monitoring tools deployment in Kubernetes with IaC or a CI/CD pipeline. +- Provide the output of `kubectl get all --all-namespaces` showing Prometheus and Grafana running. - Include a screenshot of any metrics (e.g. node disk space usage) shown in the Prometheus web UI. -- Provide a README file documenting the Prometheus deployment and configuration. + +#### Grafana +- Include a screenshot or the configuration of the Prometheus data source. +- Include a screenshot of the dashboard created. +- Include a JSON file of the dashboard layout. + +#### Alertmanager +- Include in the PR (in the description or in the changes) screenshots of: + - Contact Points. + - Alert Rules in normal and firing state. + - Alert Rules configuration. + - Received emails. ## Evaluation Criteria (100 points for covering all criteria) 1. **Prometheus Installation (20 points)** - - Prometheus is installed and running on the K8s cluster. -2. **Deployment Automation (30 points)** + - Prometheus and Grafana are installed and running on the K8s cluster. +2. **Deployment Automation (10 points)** - Automation of deployment with IaC or CI/CD pipeline is created. -3. **Web interface is available (10 points)** - - Metrics can be checked via Prometheus web interface. -4. **Metrics Collection (35 points)** - - Prometheus is collecting essential cluster-specific metrics, such as nodes' memory usage. -5. 
**Documentation is created (5 points)** - - A README file is created or updated documenting the Prometheus deployment and configuration. +3. **Grafana Installation (10 points)** + - A Grafana data source pointing to the existing Prometheus installation is added. +4. **Dashboard Creation (10 points)** + - A dashboard is created with basic metrics visualized, such as CPU and memory utilization, storage usage, etc. +5. **Additional Tasks (10 points)** + - Admin user password is created with a separate secret. (10 points) + - A JSON file of the dashboard layout is provided. (5 points) +6. **Alert Rules created (20 points)** + - Alert Rules are configured to send alerts for the following events: + - High CPU utilization on any node of the cluster. + - Lack of RAM capacity on any node of the cluster. + - Alerts are configured to be delivered to your email address. +7. **Additional Tasks (20 points)💫** + - **Configuration is done completely in code (10 points)** + - Alert Rules, Contact Points, and SMTP settings are configured using YAML files or other code-based methods. + - **Documentation (10 points)** + - The process of installation and configuration of monitoring tools is documented in a README file. 
## References +### Prometheus - [Prometheus Documentation](https://prometheus.io/docs/introduction/overview/) - [Helm Chart for Prometheus](https://github.com/bitnami/charts/tree/main/bitnami/prometheus) - [Bitnami Prometheus Helm Chart README](https://github.com/bitnami/charts/blob/main/bitnami/prometheus/README.md) - [Kube state metrics](https://github.com/kubernetes/kube-state-metrics) - [Prometheus Node Exporter](https://github.com/prometheus/node_exporter) + +### Grafana + +- [Grafana Documentation](https://grafana.com/docs/) +- [Helm Chart for Grafana](https://github.com/bitnami/charts/tree/main/bitnami/grafana) +- [Tool to impose load](https://linux.die.net/man/1/stress) +- [Understanding Machine CPU Usage](https://www.robustperception.io/understanding-machine-cpu-usage/) + +### Grafana AlertManager +- [Grafana Alerting Documentation](https://grafana.com/docs/grafana/latest/alerting/) +- AWS restricts outgoing connections for sending emails: [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html#port-25-throttle) +- Domain for AWS SES can be anything, it won't require verification to send emails to verified email addresses. 
+- Simple SMTP server deployment [docker-postfix](https://github.com/bokysan/docker-postfix) +- [Tool to impose load](https://linux.die.net/man/1/stress) +- [Helm Chart for Grafana](https://github.com/bitnami/charts/tree/main/bitnami/grafana) +- [Use configuration files to provision alerting resources](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/) \ No newline at end of file diff --git a/devops/modules/4_monitoring-configuration/task_8.md b/devops/modules/4_monitoring-configuration/task_8.md deleted file mode 100644 index e6ed4be55..000000000 --- a/devops/modules/4_monitoring-configuration/task_8.md +++ /dev/null @@ -1,46 +0,0 @@ -# Task 8: Grafana Installation and Dashboard Creation - -## Objective - -In this task, you will install Grafana on your Kubernetes (K8s) cluster using a Helm chart and create a dashboard to visualize Prometheus metrics. - -## Steps - -1. **Install Grafana** - - Follow the instructions to install Grafana using the Helm chart by Bitnami. Refer to the [Grafana documentation](https://grafana.com/docs/) for more details. -2. **Configure Grafana** - - Add a new data source pointing to the existing Prometheus installation. -3. **Create a Dashboard** - - Create a dashboard with basic metrics visualized, such as CPU and memory utilization, storage usage, etc. -4. **Document the Setup** - - Create a README file documenting the Grafana setup, including the dashboard creation. - -## Submission - -- Provide a PR with automation of a Grafana deployment in Kubernetes with IaC or CI/CD pipeline. -- Provide an output of `kubectl get pods` with running Grafana. -- Include a screenshot or configuration of the Prometheus data source configuration. -- Include a screenshot of the dashboard created. -- Include a JSON file of the dashboard layout. -- Provide a README file documenting the Grafana deployment and configuration. - -## Evaluation Criteria (100 points for covering all criteria) - -1. 
**Grafana Installation (30 points)** - - Grafana is installed on the K8s cluster using the Helm chart by Bitnami. - - A data source pointing to the existing Prometheus installation is added. -2. **Dashboard Creation (40 points)** - - A dashboard is created with basic metrics visualized, such as CPU and memory utilization, storage usage, etc. -3. **Deployment Automation (10 points)** - - Automation of deployment with IaC or CI/CD pipeline is created. -4. **Additional Tasks (20 points)** - - Admin user password is created with a separate secret. (10 points) - - A JSON file of the dashboard layout is provided. (5 points) - - The Grafana setup, including the dashboard creation, is documented in a README file. (5 points) - -## References - -- [Grafana Documentation](https://grafana.com/docs/) -- [Helm Chart for Grafana](https://github.com/bitnami/charts/tree/main/bitnami/grafana) -- [Tool to impose load](https://linux.die.net/man/1/stress) -- [Understanding Machine CPU Usage](https://www.robustperception.io/understanding-machine-cpu-usage/) diff --git a/devops/modules/4_monitoring-configuration/task_9.md b/devops/modules/4_monitoring-configuration/task_9.md deleted file mode 100644 index 0a0ca5476..000000000 --- a/devops/modules/4_monitoring-configuration/task_9.md +++ /dev/null @@ -1,64 +0,0 @@ -# Task 9: Alertmanager Configuration and Verification - -## Objective - -In this task, you will configure Grafana Alerting to send alerts for specific events in your Kubernetes (K8s) cluster and verify that the alerts are received. - -## Steps - -1. **Configure SMTP for Grafana** - - Configure SMTP server - - For local setup you can consider any SMTP server - - To send emails in AWS consider using Amazon SES (Simple Email Service) - - Configure Grafana SMTP settings - - Local setup will need only host:port and probably skipVerify to bypass tls verification - - AWS SES will need host:port, authentication details and "from address" which must be verified in SES -2. 
**Configure Contact points** - - Add your email as a contact point. (this email should be also verified if you are using AWS SES) -3. **Configure Alert Rules** - - Configure alerts for the following events: - - High CPU utilization on any node of the cluster. - - Lack of RAM capacity on any node of the cluster. - - Ensure alerts are delivered to your email address. -4. **Verify Alerts** - - Simulate CPU and memory stress on a Kubernetes node using tools like `stress` or `sysbench`. -5. **Additional Tasks** - - Document the setup and alert configuration in a README file. - -## Submission - -- Provide a PR with the changes in configuration files. -- Include into PR (description or in changes) screenshots of: - - Contact Points. - - Alert Rules in normal and firing state. - - Alert Rules configuration. - - Received emails. -- Provide a README file documenting the setup and alert configuration. - **Note:** Ensure that all personal data, such as email addresses and SMTP credentials, are hidden in screenshots, code and documentation. - -## Evaluation Criteria (100 points for covering all criteria) - -1. **Contact Points created (10 points)** -2. **Alert Rules created (40 points)** - - Alert Rules are configured to send alerts for the following events: - - High CPU utilization on any node of the cluster. - - Lack of RAM capacity on any node of the cluster. - - Alerts are configured to be delivered to your email address. -3. **Alert Rules are working as expected (20 points)** - - Alert Rules are firing when the specified events occur. -4. **Email is received (10 points)** -5. **Additional Tasks (20 points)** - - **Documentation (10 points)** - - The Alertmanager setup and alert configuration are documented in a README file. - - **Configuration is done completely in code (10 points)** - - Alert Rules, Contact Points, and SMTP settings are configured using YAML files or other code-based methods. 
- -## Additional Resources - -- [Grafana Alerting Documentation](https://grafana.com/docs/grafana/latest/alerting/) -- AWS restricts outgoing connections for sending emails: [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html#port-25-throttle) -- Domain for AWS SES can be anything, it won't require verification to send emails to verified email addresses. -- Simple SMTP server deployment [docker-postfix](https://github.com/bokysan/docker-postfix) -- [Tool to impose load](https://linux.die.net/man/1/stress) -- [Helm Chart for Grafana](https://github.com/bitnami/charts/tree/main/bitnami/grafana) -- [Use configuration files to provision alerting resources](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/) diff --git a/devops/visual_assets/task_1.png b/devops/visual_assets/task_1.png new file mode 100644 index 000000000..a4301fb4b Binary files /dev/null and b/devops/visual_assets/task_1.png differ diff --git a/devops/visual_assets/task_2.png b/devops/visual_assets/task_2.png new file mode 100644 index 000000000..facaf7754 Binary files /dev/null and b/devops/visual_assets/task_2.png differ diff --git a/devops/visual_assets/task_3.png b/devops/visual_assets/task_3.png new file mode 100644 index 000000000..673557c2a Binary files /dev/null and b/devops/visual_assets/task_3.png differ diff --git a/devops/visual_assets/task_4-6.png b/devops/visual_assets/task_4-6.png new file mode 100644 index 000000000..3a27a6a8d Binary files /dev/null and b/devops/visual_assets/task_4-6.png differ diff --git a/devops/visual_assets/task_7.png b/devops/visual_assets/task_7.png new file mode 100644 index 000000000..5c2c57411 Binary files /dev/null and b/devops/visual_assets/task_7.png differ