
[IBCDPE-1034] Move deployment model to ArgoCD #16

Merged
merged 1 commit into main on Aug 16, 2024

Conversation

Contributor

@BryanFauble BryanFauble commented Aug 13, 2024

Problem:

  1. When a helm release fails during a Terraform apply, the "blast radius" can be quite large and can leave the Terraform state in a bad state that requires quite a lot of "massaging" to get it back into working order.

Solution:

  1. Move the deployment model to ArgoCD, which has significantly better tooling for deploying Kubernetes resources.
  2. Terraform is responsible for deploying the initial set of applications to K8s, in addition to the resource definitions that ArgoCD uses to deploy further applications on the cluster.
  3. Docs on ArgoCD: https://argo-cd.readthedocs.io/en/stable/

Overview of deployment steps:

  1. Spacelift deploys Spacelift-specific resources, namely the 2 stacks (Infrastructure, and K8s deployment resources).
  2. The Infrastructure stack deploys the VPC, the EKS cluster, and several security roles required to run the EKS cluster.
  3. The K8s deployment resources stack deploys the K8s autoscaler, some EKS add-ons (that require a node to be running), ArgoCD via the helm Terraform provider, and a number of Kubernetes resources that configure ArgoCD.
  4. ArgoCD then deploys any applications defined in the previous step, either manually, or automatically if auto deploy is turned on (a minimal sketch follows this list).
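
For illustration, here is a minimal sketch (not code from this PR) of the kind of Application resource that hands a deployment over to ArgoCD, with auto deploy turned on via syncPolicy.automated. The app name, repo URL, and path are hypothetical, and it assumes ArgoCD and its CRDs are already installed:

# Hypothetical sketch: registering an ArgoCD Application from Terraform.
resource "kubectl_manifest" "example_application" {
  yaml_body = <<-YAML
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: example-app              # hypothetical application name
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/example-repo.git   # hypothetical repo
        targetRevision: main
        path: deployments/example-app
      destination:
        server: https://kubernetes.default.svc
        namespace: example-app
      syncPolicy:
        automated: {}                # "auto deploy": ArgoCD syncs without manual action
  YAML
}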

Testing:

  1. Tested the deployment of the cluster to K8s and the deployment of applications via ArgoCD.

@BryanFauble BryanFauble marked this pull request as ready for review August 13, 2024 21:19
@BryanFauble BryanFauble requested a review from a team as a code owner August 13, 2024 21:19
Contributor

@thomasyu888 thomasyu888 left a comment


Thanks for this PR. @BWMac, did you want to take a stab at a review?

README.md Outdated
down when we destroy the VPC and EKS cluster.


In addition to this scanning that is in place we are also taking advantage of the
Contributor


Just a note for trivy: let's focus on the HIGH/CRITICAL issues and the ones that we can fix: https://github.com/Sage-Bionetworks/shiny-base/blob/e5770f31fe4a7d79150a25989a34ff3c3b1a78d1/.github/workflows/trivy.yml#L60-L69

Contributor


So Trivy is smart enough to know which vulnerabilities can and cannot be fixed if you configure it a certain way?

Contributor Author


I set ignoreUnfixed: true in 50ac49a.

We saw today that the number of issues flagged dropped by around half.
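
For context, a minimal sketch of how a flag like this could be passed through the helm provider, assuming the scanning is installed via the Aqua trivy-operator chart; the release name, namespace, and value key here are assumptions, not necessarily what commit 50ac49a actually changed:

# Hypothetical sketch: passing ignoreUnfixed through a helm_release.
resource "helm_release" "trivy_operator" {
  name       = "trivy-operator"      # hypothetical release name
  repository = "https://aquasecurity.github.io/helm-charts/"
  chart      = "trivy-operator"
  namespace  = "trivy-system"        # hypothetical namespace

  set {
    # Report only vulnerabilities that have a fix available.
    name  = "trivy.ignoreUnfixed"
    value = "true"
  }
}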

Contributor

@thomasyu888 thomasyu888 left a comment


I haven't had a chance to look closer at the ArgoCD config, but just a few comments.

@@ -12,5 +12,9 @@ terraform {
       source  = "hashicorp/helm"
       version = "~> 2.0"
     }
+    kubectl = {
+      source = "gavinbunney/kubectl"
Contributor


Why a secondary source of kubectl?

Contributor Author


I removed the helm provider here - I accidentally left it in.

But to answer why I moved over to using this provider (over the official one) to deploy a kubectl_manifest:

hashicorp/terraform-provider-kubernetes#1367

The root of the problem is a race condition that exists when you are installing a CRD (Custom Resource Definition) and the CR (Custom Resource) in the same Terraform stack. These are ArgoCD-specific CRDs that require the ArgoCD helm chart to be installed first. The official hashicorp K8s provider requires that the CRD already be installed on the cluster during the plan stage, which has obvious problems here.

This tf provider does not require the CRD to be installed during the plan stage, and it applies the CR to the cluster in the right order because I am using a depends_on relationship between the module containing the CR and the ArgoCD module, which installs the CRD.
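
A minimal sketch of that ordering pattern; the module path and manifest file are hypothetical:

# Hypothetical sketch of the CRD-before-CR ordering described above.
# The kubectl provider only needs the CRD at apply time, not at plan
# time, so this plans cleanly even against an empty cluster.
module "argo_cd" {
  source = "./modules/argo-cd"   # hypothetical module that helm-installs ArgoCD and its CRDs
}

resource "kubectl_manifest" "argocd_custom_resource" {
  # An ArgoCD CR, e.g. an AppProject or Application manifest.
  yaml_body = file("${path.module}/manifests/app-project.yaml")   # hypothetical file

  # Guarantees the ArgoCD helm chart (and therefore the CRDs) is installed first.
  depends_on = [module.argo_cd]
}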

Contributor


Is this a custom yaml config?

Contributor Author

@BryanFauble BryanFauble Aug 15, 2024


This values.yaml comes from the ArgoCD helm chart:
https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd

It can also come from this helm command: https://helm.sh/docs/helm/helm_show_values/

Basically what is happening here is that I am pointing to this values.yaml file with this terraform resource:

resource "helm_release" "argo-cd" {
  name       = "argo-cd"
  repository = "https://argoproj.github.io/argo-helm"
  chart      = "argo-cd"
  namespace  = "argocd"
  version    = "7.4.3"
  depends_on = [kubernetes_namespace.argocd]

  values = [templatefile("${path.module}/templates/values.yaml", {})]
}

This is how we are configuring the specifics of how the helm chart is deployed to the cluster.


Then, for anything that ArgoCD is deploying, I am using the concept of multiple sources:
https://argo-cd.readthedocs.io/en/stable/user-guide/multiple_sources/

Multiple sources mean that any helm chart we install uses a values.yaml file that we have created in our git repo. Look at the README.md in modules/apache-airflow for more specifics on how that is configured; a rough sketch follows below.
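
A minimal sketch of the multiple-sources pattern; the chart version, repo URL, and values path are hypothetical (the apache-airflow module in this repo is the real example):

# Hypothetical sketch: an Application whose chart comes from the upstream
# helm repository while values.yaml comes from our own git repo.
resource "kubectl_manifest" "airflow_application" {
  yaml_body = <<-YAML
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: apache-airflow
      namespace: argocd
    spec:
      project: default
      sources:
        - repoURL: https://airflow.apache.org                # upstream chart repository
          chart: airflow
          targetRevision: 1.15.0                             # hypothetical chart version
          helm:
            valueFiles:
              - $values/modules/apache-airflow/templates/values.yaml   # hypothetical path
        - repoURL: https://github.com/example-org/this-repo.git        # hypothetical repo URL
          targetRevision: main
          ref: values                                        # referenced above as $values
      destination:
        server: https://kubernetes.default.svc
        namespace: airflow
  YAML
}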

Eventually, if we have more specific templating needs to dynamically configure the values.yaml files, we'll want to implement something like kustomize: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/ - since our use case is relatively simple for now, I am just pointing to a static yaml file.

@BryanFauble BryanFauble changed the base branch from ibcdpe-1007-split-into-vars to main August 16, 2024 00:04
@BryanFauble BryanFauble merged commit 25e7a6e into main Aug 16, 2024
4 of 8 checks passed
@BryanFauble BryanFauble deleted the ibcdpe-1034-argocd branch August 16, 2024 00:06