
Calico Helm release cannot be deleted because of ServiceAccount with stuck finalizer #6629

Closed
mnikolic-ep opened this issue Aug 26, 2022 · 6 comments · Fixed by #8586

Comments

@mnikolic-ep

Expected Behavior

When I try to delete a Calico Helm release, I expect all resources to be deleted.

Current Behavior

Sometimes, when I try to delete a Calico Helm release, it will time out.

From some investigation, this seems to be because of a stuck ServiceAccount in the calico-system namespace. The ServiceAccount has a finalizer named tigera.io/cni-protector which never seems to complete.

When I try to manually delete the ServiceAccount with kubectl it also just gets stuck.

$ kubectl get serviceaccount -n calico-system calico-node -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-08-09T18:36:29Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-08-26T20:55:12Z"
  finalizers:
  - tigera.io/cni-protector
  name: calico-node
  namespace: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: 795c9a46-21f3-4d7e-9f30-31bdc6569b21
  resourceVersion: "120396802"
  uid: 2dba06f9-63aa-4b0a-9cff-5c9cd236ddf2

Possible Solution

Not sure. I tried googling the finalizer and nothing came up other than a Go package page, which doesn't shed much light.

Steps to Reproduce (for bugs)

  1. Set up a K8s deployment as described below in the environment section
  2. Install Calico using the tigera-operator Helm chart version 3.23.3 through a Terraform helm_release resource (it will probably also reproduce with the Helm CLI; see the sketch after this list)
  3. Delete the Calico Helm release
  4. Repeat the above if it doesn't happen the first few times, since the issue is intermittent
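For anyone reproducing without Terraform, a minimal sketch of the Helm CLI equivalent (the chart repository URL and version are taken from elsewhere in this thread; the release and namespace names are just examples, and the exact version string may differ depending on the repo index):

# Add the Calico chart repository
helm repo add projectcalico https://projectcalico.docs.tigera.io/charts
helm repo update

# Install the tigera-operator chart at the version reported in this issue
helm install calico projectcalico/tigera-operator \
  --version v3.23.3 --namespace tigera-operator --create-namespace

# Deleting the release is the step that intermittently hangs on the
# tigera.io/cni-protector finalizer
helm uninstall calico --namespace tigera-operator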

Context

We have a CI pipeline that sets up a new Kubernetes deployment every night and then cleans it up. This is how we test the IaC that creates our Kubernetes deployments. The deployment includes a K8s cluster and supporting tools like Ingress Controllers, Calico, SealedSecrets controller, Sysdig, Datadog, etc.

Currently, our CI pipeline is intermittently failing because it's timing out trying to delete the Calico Helm release.

Your Environment

  • Calico version - Helm Chart tigera-operator version 3.23.3 (Managed by Terraform Helm provider)
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.21 (EKS)
  • Operating System and version: Arch: amd64, Linux Distro: Amazon Linux 2, Linux Kernel: 5.4.176-91.338.amzn2.x86_64, AWS AMI: amazon-eks-node-1.21-v20220216
@caseydavenport
Member

xref: tigera/operator#2031

^ Believe this is the same issue. There are a couple of workarounds suggested in that thread, but fundamentally to fix this we need to make some code / chart changes.

@PawelLipski

+1, stuck on this as well every time; I need to edit the ServiceAccount manually with kubectl edit to remove the finalizer.
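For concreteness, that manual edit looks roughly like this (the finalizer name is the one shown in the YAML above):

kubectl edit serviceaccount calico-node -n calico-system
# then delete the "- tigera.io/cni-protector" entry under metadata.finalizers and save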

@advissor

+1, still hitting this issue.

This workaround helps; run it before deleting the Calico-related namespaces:
kubectl patch -n calico-system ServiceAccount/calico-node --type json --patch='[{"op":"remove","path":"/metadata/finalizers"}]'

@vijay-veeranki

+1

Deleting the installations.operator.tigera.io default resource before destroying the tigera Helm release does remove the finalizers on the calico-system ServiceAccount/calico-node.

resource "helm_release" "tigera_calico" {
  name       = "tigera-calico-release"
  chart      = "tigera-operator"
  repository = "https://projectcalico.docs.tigera.io/charts"
  namespace  = "tigera-operator"
  timeout    = 300
  version    = "3.25.0"

  depends_on = [
    kubernetes_namespace.tigera_operator,
    kubernetes_namespace.calico_system,
    kubernetes_namespace.calico_apiserver
  ]

  set {
    name  = "installation.kubernetesProvider"
    value = "EKS"
  }
}

resource "null_resource" "remove_finalizers" {
  depends_on = [helm_release.tigera_calico]

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      kubectl delete installations.operator.tigera.io default
    EOT
  }

  triggers = {
    helm_tigera = helm_release.tigera_calico.status
  }
}
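If I understand the Terraform ordering correctly, this works because the null_resource depends on the helm_release, so during terraform destroy its destroy-time provisioner runs first and deletes the Installation resource; the operator then removes the tigera.io/cni-protector finalizer from the ServiceAccount before Helm tears down the rest of the release.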

@mw-tlhakhan

+1

We are also having the same issue.

With the workaround posted above, the dependency on kubectl being available on the CI runner machine is another pain point.

@caseydavenport
Member

Example PR with one approach for resolving this: tigera/operator#2662
