cert-manager is creating/using invalid certs #1264

Closed
goern opened this issue Sep 19, 2024 · 12 comments

@goern

goern commented Sep 19, 2024

What happened:

{"level":"info","ts":"2024-09-19T09:36:09.018Z","logger":"certificate/Manager.Reconcile","msg":"TLS certificate chain failed verification, forcing rotation, err: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-ca"}
{"level":"info","ts":"2024-09-19T09:36:09.018Z","logger":"certificate/Manager","msg":"Rotating CA cert/key","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.255Z","logger":"certificate/Manager","msg":"Reset CA bundle with one cert for webhook","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.255Z","logger":"certificate/Manager","msg":"Updating CA bundle for webhook","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.324Z","logger":"certificate/Manager","msg":"Rotating Services cert/key","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 09:36:10.074114113 +0000 UTC m=+37506.421748431, deadline: 2026-07-25 15:57:39 +0000 UTC, elapsedToRotate: 16182h21m28.925885887s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Certificate expiration is 2025-03-20 21:36:10 +0000 UTC, totalDuration is 1.5768002e+16, rotation deadline is 2025-03-19 21:36:10 +0000 UTC","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 09:36:10.074711695 +0000 UTC m=+37506.422346022, deadline: 2025-03-19 21:36:10 +0000 UTC, elapsedToRotate: 4355h59m59.925288305s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager.earliestElapsedForCACertsCleanup","msg":"{now: 2024-09-19 09:36:10.074812543 +0000 UTC m=+37506.422446863, deadline: 2025-06-26 15:57:39 +0000 UTC, elapsedForCleanup: 6726h21m28.925187457s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager.earliestElapsedForServiceCertsCleanup","msg":"{now: 2024-09-19 09:36:10.074911316 +0000 UTC m=+37506.422545634, deadline: 2025-03-20 21:36:10 +0000 UTC, elapsedForCleanup: 4379h59m59.925088684s}","webhookType":"Mutating","webhookName":"nmstate","service":{"name":"nmstate-webhook","namespace":"openshift-nmstate"}}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Calculating RequeueAfter","webhookType":"Mutating","webhookName":"nmstate","elapsedToRotateCA":58256488.925885886,"elapsedToRotateServices":15681599.925288305,"elapsedForCABundleCleanup":24214888.925187457,"elapsedForServiceCertsCleanup":15767999.925088683}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Certificates will be Reconcile on 2025-03-19 21:36:10.000227114 +0000 UTC m=+15719106.347861433","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager.Reconcile","msg":"Reconciling Certificates","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-webhook"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 09:36:10.075007382 +0000 UTC m=+37506.422641699, deadline: 2026-07-25 15:57:39 +0000 UTC, elapsedToRotate: 16182h21m28.924992618s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 09:36:10.075016135 +0000 UTC m=+37506.422650455, deadline: 2025-03-19 21:36:10 +0000 UTC, elapsedToRotate: 4355h59m59.924983865s}","webhookType":"Mutating","webhookName":"nmstate"}

What you expected to happen:
Valid certificates?!

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
I'm using registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980 but this issue seems to be rooted in the upstream project.

Environment:

  • NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
    ok

  • Problematic NodeNetworkConfigurationPolicy:
    n/a

  • kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
    registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980

  • NetworkManager version (use nmcli --version):
    n/a

  • Kubernetes version (use kubectl version):

Client Version: 4.16.0-202407111006.p0.gfa84651.assembly.stream.el9-fa84651
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.13
Kubernetes Version: v1.29.8+f10c92d
  • OS (e.g. from /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.2 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://issues.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"
  • Others:
    n/a
@jorti

jorti commented Sep 19, 2024

I've discovered that I have the same problem. Today the creation of a new NNCP failed in my OCP 4.16 cluster:

Error from server (InternalError): error when creating "NNCP.yaml": Internal error occurred: failed calling webhook "nodenetworkconfigurationpolicies-mutate.nmstate.io": failed to call webhook: Post "https://nmstate-webhook.openshift-nmstate.svc:443/nodenetworkconfigurationpolicies-mutate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
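
For context, "x509: certificate signed by unknown authority" is the error Go's crypto/x509 returns when a server certificate does not chain to any CA in the verifier's trust pool, which fits a webhook CA bundle that no longer matches the CA that signed the serving certificate. The sketch below reproduces that situation with throwaway certificates. It is only an illustration, not code from kubernetes-nmstate; the helper names (newCert, newCA, newServerCert, verifyAgainst, isUnknownAuthority) are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newCert signs tmpl with parent/parentKey. With parent == nil the certificate is self-signed.
func newCert(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl.SerialNumber = big.NewInt(time.Now().UnixNano())
	tmpl.NotBefore = time.Now().Add(-time.Hour)
	tmpl.NotAfter = time.Now().Add(24 * time.Hour)
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newCA creates a throwaway self-signed CA.
func newCA(name string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	return newCert(&x509.Certificate{
		Subject:               pkix.Name{CommonName: name},
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}, nil, nil)
}

// newServerCert issues a serving certificate for dnsName signed by ca.
func newServerCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey, dnsName string) (*x509.Certificate, error) {
	cert, _, err := newCert(&x509.Certificate{
		Subject:     pkix.Name{CommonName: dnsName},
		DNSNames:    []string{dnsName},
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, ca, caKey)
	return cert, err
}

// verifyAgainst checks that leaf chains to the single trusted CA, as a client
// holding only that CA in its bundle would.
func verifyAgainst(leaf, trusted *x509.Certificate, dnsName string) error {
	pool := x509.NewCertPool()
	pool.AddCert(trusted)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: dnsName})
	return err
}

// isUnknownAuthority reports whether err is x509.UnknownAuthorityError.
func isUnknownAuthority(err error) bool {
	var ua x509.UnknownAuthorityError
	return errors.As(err, &ua)
}

func main() {
	const host = "nmstate-webhook.openshift-nmstate.svc"
	signingCA, signingKey, err := newCA("ca-in-secret")
	if err != nil {
		panic(err)
	}
	staleCA, _, err := newCA("ca-in-bundle")
	if err != nil {
		panic(err)
	}
	leaf, err := newServerCert(signingCA, signingKey, host)
	if err != nil {
		panic(err)
	}
	fmt.Println("bundle matches signing CA:", verifyAgainst(leaf, signingCA, host))
	fmt.Println("bundle holds a different CA:", verifyAgainst(leaf, staleCA, host))
}
```

The second verification fails because the trust pool holds a CA that did not sign the serving certificate, which is the same class of failure the API server reports when calling the webhook.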

The nmstate-cert-manager pod is in a tight loop logging this:

{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 10:15:20.343431226 +0000 UTC m=+379.773562336, deadline: 2024-09-19 10:15:20.343431166 +0000 UTC m=+379.773562276, elapsedToRotate: -60ns}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Certificate expiration is 2025-03-20 22:15:20 +0000 UTC, totalDuration is 1.5768001e+16, rotation deadline is 2025-03-19 22:15:20 +0000 UTC","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 10:15:20.343622976 +0000 UTC m=+379.773754086, deadline: 2025-03-19 22:15:20 +0000 UTC, elapsedToRotate: 4355h59m59.656377024s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.earliestElapsedForCACertsCleanup","msg":"{now: 2024-09-19 10:15:20.343654385 +0000 UTC m=+379.773785485, deadline: 2026-11-14 05:53:01 +0000 UTC, elapsedForCleanup: 18859h37m40.656345615s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.earliestElapsedForServiceCertsCleanup","msg":"{now: 2024-09-19 10:15:20.343695882 +0000 UTC m=+379.773826983, deadline: 2025-03-20 22:15:20 +0000 UTC, elapsedForCleanup: 4379h59m59.656304118s}","webhookType":"Mutating","webhookName":"nmstate","service":{"name":"nmstate-webhook","namespace":"openshift-nmstate"}}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Calculating RequeueAfter","webhookType":"Mutating","webhookName":"nmstate","elapsedToRotateCA":-0.00000006,"elapsedToRotateServices":15681599.656377023,"elapsedForCABundleCleanup":67894660.65634562,"elapsedForServiceCertsCleanup":15767999.656304117}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Certificates will be Reconcile on 2024-09-19 10:15:20.34371045 +0000 UTC m=+379.773841560","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.Reconcile","msg":"Reconciling Certificates","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-ca"}
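
Reading the "Calculating RequeueAfter" lines, the next reconcile time appears to be now plus the smallest of the four elapsed values. That is an inference from the logs in this thread, not something taken from the cert manager's source. Under that assumption, a CA rotation value of -60ns means an immediate requeue on every pass, which would explain the tight loop. A minimal sketch with hypothetical helper names (requeueAfter, mustParse):

```go
package main

import (
	"fmt"
	"time"
)

// requeueAfter returns the smallest of the given durations.
// Assumption: the real controller picks the minimum; this is inferred from the logs only.
func requeueAfter(first time.Duration, rest ...time.Duration) time.Duration {
	smallest := first
	for _, d := range rest {
		if d < smallest {
			smallest = d
		}
	}
	return smallest
}

// mustParse parses a duration string as printed in the logs and panics on bad input.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	// Values from the first report: the services rotation deadline is the nearest one.
	settled := requeueAfter(
		mustParse("16182h21m28.925885887s"), // elapsedToRotateCA
		mustParse("4355h59m59.925288305s"),  // elapsedToRotateServices
		mustParse("6726h21m28.925187457s"),  // elapsedForCABundleCleanup
		mustParse("4379h59m59.925088684s"),  // elapsedForServiceCertsCleanup
	)
	// Values from the looping pod: the CA rotation deadline is already in the past.
	looping := requeueAfter(
		mustParse("-60ns"),
		mustParse("4355h59m59.656377024s"),
		mustParse("18859h37m40.656345615s"),
		mustParse("4379h59m59.656304118s"),
	)
	fmt.Println("settled requeue:", settled)
	fmt.Println("looping requeue:", looping, "immediate:", looping <= 0)
}
```

In the first report the minimum is roughly 4356 hours, which matches the logged reconcile date of 2025-03-19 21:36:10. In the looping case the minimum is negative, so the reconcile is scheduled for the current instant.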

Kubernetes nmstate operator 4.16.0-202409111235

nmstate-cert-manager image:
registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:a33cc00576fc6a7b30d36c6b91d6524eef53700f8b4706e2fd5b83d61791511e

nmstate-webhook image:
registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980

@jorti

jorti commented Sep 19, 2024

I've noticed that the nmstate-ca secret was being recreated several times per second, causing high CPU usage.

After removing the operator, deleting the openshift-nmstate namespace, and reinstalling the operator, it now works.

One thing that I've observed is that the nmstate-cert-manager pod no longer exists. Maybe related to #1263?

@seb2020

seb2020 commented Sep 19, 2024

We have the same issue with the same version of nmstate in OpenShift. We have reverted to the previous version (4.16.0-202409032335), and the issue is not present there.

@goern
Author

goern commented Sep 19, 2024

After removing the operator, deleting the openshift-nmstate namespace and reinstalling it again, now works.

can confirm this.

@bverschueren

Looks like cleaning up the nmstate-cert-manager deployment fails since we're passing in an empty namespace, as reported by the operator:

"failed deleting obsolete cert-manager deployment at openshift: an empty namespace may not be set when a resource name is provided"

Instead of using the namespace from the returned nmstate instance (which is presumably empty) we could get it from the env, like so (bverschueren@2d42efe):

err = r.Client.Delete(ctx, &appsv1.Deployment{
	ObjectMeta: metav1.ObjectMeta{
		Namespace: os.Getenv("HANDLER_NAMESPACE"),
		Name:      os.Getenv("HANDLER_PREFIX") + "nmstate-cert-manager",
	},
})

WDYT @qinqon ?
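
One way to make the proposal above fail loudly, rather than send a delete with an empty namespace, is to resolve the target from the environment first and refuse to continue when the namespace is missing. This is only a sketch: HANDLER_NAMESPACE and HANDLER_PREFIX come from the snippet above, while deploymentKey and obsoleteCertManagerKey are hypothetical names that do not exist in the operator.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// deploymentKey identifies a namespaced object (hypothetical helper type).
type deploymentKey struct {
	Namespace string
	Name      string
}

// obsoleteCertManagerKey builds the namespace and name of the obsolete
// cert-manager deployment from the environment. It returns an error instead
// of a key with an empty namespace. getenv is injected so it can be faked.
func obsoleteCertManagerKey(getenv func(string) string) (deploymentKey, error) {
	ns := getenv("HANDLER_NAMESPACE")
	if ns == "" {
		return deploymentKey{}, errors.New("HANDLER_NAMESPACE is not set; refusing to delete with an empty namespace")
	}
	return deploymentKey{
		Namespace: ns,
		Name:      getenv("HANDLER_PREFIX") + "nmstate-cert-manager",
	}, nil
}

func main() {
	key, err := obsoleteCertManagerKey(os.Getenv)
	if err != nil {
		fmt.Println("skipping cleanup:", err)
		return
	}
	fmt.Printf("would delete deployment %s/%s\n", key.Namespace, key.Name)
}
```

The resolved key would then feed the ObjectMeta passed to r.Client.Delete, as in the snippet above.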

@seb2020

seb2020 commented Sep 19, 2024

I tried removing the operator and deleting the namespace. After reinstalling the operator, I see that the cert-manager pod is not present. I believe it's linked to #1263, as @jorti said.

@qinqon
Member

qinqon commented Sep 20, 2024

@bverschueren @seb2020 We are reverting this in 4.16; that was a mistake. After proper testing we will cherry-pick it, but only to 4.17.

@qinqon
Member

qinqon commented Sep 30, 2024

I am closing this since we have reverted the behaviour.

qinqon closed this as completed Sep 30, 2024
@goern
Author

goern commented Oct 2, 2024

/reopen

with image: 'registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:e12c4eedd684f661bcd26f4618367f32e6d81d4ab7da3158cd6d747bdf63d175' the problem is the same

kubevirt-bot reopened this Oct 2, 2024
@kubevirt-bot
Collaborator

@goern: Reopened this issue.

In response to this:

/reopen

with image: 'registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:e12c4eedd684f661bcd26f4618367f32e6d81d4ab7da3158cd6d747bdf63d175' the problem is the same

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@qinqon
Member

qinqon commented Oct 11, 2024

@goern The latest 4.16 was reverted and 4.17 is fixed for upgrades, so it should work now.

@qinqon
Member

qinqon commented Oct 16, 2024

Closing, since it got fixed in the proper place.

qinqon closed this as completed Oct 16, 2024