cert-manager is creating/using invalid certs #1264

goern · 2024-09-19T09:42:20Z

What happened:

{"level":"info","ts":"2024-09-19T09:36:09.018Z","logger":"certificate/Manager.Reconcile","msg":"TLS certificate chain failed verification, forcing rotation, err: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-ca"}
{"level":"info","ts":"2024-09-19T09:36:09.018Z","logger":"certificate/Manager","msg":"Rotating CA cert/key","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.255Z","logger":"certificate/Manager","msg":"Reset CA bundle with one cert for webhook","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.255Z","logger":"certificate/Manager","msg":"Updating CA bundle for webhook","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:09.324Z","logger":"certificate/Manager","msg":"Rotating Services cert/key","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 09:36:10.074114113 +0000 UTC m=+37506.421748431, deadline: 2026-07-25 15:57:39 +0000 UTC, elapsedToRotate: 16182h21m28.925885887s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Certificate expiration is 2025-03-20 21:36:10 +0000 UTC, totalDuration is 1.5768002e+16, rotation deadline is 2025-03-19 21:36:10 +0000 UTC","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 09:36:10.074711695 +0000 UTC m=+37506.422346022, deadline: 2025-03-19 21:36:10 +0000 UTC, elapsedToRotate: 4355h59m59.925288305s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager.earliestElapsedForCACertsCleanup","msg":"{now: 2024-09-19 09:36:10.074812543 +0000 UTC m=+37506.422446863, deadline: 2025-06-26 15:57:39 +0000 UTC, elapsedForCleanup: 6726h21m28.925187457s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager.earliestElapsedForServiceCertsCleanup","msg":"{now: 2024-09-19 09:36:10.074911316 +0000 UTC m=+37506.422545634, deadline: 2025-03-20 21:36:10 +0000 UTC, elapsedForCleanup: 4379h59m59.925088684s}","webhookType":"Mutating","webhookName":"nmstate","service":{"name":"nmstate-webhook","namespace":"openshift-nmstate"}}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Calculating RequeueAfter","webhookType":"Mutating","webhookName":"nmstate","elapsedToRotateCA":58256488.925885886,"elapsedToRotateServices":15681599.925288305,"elapsedForCABundleCleanup":24214888.925187457,"elapsedForServiceCertsCleanup":15767999.925088683}
{"level":"info","ts":"2024-09-19T09:36:10.074Z","logger":"certificate/Manager","msg":"Certificates will be Reconcile on 2025-03-19 21:36:10.000227114 +0000 UTC m=+15719106.347861433","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager.Reconcile","msg":"Reconciling Certificates","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-webhook"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 09:36:10.075007382 +0000 UTC m=+37506.422641699, deadline: 2026-07-25 15:57:39 +0000 UTC, elapsedToRotate: 16182h21m28.924992618s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T09:36:10.075Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 09:36:10.075016135 +0000 UTC m=+37506.422650455, deadline: 2025-03-19 21:36:10 +0000 UTC, elapsedToRotate: 4355h59m59.924983865s}","webhookType":"Mutating","webhookName":"nmstate"}

What you expected to happen:
Valid certificates.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
I'm using registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980 but this issue seems to be rooted in the upstream project.

Environment:

NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
ok
Problematic NodeNetworkConfigurationPolicy:
n/a
kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980
NetworkManager version (use nmcli --version):
n/a
Kubernetes version (use kubectl version):

Client Version: 4.16.0-202407111006.p0.gfa84651.assembly.stream.el9-fa84651
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.13
Kubernetes Version: v1.29.8+f10c92d

OS (e.g. from /etc/os-release):

NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.2 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://issues.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"

Others:
n/a


jorti · 2024-09-19T10:21:51Z

I've discovered that I have the same problem. Today the creation of a new NNCP failed in my OCP 4.16 cluster:

Error from server (InternalError): error when creating "NNCP.yaml": Internal error occurred: failed calling webhook "nodenetworkconfigurationpolicies-mutate.nmstate.io": failed to call webhook: Post "https://nmstate-webhook.openshift-nmstate.svc:443/nodenetworkconfigurationpolicies-mutate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority

The nmstate-cert-manager pod is in a tight loop logging this:

{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Bad TLS certificate chain, forcing rotation: failed verifying TLS secret openshift-nmstate/nmstate-webhook: CA bundle and CA secret certificate are different","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"elapsedToRotateCAFromLastDeadline {now: 2024-09-19 10:15:20.343431226 +0000 UTC m=+379.773562336, deadline: 2024-09-19 10:15:20.343431166 +0000 UTC m=+379.773562276, elapsedToRotate: -60ns}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Certificate expiration is 2025-03-20 22:15:20 +0000 UTC, totalDuration is 1.5768001e+16, rotation deadline is 2025-03-19 22:15:20 +0000 UTC","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"elapsedToRotateServicesFromLastDeadline{now: 2024-09-19 10:15:20.343622976 +0000 UTC m=+379.773754086, deadline: 2025-03-19 22:15:20 +0000 UTC, elapsedToRotate: 4355h59m59.656377024s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.earliestElapsedForCACertsCleanup","msg":"{now: 2024-09-19 10:15:20.343654385 +0000 UTC m=+379.773785485, deadline: 2026-11-14 05:53:01 +0000 UTC, elapsedForCleanup: 18859h37m40.656345615s}","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.earliestElapsedForServiceCertsCleanup","msg":"{now: 2024-09-19 10:15:20.343695882 +0000 UTC m=+379.773826983, deadline: 2025-03-20 22:15:20 +0000 UTC, elapsedForCleanup: 4379h59m59.656304118s}","webhookType":"Mutating","webhookName":"nmstate","service":{"name":"nmstate-webhook","namespace":"openshift-nmstate"}}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Calculating RequeueAfter","webhookType":"Mutating","webhookName":"nmstate","elapsedToRotateCA":-0.00000006,"elapsedToRotateServices":15681599.656377023,"elapsedForCABundleCleanup":67894660.65634562,"elapsedForServiceCertsCleanup":15767999.656304117}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager","msg":"Certificates will be Reconcile on 2024-09-19 10:15:20.34371045 +0000 UTC m=+379.773841560","webhookType":"Mutating","webhookName":"nmstate"}
{"level":"info","ts":"2024-09-19T10:15:20.343Z","logger":"certificate/Manager.Reconcile","msg":"Reconciling Certificates","webhookType":"Mutating","webhookName":"nmstate","Request.Namespace":"openshift-nmstate","Request.Name":"nmstate-ca"}

Kubernetes nmstate operator 4.16.0-202409111235

nmstate-cert-manager image:
registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:a33cc00576fc6a7b30d36c6b91d6524eef53700f8b4706e2fd5b83d61791511e

nmstate-webhook image:
registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:00eb91c1ff12cbf5c1cf0dfbc5476ff2fc78ad24c62e1fb3f1352d9bf51cc980

jorti · 2024-09-19T11:33:11Z

I've noticed that the nmstate-ca secret was being recreated several times per second, causing high CPU usage.

After removing the operator, deleting the openshift-nmstate namespace and reinstalling it, it now works.

One thing I've observed is that the nmstate-cert-manager pod no longer exists. Maybe related to #1263?

seb2020 · 2024-09-19T11:37:27Z

We have the same issue with the same version of nmstate in OpenShift. We have reverted to the previous version (4.16.0-202409032335) and the issue is not present there.

goern · 2024-09-19T12:02:45Z

After removing the operator, deleting the openshift-nmstate namespace and reinstalling it, it now works.

Can confirm this.

bverschueren · 2024-09-19T19:01:07Z

Looks like cleaning up the nmstate-cert-manager deployment fails since we're passing in an empty namespace, as reported by the operator:

"failed deleting obsolete cert-manager deployment at openshift: an empty namespace may not be set when a resource name is provided"

Instead of using the namespace from the returned nmstate instance (which is presumably empty) we could get it from the env, like so (bverschueren@2d42efe):

err = r.Client.Delete(ctx, &appsv1.Deployment{
	ObjectMeta: metav1.ObjectMeta{
		Namespace: os.Getenv("HANDLER_NAMESPACE"),
		Name:      os.Getenv("HANDLER_PREFIX") + "nmstate-cert-manager",
	},
})

WDYT @qinqon ?

seb2020 · 2024-09-19T19:23:42Z

I tried removing the operator and deleting the namespace. After reinstalling the operator, I see that the cert-manager pod is not present. I believe it's linked to #1263, as @jorti said.

qinqon · 2024-09-20T06:33:52Z

@bverschueren @seb2020 we are reverting this in 4.16; that was a mistake. After proper testing we will cherry-pick it again, but only to 4.17.

qinqon · 2024-09-30T11:40:30Z

I am closing this since we have reverted the behaviour.

goern · 2024-10-02T21:03:34Z

/reopen

With image 'registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:e12c4eedd684f661bcd26f4618367f32e6d81d4ab7da3158cd6d747bdf63d175' the problem is the same.

kubevirt-bot · 2024-10-02T21:03:37Z

@goern: Reopened this issue.

qinqon · 2024-10-11T06:10:48Z

@goern the latest 4.16 was reverted and 4.17 is fixed for upgrades, so it should work now.

qinqon · 2024-10-16T14:47:39Z

Closing, since it got fixed in the proper place.

qinqon closed this as completed Sep 30, 2024

kubevirt-bot reopened this Oct 2, 2024

qinqon closed this as completed Oct 16, 2024