[kubernetes] Collector cannot verify tls: CERTIFICATE_VERIFY_FAILED #288
Hi, thanks for reaching out about this issue! It looks like we are already investigating this in one of our support tickets, so we will continue the investigation there and get back to you through that ticket. Regards
Based on your documentation I'm trying to configure the certificates. This error message is quite misleading, because this seems to be the real error:
But that's not the one you end up showing to the user. Doing something like this from the agent's pod gives a warning about self-signed certificates...
I think that's basically the same as the error message being reported by the python/requests library. I have no idea what PEM file I could pass to it to make it happy.
This may be caused by the way Bluemix is signing certificates. They may not be signed correctly to allow access to the kubelet from the agent's pod. We will continue to update you about this in the support ticket we have been using.
We've shared this issue with the IBM Cloud team. Hopefully they'll chime in soon as well.
If running inside a pod, the kubelet's certificate is validated against the cluster root CA, mounted in
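In Python terms, that validation is what the agent's HTTP stack sets up when it is handed the mounted CA bundle; a minimal stdlib sketch of the equivalent behavior (the mapping to what requests does internally is an assumption):

```python
import ssl

# A default client-side SSL context behaves like requests with verify="<ca.crt>":
# the server certificate must chain to a trusted CA and match the hostname/SAN.
ctx = ssl.create_default_context()  # cafile would be the mounted cluster root CA
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # → True: peer cert is required and verified
print(ctx.check_hostname)                    # → True: hostname/SAN checking is on
```

If the kubelet's serving certificate does not chain to that CA (e.g. it is self-signed), this verification is exactly what fails with CERTIFICATE_VERIFY_FAILED.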
Thanks for your patience while we look into this! The problem can come from different sources:

**The Pod**

It's not mandatory for the kubelet to use certificates from the same PKI as the controller-manager. The kubelet can be started with different configurations:

1. TLS:
2. Self-signed:

As a note: the Kubernetes CSR is mentioned only as an aside, as it manages communication between the kubelet client and the API server, not between the kubelet server API (e.g.

To check whether your configuration correctly matches one of those two, you can run this:

```shell
curl -v --cacert /run/secrets/kubernetes.io/serviceaccount/ca.crt \
  https://${status.nodeIP}:10250 \
  -H "Authorization: Bearer $(< /run/secrets/kubernetes.io/serviceaccount/token)"
```

If this doesn't work, then most probably the kubelet configuration does not allow access from the agent pod.

Aside from these configuration matters, during our investigation we stumbled onto an issue with Python requests version 2.11.1:

```python
verify = "/path/to/issuing_ca"
r = requests.get("https://192.168.1.1:10250/pods", verify=verify, cert=None)
```

will produce the following stack trace: (with the following details in the CA:

Upgrading the library solves this issue, so we will at least work to provide a fix for this. This could actually be part of the issue you are encountering. We will keep looking into this and keep you updated.
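Since the remedy above is "upgrade the bundled requests", a tiny illustrative helper for deciding whether a bundled version predates the fix (the helper and the 2.12.0 threshold are assumptions; the comment only states that 2.11.1 is affected and that upgrading resolves it):

```python
# Hypothetical helper: decide whether a bundled requests version predates the
# SAN-handling fix described above. The (2, 12, 0) threshold is an assumption.
def needs_upgrade(version: str, minimum: tuple = (2, 12, 0)) -> bool:
    parts = tuple(int(p) for p in version.split("."))
    return parts < minimum

print(needs_upgrade("2.11.1"))  # → True, the version bundled with the agent
print(needs_upgrade("2.18.4"))  # → False
```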
After investigating the different certificates provided by Bluemix clusters, it seems like the certificate you need to access the kubelet is located on the node. However, testing this with the agent fails because of the already mentioned issue with the requests library bundled with the agent. The issue lies in its handling of the Subject Alternative Name that has to be used here. We are currently working on updating this library in the agent to fix this issue.
I'll be honest, I haven't tried; I had basically given up on it since I just didn't have enough visibility into how the IBM servers were set up to figure it out. Thank you for following up on this. I'll give this another shot. I was looking and I realized that there is a helm chart, so perhaps I'll give that a go as well, although I suspect I'll have to use my own hand-created yaml file to get the host mounts. You might want to look into making sure that it knows how to mount the certs into the agent pod as well. I'll keep an eye peeled for a release that closes this issue.
@poswald Actually, given that requests issue, I would advise you to wait for the next release of the agent, which will update the library, before trying again, as it is bound to fail until then.
@antoinepouille Were you referring to 6.1?
@irabinovitch The
@poswald Agent 6.1.0 is now out. Could you test the kubelet check with the new details we provided you? Please tell us if you still encounter issues then.
I have the same issue. I just deployed agent 6.1.0, but the check breaks on not being able to verify the secure port's CA cert. This makes sense, because the kubelet's server certificate is self-signed. Is there an option (env var) I can set to ignore CA verification?
@mhulscher Are you using IBM Cloud? The certificate setup differs between Kubernetes distributions. You should be able to disable it using this option, but we are working on a bug related to that option, so it may not work until the next version of the agent. Cheers
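For context, that option can also be set through the agent's environment; a sketch assuming the Agent 6 `DD_KUBELET_TLS_VERIFY` variable maps to the `kubelet_tls_verify` setting mentioned in this thread (workaround only, not for production):

```yaml
# Pod-spec excerpt (sketch): DD_KUBELET_TLS_VERIFY is assumed to control
# kubelet_tls_verify; disabling verification is a workaround, not a fix.
env:
  - name: DD_KUBELET_TLS_VERIFY
    value: "false"
```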
@poswald Closing the issue since it should now be all set. Don't hesitate to reach back if you need more help!
fyi: This is still an issue; however, we are in contact with IBM Cloud Container specialists to resolve this. A fix will be deployed this week to enable webhook authentication to the kubelet.
@msvechla thanks for the update, let us know how it goes 👍
@msvechla Do you have a ticket # or other details we could follow up with IBM on?
hi,

```yaml
- hostPath:
    path: <HOST_PATH>
  name: <VOLUME_NAME>
```

`agents.volumeMounts` (specify additional volumes to mount in the dd-agent container):

```yaml
volumeMounts:
  - name: <VOLUME_NAME>
    mountPath: <CONTAINER_PATH>
    readOnly: true
```

I have added this env:

```yaml
fieldPath: spec.nodeName
```

Finally, I recreated the minikube cluster listening on port 6443. Now I receive this error in the agent log:
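The `fieldPath: spec.nodeName` fragment above is presumably part of a downward-API env entry; a sketch with the variable name assumed (only the `fieldPath` appears in the comment, `DD_KUBERNETES_KUBELET_HOST` is an assumption):

```yaml
# Sketch: the env var name is an assumption; the comment only shows fieldPath.
env:
  - name: DD_KUBERNETES_KUBELET_HOST
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```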
Is it normal that it tries to connect on ports other than 10250?
@kerberos5 I am also receiving the exact same errors, did you end up solving this issue?
When deploying to an IBM Cloud cluster (Kubernetes 1.9), the Datadog collector does not work. This has been reported to Datadog support as ticket #129722.
It is still not clear to me whether this is an issue with IBM or with Datadog.
There is a workaround: after disabling TLS verification (`kubelet_tls_verify` in `/etc/dd-agent/conf.d/kubernetes.yaml`) it connects OK. However, we cannot run this way in production. The bearer token path `/var/run/secrets/kubernetes.io/serviceaccount/token` appears to be populated OK.

**Output of the info page**
Output of the collector log:
Steps to reproduce the issue:
Describe the results you received:
Infrastructure host info in Datadog console reports:
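For reference, the `kubelet_tls_verify` workaround described above would look roughly like this in the check config file (layout assumed from the standard Datadog check format; not suitable for production):

```yaml
# /etc/dd-agent/conf.d/kubernetes.yaml (sketch; layout assumed)
init_config:

instances:
  - kubelet_tls_verify: false  # workaround only: skips CA verification of the kubelet
```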