**Description**
When deploying to an IBM Cloud cluster (Kubernetes 1.9) the Datadog collector does not work. This has been reported to Datadog support as ticket #129722.
It is still not clear to me whether this is an issue with IBM or with Datadog.
There is a workaround: after disabling TLS verification (`kubelet_tls_verify` in /etc/dd-agent/conf.d/kubernetes.yaml) the agent connects OK; however, we cannot run this way in production. A sketch of that workaround follows the listing below. The bearer token path /var/run/secrets/kubernetes.io/serviceaccount/token
appears to be populated correctly:
root@dd-agent-mbshp:/# ls -al /var/run/secrets/kubernetes.io/serviceaccount/token
lrwxrwxrwx 1 root root 12 Feb 20 03:01 /var/run/secrets/kubernetes.io/serviceaccount/token -> ..data/token
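For reference, this is roughly how the workaround was applied inside the dd-agent container. Only `kubelet_tls_verify` is the setting actually changed; the `init_config`/`instances` layout is the standard Agent 5 check-config format, and everything else here is illustrative rather than a copy of our file.

```sh
# Sketch of the workaround (not acceptable for production):
# overwrite the kubernetes check config with TLS verification disabled.
cat <<'EOF' > /etc/dd-agent/conf.d/kubernetes.yaml
init_config:

instances:
  # Skip verification of the kubelet's serving certificate
  - kubelet_tls_verify: false
EOF

# Restart the agent so the kubernetes check is re-initialized
/etc/init.d/datadog-agent restart
```

With this in place the check initializes and the agent connects OK; with `kubelet_tls_verify` left at its default the check fails as shown in the info page and collector log below.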
**Output of the info page**
# /etc/init.d/datadog-agent info
2018-02-20 04:02:23,039 | DEBUG | dd.collector | utils.service_discovery.config(config.py:31) | No configuration backend provided for service discovery. Only auto config templates will be used.
2018-02-20 04:02:23,345 | DEBUG | dd.collector | utils.cloud_metadata(cloud_metadata.py:77) | Collecting GCE Metadata failed HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/?recursive=true (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x2b39fa0afa50>, 'Connection to 169.254.169.254 timed out. (connect timeout=0.3)'))
2018-02-20 04:02:23,348 | DEBUG | dd.collector | docker.auth.auth(auth.py:227) | Trying paths: ['/root/.docker/config.json', '/root/.dockercfg']
2018-02-20 04:02:23,348 | DEBUG | dd.collector | docker.auth.auth(auth.py:234) | No config file found
====================
Collector (v 5.22.0)
====================
Status date: 2018-02-20 04:02:21 (2s ago)
Pid: 38
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/collector.log
Clocks
======
NTP offset: -0.0047 s
System UTC time: 2018-02-20 04:02:23.861592
Paths
=====
conf.d: /etc/dd-agent/conf.d
checks.d: Not found
Hostnames
=========
socket-hostname: dd-agent-mbshp
hostname: kube-tok02-crbf29e27a18ff4db58ff3873f3c748f61-w1.cloud.ibm
socket-fqdn: dd-agent-mbshp
Checks
======
ntp (1.0.0)
-----------
- Collected 0 metrics, 0 events & 0 service checks
disk (1.1.0)
------------
- instance #0 [OK]
- Collected 52 metrics, 0 events & 0 service checks
network (1.4.0)
---------------
- instance #0 [OK]
- Collected 50 metrics, 0 events & 0 service checks
docker_daemon (1.8.0)
---------------------
- instance #0 [OK]
- Collected 216 metrics, 0 events & 1 service check
kubernetes (1.5.0)
------------------
- initialize check class [ERROR]: Exception('Unable to initialize Kubelet client. Try setting the host parameter. The Kubernetes check failed permanently.',)
Emitters
========
- http_emitter [OK]
2018-02-20 04:02:26,458 | DEBUG | dd.dogstatsd | utils.service_discovery.config(config.py:31) | No configuration backend provided for service discovery. Only auto config templates will be used.
====================
Dogstatsd (v 5.22.0)
====================
Status date: 2018-02-20 04:02:20 (6s ago)
Pid: 27
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/dogstatsd.log
Flush count: 358
Packet Count: 1980
Packets per second: 2.0
Metric count: 20
Event count: 0
Service check count: 0
====================
Forwarder (v 5.22.0)
====================
Status date: 2018-02-20 04:02:30 (0s ago)
Pid: 20
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/forwarder.log
Queue Size: 611 bytes
Queue Length: 1
Flush Count: 1110
Transactions received: 883
Transactions flushed: 882
Transactions rejected: 0
API Key Status: API Key is valid
======================
Trace Agent (v 5.22.0)
======================
Pid: 18
Uptime: 3642 seconds
Mem alloc: 958344 bytes
Hostname: dd-agent-mbshp
Receiver: 0.0.0.0:8126
API Endpoint: https://trace.agent.datadoghq.com
--- Receiver stats (1 min) ---
--- Writer stats (1 min) ---
Traces: 0 payloads, 0 traces, 0 bytes
Stats: 0 payloads, 0 stats buckets, 0 bytes
Services: 0 payloads, 0 services, 0 bytes
**Output of the collector log:**
root@dd-agent-mbshp:/# head -100 /var/log/datadog/collector.log
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | utils.service_discovery.config(config.py:31) | No configuration backend provided for service discovery. Only auto config templates will be used.
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | utils.cloud_metadata(cloud_metadata.py:77) | Collecting GCE Metadata failed HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/?recursive=true (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x2ab4b4a3aad0>, 'Connection to 169.254.169.254 timed out. (connect timeout=0.3)'))
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | docker.auth.auth(auth.py:227) | Trying paths: ['/root/.docker/config.json', '/root/.dockercfg']
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | docker.auth.auth(auth.py:234) | No config file found
2018-02-20 03:02:15 UTC | INFO | dd.collector | utils.pidfile(pidfile.py:35) | Pid file is: /opt/datadog-agent/run/dd-agent.pid
2018-02-20 03:02:15 UTC | INFO | dd.collector | collector(agent.py:559) | Agent version 5.22.0
2018-02-20 03:02:15 UTC | INFO | dd.collector | daemon(daemon.py:234) | Starting
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | checks.check_status(check_status.py:163) | Persisting status to /opt/datadog-agent/run/CollectorStatus.pickle
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | utils.service_discovery.config(config.py:31) | No configuration backend provided for service discovery. Only auto config templates will be used.
2018-02-20 03:02:15 UTC | DEBUG | dd.collector | utils.subprocess_output(subprocess_output.py:54) | Popen(['grep', 'model name', '/host/proc/cpuinfo'], stderr = <open file '<fdopen>', mode 'w+b' at 0x2ab4b4a09780>, stdout = <open file '<fdopen>', mode 'w+b' at 0x2ab4b4a096f0>) called
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | collector(kubeutil.py:264) | Couldn't query kubelet over HTTP, assuming it's not in no_auth mode.
2018-02-20 03:02:16 UTC | WARNING | dd.collector | collector(kubeutil.py:273) | Couldn't query kubelet over HTTP, assuming it's not in no_auth mode.
2018-02-20 03:02:16 UTC | ERROR | dd.collector | collector(kubeutil.py:209) | Failed to initialize kubelet connection. Will retry 0 time(s). Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)
2018-02-20 03:02:16 UTC | INFO | dd.collector | config(config.py:998) | no bundled checks.d path (checks provided as wheels): /opt/datadog-agent/agent/checks.d
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1012) | No sdk integrations path found
2018-02-20 03:02:16 UTC | ERROR | dd.collector | config(config.py:1076) | Unable to initialize check kubernetes
Traceback (most recent call last):
File "/opt/datadog-agent/agent/config.py", line 1060, in _initialize_check
agentConfig=agentConfig, instances=instances)
File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubernetes/kubernetes.py", line 106, in __init__
raise Exception('Unable to initialize Kubelet client. Try setting the host parameter. The Kubernetes check failed permanently.')
Exception: Unable to initialize Kubelet client. Try setting the host parameter. The Kubernetes check failed permanently.
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded kubernetes
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded docker_daemon
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded ntp
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded disk
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded network
2018-02-20 03:02:16 UTC | DEBUG | dd.collector | config(config.py:1177) | Loaded agent_metrics
2018-02-20 03:02:16 UTC | INFO | dd.collector | config(config.py:973) | Fetching service discovery check configurations.
2018-02-20 03:02:16 UTC | ERROR | dd.collector | utils.service_discovery.sd_docker_backend(sd_docker_backend.py:123) | kubelet client not initialized, cannot retrieve pod list.
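The interesting line above is the `[SSL: CERTIFICATE_VERIFY_FAILED]` error raised after the check gives up on plain HTTP and tries the kubelet over HTTPS. The snippet below is one way to exercise the same connection by hand from inside the dd-agent pod; the node IP placeholder, the kubelet secure port 10250 and the `/pods` endpoint are assumptions based on standard kubelet defaults, not something taken from the logs.

```sh
# Reproduce the kubelet TLS handshake the kubernetes check performs, using the
# same service-account CA bundle and bearer token that are mounted in the pod.
NODE_IP=10.0.0.1   # replace with the IP of the worker node this pod runs on
curl -v \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  "https://${NODE_IP}:10250/pods"
```

If curl reports the same certificate verification failure, the mismatch is between the pod's CA bundle and the certificate the kubelet presents, which would help decide whether this is an IBM or a Datadog issue.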
**Steps to reproduce the issue:**
- Create a new cluster
- Deploy the dd-agent.yml as documented (roughly the commands sketched after this list)
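A rough sketch of those steps. The cluster itself was created through the IBM Cloud container service; `dd-agent.yml` is the DaemonSet manifest from the Datadog Kubernetes install docs, and the `app=dd-agent` label and pod name placeholder below are assumptions based on that stock manifest.

```sh
# Deploy the stock dd-agent DaemonSet and inspect the agent once it is running.
kubectl create -f dd-agent.yml                    # deploy the agent DaemonSet
kubectl get pods -l app=dd-agent                  # wait for the pod to be Running
kubectl exec -it <dd-agent-pod-name> -- /etc/init.d/datadog-agent info
```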
**Describe the results you received:**
The Infrastructure host info page in the Datadog console shows the kubernetes integration reporting:
Instance #initialization[ERROR]:Exception('Unable to initialize Kubelet client. Try setting the host parameter. The Kubernetes check failed permanently.',)