Description
Hey All,
We've been using the amazon-cloudwatch-agent for a while now and so far we have been loving it. We set it up using the documentation found here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html
However, during a recent security audit we discovered that all our AWS (EKS) EC2 instances were running IMDS with open, unauthenticated access. Since this is a security vulnerability, we wanted to close it. To that end we updated our Terraform EC2 instance templates to switch IMDS to authenticated access with a hop limit of 1:
metadata_options {
  http_endpoint               = "enabled"
  http_tokens                 = "required"
  http_put_response_hop_limit = 1
}
This worked as expected: the IMDS interface is still enabled and now requires authentication, which we assumed would work through the service account provisioned for the cluster. We currently have two node groups set up: NodeGroupA was configured for IMDS with authentication and a hop limit of 1, and NodeGroupB was left with the default open access. However, it was several days after the change before we noticed that all the cloudwatch-agent pods on NodeGroupA had started failing continuously:
NAME                                 READY   STATUS    RESTARTS   AGE
cloudwatch-agent-5hbk8               1/1     Running   0          38d
cloudwatch-agent-lp688               1/1     Running   0          38d
cloudwatch-agent-tzzpm               1/1     Running   5938       38d
cwagent-prometheus-57d597f5c-65cvw   1/1     Running   0          52d
fluent-bit-245ns                     1/1     Running   0          64d
fluent-bit-k67d8                     1/1     Running   0          64d
fluent-bit-s84tg                     1/1     Running   0          52d
This has resulted in a total loss of CloudWatch logs for any pods in NodeGroupA, and all Container Insights performance data is also being lost for the same node group. We are hoping someone can help us with this issue, as it would seem best practice not to leave IMDS with open access. We've temporarily enabled aws-for-fluent-bit on the cluster to at least ensure our CloudWatch logs keep coming in, but we'd like to stick with your tool if we can.
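For anyone trying to reproduce this, here is a minimal sketch of what token-authenticated IMDS access looks like, assuming the standard IMDSv2 flow (a PUT to obtain a session token, then a GET that presents it). The address 169.254.169.254 and the function names are our own assumptions for illustration and are not taken from the agent's code:

```python
import urllib.request

# Assumption: the standard link-local IMDS address (not stated in this report).
IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks IMDS for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token: str, path: str = "instance-id") -> urllib.request.Request:
    """Build the GET request that presents the session token to IMDS."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout: float = 2.0) -> str:
    """Run both steps; raises an exception if IMDS is unreachable from here."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

Calling fetch_instance_id() from a shell inside one of the failing pods and again on the node itself should show whether the two environments get different results.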
Here are the logs for a standard failure on NodeGroupA:
+ cloudwatch-agent-tzzpm › cloudwatch-agent
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 I! 2021/07/26 20:20:31 E! ec2metadata is not available
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:31 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:32 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:33 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
cloudwatch-agent-tzzpm cloudwatch-agent I! Detected the instance is OnPrem
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
cloudwatch-agent-tzzpm cloudwatch-agent /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 Reading json config file path: /etc/cwagentconfig/..2021_06_18_17_23_34.900788054/cwagentconfig.json ...
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 Find symbolic link /etc/cwagentconfig/..data
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
cloudwatch-agent-tzzpm cloudwatch-agent Valid Json input schema.
cloudwatch-agent-tzzpm cloudwatch-agent Got Home directory: /root
cloudwatch-agent-tzzpm cloudwatch-agent No csm configuration found.
cloudwatch-agent-tzzpm cloudwatch-agent No metric configuration found.
cloudwatch-agent-tzzpm cloudwatch-agent Configuration validation first phase succeeded
cloudwatch-agent-tzzpm cloudwatch-agent
cloudwatch-agent-tzzpm cloudwatch-agent 2021/07/26 20:20:34 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Starting AmazonCloudWatchAgent 1.247347.6
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Loaded inputs: k8sapiserver cadvisor
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Loaded aggregators:
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Loaded processors: ec2tagger k8sdecorator
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Loaded outputs: cloudwatchlogs
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! Tags enabled:
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-10-0-2-81.us-east-2.compute.internal", Flush Interval:1s
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! [logagent] starting
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:20:34Z I! [logagent] found plugin cloudwatchlogs is a log backend
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:24:35Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
cloudwatch-agent-tzzpm cloudwatch-agent 2021-07-26T20:24:35Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
We noticed the lines about failing to connect to IMDS, followed by "Detected the instance is OnPrem". After some searching we found this issue: aws-samples/amazon-cloudwatch-container-insights#56, so we edited the DaemonSet config to add RUN_IN_AWS=True. The logs did change to indicate that the option was picked up, but we are still getting failures:
+ cloudwatch-agent-9w9zh › cloudwatch-agent
cloudwatch-agent-r48hd cloudwatch-agent 2021-07-26T20:52:17Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
cloudwatch-agent-r48hd cloudwatch-agent 2021-07-26T20:52:17Z I! k8sapiserver Switch New Leader: ip-10-0-2-62.us-east-2.compute.internal
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:21 I! I! Detected from ENV instance is EC2
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:14 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
cloudwatch-agent-9w9zh cloudwatch-agent /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:14 Reading json config file path: /etc/cwagentconfig/..2021_07_26_20_48_04.766796875/cwagentconfig.json ...
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:14 Find symbolic link /etc/cwagentconfig/..data
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:14 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:14 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
cloudwatch-agent-9w9zh cloudwatch-agent Valid Json input schema.
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:18 E! ec2metadata is not available
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:18 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:19 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:20 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:21 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:21 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
cloudwatch-agent-9w9zh cloudwatch-agent No csm configuration found.
cloudwatch-agent-9w9zh cloudwatch-agent No metric configuration found.
cloudwatch-agent-9w9zh cloudwatch-agent Configuration validation first phase succeeded
cloudwatch-agent-9w9zh cloudwatch-agent
cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:21 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Starting AmazonCloudWatchAgent 1.247347.6
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Loaded inputs: cadvisor k8sapiserver
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Loaded aggregators:
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Loaded processors: ec2tagger k8sdecorator
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Loaded outputs: cloudwatchlogs
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! Tags enabled:
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-10-0-2-81.us-east-2.compute.internal", Flush Interval:1s
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! [logagent] starting
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:52:21Z I! [logagent] found plugin cloudwatchlogs is a log backend
cloudwatch-agent-qp4sq cloudwatch-agent 2021-07-26T20:52:26Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
cloudwatch-agent-qp4sq cloudwatch-agent 2021-07-26T20:52:26Z I! k8sapiserver Switch New Leader: ip-10-0-2-62.us-east-2.compute.internal
cloudwatch-agent-qp4sq cloudwatch-agent 2021-07-26T20:52:26Z I! k8sapiserver OnStartedLeading: ip-10-0-2-62.us-east-2.compute.internal
cloudwatch-agent-r48hd cloudwatch-agent 2021-07-26T20:56:18Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
cloudwatch-agent-r48hd cloudwatch-agent 2021-07-26T20:56:18Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:56:21Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:56:21Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
- cloudwatch-agent-9w9zh › cloudwatch-agent
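Since the output above interleaves lines from three pods, a rough sketch for grouping the error lines by pod may help when reading it. It assumes each line has the form "<pod> <container> <message>" as in the logs above; the function name is hypothetical:

```python
from collections import defaultdict


def errors_by_pod(lines):
    """Group 'E!' log lines by pod, assuming '<pod> <container> <message>' lines."""
    grouped = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # blank or header-only line, nothing to classify
        pod, _container, message = parts
        # The agent marks error-level lines with a standalone "E!" token.
        if " E! " in f" {message} ":
            grouped[pod].append(message)
    return dict(grouped)


# Three lines copied from the output above.
sample = [
    "cloudwatch-agent-r48hd cloudwatch-agent 2021-07-26T20:56:18Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded",
    "cloudwatch-agent-9w9zh cloudwatch-agent 2021/07/26 20:52:18 E! ec2metadata is not available",
    "cloudwatch-agent-9w9zh cloudwatch-agent 2021-07-26T20:56:21Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance",
]

for pod, messages in errors_by_pod(sample).items():
    print(pod, len(messages))
```

Run against the full output, this should make it easier to see which pods log errors and which ones initialize ec2tagger successfully.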
We appreciate any insights. This work is being done in AWS region us-east-2, and we are seeing the issue in two different accounts where we enabled IMDS with authentication.