Prometheus metrics scrapes of linkerd-proxy are not TLS protected (occasionally) #12634
Turns out the metrics endpoint on port 4191 is not supposed to be served behind TLS; only traffic intended for the main container is. You can verify this by looking at the logs at any

> which means inbound traffic to those ports (including 4191) is let through untouched and not forwarded to the proxy. Only traffic to the proxy is then wrapped in mTLS.
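For illustration only (this is paraphrased from what `linkerd-init` iptables rules typically look like; the exact chain names, comments, and port list depend on your Linkerd version and skip-port settings), the rule being discussed is roughly of this shape:

```text
# Illustrative sketch: inbound packets to the ignored ports (e.g. 4190,4191)
# hit a RETURN early and are never redirected to the proxy's inbound listener,
# so the proxy's admin port is reached directly rather than through the proxy.
-A PROXY_INIT_REDIRECT -p tcp --match multiport --dports 4190,4191 -j RETURN
```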
Port 4191 is the admin port for the sidecar proxy. I believe the rule you highlighted is intended to ensure that traffic to this port isn't forwarded via the proxy but rather handled directly by the proxy itself. That said, I don't think traffic to the admin port is supposed to be unencrypted. For example, while I have highlighted some instances where it is not, the vast majority (~98%) of requests matching the query:
My bad, you're actually right: traffic to 4191 is supposed to be encrypted. There are no rules enforcing that, though, as you can see.
If it is important to encrypt this traffic for a specific application, could an AuthorizationPolicy (etc.) be created that would enforce that requirement and deny non-encrypted requests during these transient periods?
Thanks for the clarification. However, I do want to note that this doesn't appear to be a transient issue during startup. It continues to affect some pods for their entire lifetime. Perhaps once the unauthenticated TCP connection is established, it is reused indefinitely? While I am not sure what other endpoints the admin port exposes, it seems somewhat concerning that anything can access it without authentication or encryption. You would know better than I do about the implications here, but is there an easy way to completely disable all non-mTLS traffic to this port across the entire cluster?
You can change the default policy at the cluster level (via the option proxy.defaultInboundPolicy="all-authenticated") or at the namespace or workload level as explained in the docs. That will however deny all traffic to meshed pods from unmeshed pods. To specifically deny traffic to the metrics endpoint you could set up a Server resource for the admin port (one per namespace):

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: metrics
spec:
  podSelector: {}
  port: linkerd-admin
  proxyProtocol: HTTP/1
```

and then an AuthorizationPolicy (also one per namespace) that would grant access only to the prometheus ServiceAccount (adjust SA and namespace according to your case):

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: emojivoto
  name: web-metrics
  labels:
    linkerd.io/extension: viz
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: metrics
  requiredAuthenticationRefs:
    - kind: ServiceAccount
      name: prometheus
      namespace: linkerd-viz
```
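For reference, the cluster-wide default mentioned above could be expressed as a Helm values fragment (a sketch, not a complete install; it assumes Linkerd was installed via the official Helm chart, so adjust to how your control plane is actually deployed):

```yaml
# values.yaml fragment (illustrative): make the proxy deny inbound
# connections by default unless the client presents a mesh identity.
proxy:
  defaultInboundPolicy: all-authenticated
```

Note that, as stated above, this denies all traffic from unmeshed pods to meshed pods, so an unmeshed Prometheus would need an explicit authorization to keep scraping.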
What is the issue?
When checking my Linkerd metrics to ensure that all cluster traffic is encrypted as expected, it appears that communication with the Linkerd2 proxies' metrics endpoints sometimes happens without encryption.
There does not appear to be a discernible pattern:
How can it be reproduced?
1. Install Prometheus via prometheus-operator (not via the Linkerd Helm charts)
2. Install Linkerd via the Helm chart with the following settings:
3. Ensure all pods have the linkerd sidecar running
4. Run the query:

```promql
sum(rate(request_total{direction="outbound", tls!="true", target_addr=~".*4191"}[5m])) by (namespace, pod, target_addr, dst_namespace, no_tls_reason, dst_service, dst_pod_template_hash) * 5 * 60 > 0
```
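To make the selector in that query concrete, here is a small Python sketch that applies the same three label conditions to some made-up sample series (the label values are hypothetical, chosen only to show which series the query would keep):

```python
import re

# Hypothetical sample series, mimicking linkerd-proxy request_total labels.
series = [
    {"direction": "outbound", "tls": "true", "target_addr": "10.0.0.5:4191"},
    {"direction": "outbound", "tls": "no_identity", "target_addr": "10.0.0.6:4191"},
    {"direction": "inbound", "tls": "true", "target_addr": "10.0.0.7:8080"},
]

def matches(labels):
    # Same conditions as the PromQL selector:
    #   direction="outbound", tls!="true", target_addr=~".*4191"
    # PromQL regex matchers are anchored, hence re.fullmatch.
    return (
        labels["direction"] == "outbound"
        and labels["tls"] != "true"
        and re.fullmatch(r".*4191", labels["target_addr"]) is not None
    )

unencrypted = [s for s in series if matches(s)]
# Only the second series (tls="no_identity") survives the filter.
```

A non-empty result from the real query therefore means some outbound traffic to a `:4191` admin port was reported as not TLS-wrapped.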
Logs, error output, etc
Metrics from Grafana
Logs of Prometheus (grepped for the string `4191`):

Logs from the prometheus sidecar:
Note the following: the `target_addr` shown in the metrics does not appear in the logs even though everything was captured concurrently.

Output of `linkerd check -o short`:
Environment
- Kubernetes 1.29 on EKS
- Linkerd 2024.5.1
Possible solution
No response
Additional context
The scraping itself completes successfully with no errors.
Would you like to work on fixing this bug?
None