The issue this time was that the metrics didn't include the node for a scheduled pod.
The pod is `mma-ai-7489bc5b98-ccf64` in the namespace `nerc-demo-5b7ce1`.
If you query `kube_pod_resource_request{unit="cores"} unless on(pod, namespace) kube_pod_status_unschedulable` for December 9, 2024 from the Thanos querier endpoint, the result doesn't include the node name. A missing node name isn't a problem for CPU metrics, but it is a problem for GPU metrics. I then ran the same query against the Prometheus endpoint, which correctly returned the associated node.
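One way to spot this kind of gap is to scan a query result for series that have no `node` label. The following is only a minimal sketch: it assumes the standard Prometheus HTTP API response shape (`data.result[].metric`), and the function name `series_missing_label` and the sample response are hypothetical, made up purely for illustration.

```python
from typing import Any


def series_missing_label(response: dict[str, Any], label: str = "node") -> list[tuple[str, str]]:
    """Return (namespace, pod) pairs for series whose `label` is absent or empty."""
    missing = []
    for series in response.get("data", {}).get("result", []):
        metric = series.get("metric", {})
        if not metric.get(label):
            missing.append((metric.get("namespace", ""), metric.get("pod", "")))
    return missing


# Hypothetical sample response (not real data): one series carries a node
# label, the other does not.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"namespace": "ns-a", "pod": "pod-with-node", "node": "node-1"},
                "value": [0, "1"],
            },
            {
                "metric": {"namespace": "ns-b", "pod": "pod-without-node"},
                "value": [0, "1"],
            },
        ],
    },
}

missing_pods = series_missing_label(sample_response)
print(missing_pods)
```

Running the same check on the responses from both endpoints would show which pods lose the label only when the data comes through Thanos.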
As a solution, I was thinking I could do something like
but this just explicitly ignores pods without a node name and is not ideal.
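An alternative to dropping those pods would be to fill in the missing node name from a second source that does have it. This is a rough sketch under assumptions, not what the current tooling does: it assumes both result sets are already parsed into lists of label dictionaries, and the helper name `backfill_label` and the sample data are hypothetical.

```python
def backfill_label(primary: list[dict], fallback: list[dict], label: str = "node") -> list[dict]:
    """Copy `label` from fallback series into primary series that lack it.

    Series are matched on the (namespace, pod) pair. Primary series that
    already have the label, or that have no match, are returned unchanged.
    """
    # Index the fallback series that actually carry the label.
    lookup = {}
    for metric in fallback:
        if metric.get(label):
            lookup[(metric.get("namespace"), metric.get("pod"))] = metric[label]

    filled = []
    for metric in primary:
        merged = dict(metric)  # avoid mutating the caller's data
        if not merged.get(label):
            key = (merged.get("namespace"), merged.get("pod"))
            if key in lookup:
                merged[label] = lookup[key]
        filled.append(merged)
    return filled


# Hypothetical sample data (not real metrics).
thanos_series = [{"namespace": "ns-b", "pod": "pod-x"}]
prometheus_series = [{"namespace": "ns-b", "pod": "pod-x", "node": "node-7"}]

filled_series = backfill_label(thanos_series, prometheus_series)
print(filled_series)
```

This keeps every pod in the result, at the cost of needing a second query against an endpoint that returns the label.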
This is the third issue I've run into when gathering data from Thanos that didn't affect Prometheus.