More issues with Thanos Querier #93

naved001 · 2024-12-10T21:44:30Z

The issue this time was that the metrics didn't include the node for a scheduled pod.

The pod is mma-ai-7489bc5b98-ccf64 in the namespace nerc-demo-5b7ce1.

If you query kube_pod_resource_request{unit="cores"} unless on(pod, namespace) kube_pod_status_unschedulable for December 9, 2024 from the Thanos Querier endpoint, the result doesn't include the node name. A missing node name isn't a problem for CPU metrics, but it is a problem for GPU metrics. I then ran the same query against the Prometheus endpoint, which correctly returned the associated node.
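To make the discrepancy easy to reproduce, here is a rough sketch of how the two responses could be compared. It assumes both endpoints return the standard Prometheus HTTP API instant-query shape (a list of series, each with a "metric" label map) and that the "result" lists have already been fetched and parsed. The function names, the sample values, and the node name "example-node" are made up for illustration; only the pod and namespace come from this report.

```python
def pod_key(series):
    """Identify a series by its (namespace, pod) labels."""
    metric = series.get("metric", {})
    return (metric.get("namespace", ""), metric.get("pod", ""))


def pods_missing_label(result, label="node"):
    """Return sorted (namespace, pod) pairs whose series have no value for `label`."""
    return sorted(
        {pod_key(s) for s in result if not s.get("metric", {}).get(label)}
    )


def missing_only_in_thanos(thanos_result, prometheus_result, label="node"):
    """Map pods that lack `label` in Thanos to the value Prometheus reports."""
    prometheus_values = {
        pod_key(s): s["metric"][label]
        for s in prometheus_result
        if s.get("metric", {}).get(label)
    }
    return {
        key: prometheus_values[key]
        for key in pods_missing_label(thanos_result, label)
        if key in prometheus_values
    }


# Hand-written sample data: timestamps, values, and the node name are placeholders.
thanos_sample = [
    {
        "metric": {
            "namespace": "nerc-demo-5b7ce1",
            "pod": "mma-ai-7489bc5b98-ccf64",
            "unit": "cores",
        },
        "value": [0, "1"],
    }
]
prometheus_sample = [
    {
        "metric": {
            "namespace": "nerc-demo-5b7ce1",
            "pod": "mma-ai-7489bc5b98-ccf64",
            "unit": "cores",
            "node": "example-node",  # hypothetical node name
        },
        "value": [0, "1"],
    }
]

print(missing_only_in_thanos(thanos_sample, prometheus_sample))
```

Running this over a full day of results from both endpoints would list every pod affected, not just the one I happened to notice.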

As a solution, I was thinking I could do something like

kube_pod_resource_request{unit="cores", node!=""} unless on(pod, namespace) kube_pod_status_unschedulable

but this just explicitly ignores pods without a node name and is not ideal.
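A possible middle ground, sketched below, is to run the unfiltered query and split the result client-side, so that pods without a node name are reported instead of silently dropped. This is only an illustration: it assumes the response has already been parsed into the Prometheus HTTP API "result" list, and the function name, sample series, and node name are hypothetical.

```python
def split_by_node(result):
    """Partition series into (with_node, without_node) by the "node" label.

    A series counts as "without node" when the label is absent or empty,
    which is the same set that the PromQL matcher node!="" would exclude.
    """
    with_node, without_node = [], []
    for series in result:
        if series.get("metric", {}).get("node"):
            with_node.append(series)
        else:
            without_node.append(series)
    return with_node, without_node


# Hand-written sample; "other-pod" and "example-node" are placeholders.
sample_result = [
    {"metric": {"namespace": "nerc-demo-5b7ce1",
                "pod": "mma-ai-7489bc5b98-ccf64", "unit": "cores"},
     "value": [0, "1"]},
    {"metric": {"namespace": "nerc-demo-5b7ce1", "pod": "other-pod",
                "unit": "cores", "node": "example-node"},
     "value": [0, "2"]},
]

kept, dropped = split_by_node(sample_result)
for series in dropped:
    labels = series["metric"]
    print(f"no node label: {labels.get('namespace')}/{labels.get('pod')}")
```

This doesn't recover the missing node, but it at least makes the gap visible in whatever consumes the data.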

This is the third issue I've run into when gathering data from Thanos that didn't affect Prometheus.
