Hi Everyone,
Let me give a brief on the task I'm working on.
I have a Python script that monitors the health of APIs running on pods in a Kubernetes cluster. Here's a brief overview of its functionality:
The script starts by loading configuration details from a JSON configuration file: AWS credentials, along with details like the cluster name, namespace, and service port.
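For context, the config-loading step looks roughly like this (the file name and key names are illustrative, not necessarily the ones in my actual config):

```python
import json

# Load AWS credentials and cluster details from the JSON configuration
# file (file name and key names here are illustrative).
with open("config.json") as f:
    config = json.load(f)

aws_access_key = config["aws_access_key_id"]
aws_secret_key = config["aws_secret_access_key"]
cluster_name = config["cluster_name"]
namespace = config["namespace"]
service_port = config["service_port"]
```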
Two Prometheus metrics are defined: REQUEST_TIME and REQUEST_STATUS, used to monitor the time spent processing requests and the status of the requests, respectively.
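A sketch of the definitions, assuming the prometheus_client library (the metric types and label names are illustrative):

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Time spent processing each health-check request, labelled by pod.
REQUEST_TIME = Gauge(
    "api_request_time_seconds",
    "Time spent processing the health-check request",
    ["pod"],
    registry=registry,
)

# HTTP status code returned by each pod's API.
REQUEST_STATUS = Gauge(
    "api_request_status_code",
    "Status code returned by the health-check request",
    ["pod"],
    registry=registry,
)
```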
The script sets the AWS credentials and connects to the EKS cluster using the AWS CLI. It then enters a main loop where it wipes all metrics using the admin API of the Pushgateway, gathers information about the running pods, and sends requests to the APIs running on those pods. The response time and status code of each request are recorded as Prometheus metrics. If there's an error while running the subprocess commands or during the main loop, the script logs the error and sets the script status to 'ERROR'. The script then sleeps for 30 seconds before starting the next iteration of the main loop.
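Simplified, the loop looks something like this, building on the snippets above (the gateway address, health-check path, and kubectl call are illustrative; the wipe requires the Pushgateway to run with --web.enable-admin-api):

```python
import json
import logging
import subprocess
import time

import requests
from prometheus_client import push_to_gateway

PUSHGATEWAY = "http://pushgateway:9091"  # illustrative address

while True:
    try:
        # Wipe ALL metrics on the Pushgateway via its admin API.
        requests.put(f"{PUSHGATEWAY}/api/v1/admin/wipe", timeout=10)

        # Gather the running pods in the namespace via kubectl.
        out = subprocess.check_output(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"]
        )
        pods = json.loads(out)["items"]

        # Hit each pod's API and record response time and status code.
        for pod in pods:
            name = pod["metadata"]["name"]
            ip = pod["status"]["podIP"]
            start = time.time()
            resp = requests.get(
                f"http://{ip}:{service_port}/health", timeout=5
            )
            REQUEST_TIME.labels(pod=name).set(time.time() - start)
            REQUEST_STATUS.labels(pod=name).set(resp.status_code)

        # Push everything recorded this iteration under one job name.
        push_to_gateway(PUSHGATEWAY, job="api_health", registry=registry)
    except Exception as exc:
        logging.error("Health-check iteration failed: %s", exc)
    time.sleep(30)
```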
The issue I'm facing: suppose 2 pods are running. The script sends a request to both pods and pushes their metrics. But if 1 pod gets deleted, the script sends a request only to the remaining pod and pushes that pod's metrics, yet the old pod's metrics are still present at the /metrics endpoint, even though I'm using the admin API to wipe the job (delete all metrics) completely at the start of each iteration. Can someone suggest some workarounds to fix this?
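One thing I'm wondering about (unverified): since REQUEST_TIME and REQUEST_STATUS live for the whole process, the label children created for a now-deleted pod may stay in the client-side registry and get re-pushed right after every wipe. If that's the cause, clearing the children at the start of each iteration might help, since prometheus_client's clear() removes all labelled children (creating a fresh CollectorRegistry per iteration would be an alternative). A minimal sketch:

```python
from prometheus_client import push_to_gateway

def record_and_push(samples, gateway="http://pushgateway:9091"):
    """One iteration: drop stale per-pod series, record fresh ones, push.

    `samples` is a list of (pod_name, elapsed_seconds, status_code)
    tuples; REQUEST_TIME / REQUEST_STATUS / registry are the objects
    from the sketch above.
    """
    # clear() removes every labelled child, so series for pods deleted
    # since the last iteration are not re-pushed after the wipe.
    REQUEST_TIME.clear()
    REQUEST_STATUS.clear()

    for name, elapsed, status in samples:
        REQUEST_TIME.labels(pod=name).set(elapsed)
        REQUEST_STATUS.labels(pod=name).set(status)

    push_to_gateway(gateway, job="api_health", registry=registry)
```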