Hi Everyone,
Let me give a brief on the task I'm working on.
I have a Python script that monitors the health of APIs running on pods in a Kubernetes cluster. Here's a brief overview of its functionality:
The script starts by loading configuration details from a JSON configuration file: AWS credentials, along with details like the cluster name, namespace, and service port.
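For context, the config-loading step looks roughly like this (the file name and key names are illustrative, not necessarily the ones in my actual config):

```python
import json

# Load AWS credentials and cluster details from the JSON configuration
# file (file name and key names here are illustrative).
with open("config.json") as f:
    config = json.load(f)

aws_access_key = config["aws_access_key_id"]
aws_secret_key = config["aws_secret_access_key"]
cluster_name = config["cluster_name"]
namespace = config["namespace"]
service_port = config["service_port"]
```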
Two Prometheus metrics are defined: REQUEST_TIME and REQUEST_STATUS, used to monitor the time spent processing requests and the status of the requests, respectively.
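A sketch of the definitions, assuming the prometheus_client library (the metric types and label names are illustrative):

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Time spent processing each health-check request, labelled by pod.
REQUEST_TIME = Gauge(
    "api_request_time_seconds",
    "Time spent processing the health-check request",
    ["pod"],
    registry=registry,
)

# HTTP status code returned by each pod's API.
REQUEST_STATUS = Gauge(
    "api_request_status_code",
    "Status code returned by the health-check request",
    ["pod"],
    registry=registry,
)
```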
The script sets the AWS credentials and connects to the EKS cluster using the AWS CLI. It then enters a main loop where it wipes all metrics using the admin API of the Pushgateway, gathers information about the running pods, and sends requests to the APIs running on those pods. The response time and status code of each request are recorded as Prometheus metrics. If there's an error while running the subprocess commands or during the main loop, the script logs the error and sets the script status to 'ERROR'. The script then sleeps for 30 seconds before starting the next iteration of the main loop.
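Simplified, the loop looks something like this, building on the snippets above (the gateway address, health-check path, and kubectl call are illustrative; the wipe requires the Pushgateway to run with --web.enable-admin-api):

```python
import json
import logging
import subprocess
import time

import requests
from prometheus_client import push_to_gateway

PUSHGATEWAY = "http://pushgateway:9091"  # illustrative address

while True:
    try:
        # Wipe ALL metrics on the Pushgateway via its admin API.
        requests.put(f"{PUSHGATEWAY}/api/v1/admin/wipe", timeout=10)

        # Gather the running pods in the namespace via kubectl.
        out = subprocess.check_output(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"]
        )
        pods = json.loads(out)["items"]

        # Hit each pod's API and record response time and status code.
        for pod in pods:
            name = pod["metadata"]["name"]
            ip = pod["status"]["podIP"]
            start = time.time()
            resp = requests.get(
                f"http://{ip}:{service_port}/health", timeout=5
            )
            REQUEST_TIME.labels(pod=name).set(time.time() - start)
            REQUEST_STATUS.labels(pod=name).set(resp.status_code)

        # Push everything recorded this iteration under one job name.
        push_to_gateway(PUSHGATEWAY, job="api_health", registry=registry)
    except Exception as exc:
        logging.error("Health-check iteration failed: %s", exc)
    time.sleep(30)
```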
The issue I'm facing: suppose 2 pods are running. The script sends a request to both pods and pushes their metrics. But if 1 pod gets deleted, the script sends a request only to the remaining pod and pushes that pod's metrics, yet the old pod's metrics are still present at the /metrics endpoint, even though I'm using the admin API to wipe the job (delete all metrics) completely at the start of each iteration. Can someone suggest some workarounds to fix this?
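One thing I'm wondering about (unverified): since REQUEST_TIME and REQUEST_STATUS live for the whole process, the label children created for a now-deleted pod may stay in the client-side registry and get re-pushed right after every wipe. If that's the cause, clearing the children at the start of each iteration might help, since prometheus_client's clear() removes all labelled children (creating a fresh CollectorRegistry per iteration would be an alternative). A minimal sketch:

```python
from prometheus_client import push_to_gateway

def record_and_push(samples, gateway="http://pushgateway:9091"):
    """One iteration: drop stale per-pod series, record fresh ones, push.

    `samples` is a list of (pod_name, elapsed_seconds, status_code)
    tuples; REQUEST_TIME / REQUEST_STATUS / registry are the objects
    from the sketch above.
    """
    # clear() removes every labelled child, so series for pods deleted
    # since the last iteration are not re-pushed after the wipe.
    REQUEST_TIME.clear()
    REQUEST_STATUS.clear()

    for name, elapsed, status in samples:
        REQUEST_TIME.labels(pod=name).set(elapsed)
        REQUEST_STATUS.labels(pod=name).set(status)

    push_to_gateway(gateway, job="api_health", registry=registry)
```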