-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: ‘alert -P’ gets an Unknown if everything is fine #65
Comments
Hi @wattebausch, |
Yes. ./check_prometheus_develop -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
# echo $?
3 ./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
|
# echo $?
3 |
Hi, right now that is just the logic of the plugin. I think the reasoning behind it was, if there's nothing there, we cannot determine the status. But I agree that it is somewhat odd behavior. If there are no alert rule defined it would also return UNKNOWN. We can improve that, just gotta keep all possible scenarios in mind for example:
What does everybody think? |
@wattebausch Thanks again for your feedback. It's very helpful! |
Without knowing much about prometheus this sounds about right, when checking for alerts it is probably OK if there are none. |
@wattebausch I reworked the logic a bit, you can check out the new code here #66 This should be more expected and flexible behavior |
Check my Version git checkout
Your branch is up to date with 'origin/no-alerts-logic'. ./check_prometheus --version
check_prometheus version development
commit: HEAD
date: latest i don't know whether the first four lines should appear in the help. FYI ./check_prometheus alert --help
Checks the status of a Prometheus alert and evaluates the status of the alert:
firing = 2
pending = 1
inactive = 0
Usage:
check_prometheus alert [flags]
Examples:
...
...
Flags:
-h, --help help for alert
-n, --name strings The name of one or more specific alerts to check.
This parameter can be repeated e.G.: '--name alert1 --name alert2'
If no name is given, all alerts will be evaluated
-T, --no-alerts-state string State to assign when no alerts are found (0, 1, 2, 3, OK, WARNING, CRITICAL, UNKNOWN). If not set this defaults to OK (default "OK")
-P, --problems Display only alerts which status is not inactive/OK. Note that in combination with the --name flag this might result in no alerts being displayed
Global Flags:
-b, --bearer string Specify the Bearer Token for server authentication (CHECK_PROMETHEUS_BEARER)
... Unfortunately, the behaviour is still the same. As far as I can see, I have compiled the correct version ./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T 0
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T OK
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T 1
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T CRITICAL
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P --no-alerts-state OK
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P --no-alerts-state 0
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T OK ; echo $?
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
3 Here I check whether, if there is ‘Firing’, it does not mistakenly go to OK. perfect, works as it should. ./check_prometheus -H metrics.some-probelm-cluster.example.com -p 80 alert -P -T OK
[CRITICAL] - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_ [CRITICAL] [Watchdog] is firing - value: 1.00 ./check_prometheus -H metrics.some-probelm-cluster.example.com -p 80 alert -P -T 0
[CRITICAL] - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_ [CRITICAL] [Watchdog] is firing - value: 1.00 |
Thanks for the feedback, I'll have a another look |
Hi, these are the results when checking Prometheus with the new logic:
As far as I can tell, this is the behavior we want. |
Hi @martialblog many thanks. ./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive ./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert -P -T 0
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive go run main.go -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
exit status 3 go run main.go -H metrics.clean-cluster.example.com -p 80 alert -P -T 0
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
exit status 3 go version go1.22.2 linux/amd64 |
@wattebausch Did you use the current
|
Haven't tested it with the "-P" flag, but it's applied after checking if there are any alerts. I can give it another look. |
Yes, I made another >git pull
Already up to date.
>git log
commit a2cc3770ec1ec2714b8abdd7766fcfe690a56ea9 (HEAD -> main, origin/main, origin/HEAD)
Merge: 6d8740a 1318445
Author: Markus Opolka <[email protected]>
Date: Thu Dec 19 09:01:20 2024 +0100
Merge pull request #69 from NETWAYS/fix/update_x_net_html
Update dependencies
commit 6d8740a3ff0fc4ff90c4eb443952d4356df15cd8
Merge: 67fb855 fa2e5bd
we use kubernetes, i can only test it from our icinga server, not on local. our hosts all have multiple rules. should i delete all rules so that the prometheus is empty? ./check_prometheus -H metrics.clean-cluster.example.com -p 80 alert
[OK] - 127 Alerts: 0 Firing - 0 Pending - 127 Inactive
\_ [OK] [FluxHelmReleaseNotReady] is inactive
\_ [OK] [FluxGitRepositorySyncFailed] is inactive
... |
I just started a simple Prometheus locally, with a minimal config:
So yeah something isn't right yet.
|
Please try to fill out as much of the information below as you can. Thank you!
Which version contains the bug?
0.2.0 & development
Describe the bug
Hello there
We use check_prometheus in the ‘alert -P’ method. We only want to output errors. If we run without ‘-P’ the SMS/email will be too long with over 100 checks when an error occurs.
We have compared version ‘v0.2.0’ and ‘development’. We noticed that the versions behave differently with ‘alert’. However, both are the same, if our Prometheus is all set to ‘Inactive’, i.e. OK, then we get an ‘UNKNOWN’ back
Can you check if this is the same for you?
Thanks for your great work
How to recreate the bug?
Version: 0.2.0 (Check against a cluster without alerts)
./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.clean-cluster.example.com -p 80 alert [OK] - Alerts inactive | total=123 firing=0 pending=0 inactive=123
Version: 0.2.0 (check against a cluster with problem)
Version: development (Check against a cluster without alerts)
./check_prometheus_develop -H metrics.clean-cluster.example.com -p 80 alert -P [UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive # echo $? 3
Version: development (check against a cluster with problem)
./check_prometheus_develop -H metrics.some-probelm-cluster.example.com -p 80 alert -P [CRITICAL] - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive \_ [CRITICAL] [Watchdog] is firing - value: 1.00
The text was updated successfully, but these errors were encountered: