Add support for keep_firing_for in the parser #713

Merged: 3 commits, Sep 11, 2023
7 changes: 6 additions & 1 deletion cmd/pint/tests/0121_rule_for.txt
@@ -10,7 +10,10 @@ rules/0001.yml:6 Bug: this alert rule must have a 'for' field with a minimum dur
rules/0001.yml:9 Bug: this alert rule must have a 'for' field with a maximum duration of 10m (rule/for)
9 | for: 13m

level=info msg="Problems found" Bug=2
rules/0001.yml:10 Bug: this alert rule must have a 'for' field with a minimum duration of 5m (rule/for)
10 | - alert: none

level=info msg="Problems found" Bug=3
level=fatal msg="Fatal error" error="found 1 problem(s) with severity Bug or higher"
-- rules/0001.yml --
- alert: ok
@@ -22,6 +25,8 @@ level=fatal msg="Fatal error" error="found 1 problem(s) with severity Bug or hig
- alert: 13m
  expr: up == 0
  for: 13m
- alert: none
  expr: up == 0

-- .pint.hcl --
parser {
15 changes: 15 additions & 0 deletions cmd/pint/tests/0142_keep_firing_for.txt
@@ -0,0 +1,15 @@
pint.ok --no-color lint rules
! stdout .
cmp stderr stderr.txt

-- stderr.txt --
level=info msg="Loading configuration file" path=.pint.hcl
-- rules/0001.yml --
- alert: Instance Is Down 1
  expr: up == 0
  keep_firing_for: 5m

-- .pint.hcl --
parser {
  relaxed = [".*"]
}
41 changes: 41 additions & 0 deletions cmd/pint/tests/0143_keep_firing_for.txt
@@ -0,0 +1,41 @@
pint.error --no-color lint rules
! stdout .
cmp stderr stderr.txt

-- stderr.txt --
level=info msg="Loading configuration file" path=.pint.hcl
rules/0001.yml:6 Bug: this alert rule must have a 'keep_firing_for' field with a minimum duration of 5m (rule/for)
6 | keep_firing_for: 3m

rules/0001.yml:9 Bug: this alert rule must have a 'keep_firing_for' field with a maximum duration of 10m (rule/for)
9 | keep_firing_for: 13m

rules/0001.yml:10 Bug: this alert rule must have a 'keep_firing_for' field with a minimum duration of 5m (rule/for)
10 | - alert: none

level=info msg="Problems found" Bug=3
level=fatal msg="Fatal error" error="found 1 problem(s) with severity Bug or higher"
-- rules/0001.yml --
- alert: ok
  expr: up == 0
  keep_firing_for: 5m
- alert: 3m
  expr: up == 0
  keep_firing_for: 3m
- alert: 13m
  expr: up == 0
  keep_firing_for: 13m
- alert: none
  expr: up == 0

-- .pint.hcl --
parser {
  relaxed = [".*"]
}
rule {
  keep_firing_for {
    severity = "bug"
    min = "5m"
    max = "10m"
  }
}
10 changes: 10 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,15 @@
# Changelog

## v0.46.0

### Added

- Added support for `keep_firing_for` in alerting rules - #713.
- Added `rule/keep_firing_for` check - #713.
- The `alerts/count` check will now estimate alerts using the
  `keep_firing_for` field if set - #713.
- Configuration rule `match` block supports a new filter `keep_firing_for`.

## v0.45.0

### Added
2 changes: 1 addition & 1 deletion docs/checks/alerts/count.md
@@ -9,7 +9,7 @@ grand_parent: Documentation
This check is used to estimate how many times a given alert would fire.
It will run `expr` query from every alert rule against selected Prometheus
servers and report how many unique alerts it would generate.
If `for` is set on alerts it will be used to adjust results.
If `for` and/or `keep_firing_for` are set on alerts they will be used to adjust results.
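
To make the adjustment concrete, here is a small standalone Go sketch (an illustration only, not part of this PR's diff) that mirrors the comparison added in `internal/checks/alerts_count.go` further down: a range of query results is only counted as an estimated alert if it lasts longer than `for` plus `keep_firing_for`. The `10m`/`5m` durations and the `12m` range length are hypothetical values.

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/common/model"
)

func main() {
	// Durations taken from a hypothetical alerting rule.
	forDur, _ := model.ParseDuration("10m")          // for: 10m
	keepFiringForDur, _ := model.ParseDuration("5m") // keep_firing_for: 5m

	// Length of one continuous range during which `expr` returned results.
	rangeLen := 12 * time.Minute

	// Mirrors the check: the range counts as an estimated alert only if it
	// outlasts `for` + `keep_firing_for` (both default to 0 when unset).
	wouldCount := rangeLen > time.Duration(forDur)+time.Duration(keepFiringForDur)
	fmt.Println(wouldCount) // false: 12m is shorter than 10m + 5m
}
```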

## Configuration

4 changes: 2 additions & 2 deletions docs/checks/alerts/for.md
@@ -6,8 +6,8 @@ grand_parent: Documentation

# alerts/for

This check will warn if an alert rule uses invalid `for` value
or if it passes default value that can be removed to simplify rule.
This check will warn if an alert rule uses an invalid `for` or `keep_firing_for`
value, or if it sets the default value, which can be removed to simplify the rule.

## Configuration

47 changes: 43 additions & 4 deletions docs/checks/rule/for.md
@@ -6,21 +6,24 @@ grand_parent: Documentation

# rule/for

This check allows to enforce the presence of `for` field on alerting
rules.
This check allows you to enforce the presence of the `for` or `keep_firing_for` field
on alerting rules.
You can configure it to enforce a minimum and/or maximum duration
set on alerts via `for` field.
set on alerts via `for` and/or `keep_firing_for` fields.

## Configuration

This check doesn't have any configuration options.

## How to enable it

This check uses either `for` or `keep_firing_for` configuration
blocks, depending on which alerting rule field you want to enforce.

Syntax:

```js
for {
for|keep_firing_for {
  severity = "bug|warning|info"
  min = "5m"
  max = "10m"
@@ -33,6 +36,42 @@ for {
- `max` - maximum allowed `for` value for matching alerting rules.
  - If not set, the maximum `for` duration won't be enforced.

Examples:

Enforce that all alerts have a `for` field with a value between `5m` and `10m`:

```js
for {
  severity = "bug"
  min = "5m"
  max = "10m"
}
```

Enforce that all alerts have a `keep_firing_for` field with a value of no more than `1h`:

```js
keep_firing_for {
  severity = "bug"
  max = "1h"
}
```

To enforce both at the same time:

```js
for {
  severity = "bug"
  min = "5m"
  max = "10m"
}

keep_firing_for {
  severity = "bug"
  max = "1h"
}
```

## How to disable it

You can disable this check globally by adding this config block:
11 changes: 11 additions & 0 deletions docs/configuration.md
@@ -388,6 +388,8 @@ rule {
field present and matching provided value will be checked by this rule. Recording rules
will never match it as they don't have `for` field.
Syntax is `OP DURATION` where `OP` can be any of `=`, `!=`, `>`, `>=`, `<`, `<=`.
- `match:keep_firing_for` - optional alerting rule `keep_firing_for` filter. Works the same
  way as the `for` match filter.
- `ignore` - works exactly like `match` but does the opposite - any alerting or recording rule
matching all conditions defined on `ignore` will not be checked by this `rule` block.

@@ -433,3 +435,12 @@ rule {
  [ check applied only to alerting rules with "for" field value that is >= 5m ]
}
```

```js
rule {
  match {
    keep_firing_for = "> 15m"
  }
  [ check applied only to alerting rules with "keep_firing_for" field value that is > 15m ]
}
```
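
For illustration, here is a minimal Go sketch of how an `OP DURATION` expression such as `> 15m` could be evaluated against a rule's `keep_firing_for` value. This is not pint's internal implementation; `matchDuration` is a hypothetical helper, and only `model.ParseDuration` from `github.com/prometheus/common` is a real API.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/prometheus/common/model"
)

// matchDuration is a hypothetical helper showing how an `OP DURATION`
// filter (for example "> 15m") could be evaluated against the value of a
// duration field such as `keep_firing_for`.
func matchDuration(filter, value string) (bool, error) {
	fields := strings.Fields(filter)
	if len(fields) != 2 {
		return false, fmt.Errorf("expected `OP DURATION`, got %q", filter)
	}
	op, durStr := fields[0], fields[1]

	want, err := model.ParseDuration(durStr)
	if err != nil {
		return false, err
	}
	have, err := model.ParseDuration(value)
	if err != nil {
		return false, err
	}

	switch op {
	case "=":
		return have == want, nil
	case "!=":
		return have != want, nil
	case ">":
		return have > want, nil
	case ">=":
		return have >= want, nil
	case "<":
		return have < want, nil
	case "<=":
		return have <= want, nil
	}
	return false, fmt.Errorf("unknown operator %q", op)
}

func main() {
	// A rule with `keep_firing_for: 20m` matches the "> 15m" filter.
	ok, err := matchDuration("> 15m", "20m")
	fmt.Println(ok, err) // true <nil>
}
```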
10 changes: 9 additions & 1 deletion internal/checks/alerts_count.go
@@ -89,10 +89,15 @@ func (c AlertsCheck) Check(ctx context.Context, _ string, rule parser.Rule, _ []
	if rule.AlertingRule.For != nil {
		forDur, _ = model.ParseDuration(rule.AlertingRule.For.Value.Value)
	}
	var keepFiringForDur model.Duration
	if rule.AlertingRule.KeepFiringFor != nil {
		keepFiringForDur, _ = model.ParseDuration(rule.AlertingRule.KeepFiringFor.Value.Value)
	}

	var alerts int
	for _, r := range qr.Series.Ranges {
		if r.End.Sub(r.Start) > time.Duration(forDur) {
		// If `keepFiringFor` is not defined its Duration will be 0
		if r.End.Sub(r.Start) > (time.Duration(forDur) + time.Duration(keepFiringForDur)) {
			alerts++
		}
	}
@@ -106,6 +111,9 @@ func (c AlertsCheck) Check(ctx context.Context, _ string, rule parser.Rule, _ []
	if rule.AlertingRule.For != nil {
		lines = append(lines, rule.AlertingRule.For.Lines()...)
	}
	if rule.AlertingRule.KeepFiringFor != nil {
		lines = append(lines, rule.AlertingRule.KeepFiringFor.Lines()...)
	}
	sort.Ints(lines)

	delta := qr.Series.Until.Sub(qr.Series.From)
144 changes: 144 additions & 0 deletions internal/checks/alerts_count_test.go
@@ -645,6 +645,150 @@ func TestAlertsCountCheck(t *testing.T) {
				},
			},
		},
		{
			description: "keep_firing_for: 10m",
			content:     "- alert: Foo Is Down\n  keep_firing_for: 10m\n  expr: up{job=\"foo\"} == 0\n",
			checker:     newAlertsCheck,
			prometheus:  newSimpleProm,
			problems: func(uri string) []checks.Problem {
				return []checks.Problem{
					{
						Fragment: `up{job="foo"} == 0`,
						Lines:    []int{2, 3},
						Reporter: "alerts/count",
						Text:     alertsText("prom", uri, 2, "1d"),
						Severity: checks.Information,
					},
				}
			},
			mocks: []*prometheusMock{
				{
					conds: []requestCondition{
						requireRangeQueryPath,
						formCond{key: "query", value: `up{job="foo"} == 0`},
					},
					resp: matrixResponse{
						samples: []*model.SampleStream{
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-24),
								time.Now().Add(time.Hour*-24).Add(time.Minute*6),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-23),
								time.Now().Add(time.Hour*-23).Add(time.Minute*6),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-22),
								time.Now().Add(time.Hour*-22).Add(time.Minute),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-21),
								time.Now().Add(time.Hour*-21).Add(time.Minute*16),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-20),
								time.Now().Add(time.Hour*-20).Add(time.Minute*9).Add(time.Second*59),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-18),
								time.Now().Add(time.Hour*-18).Add(time.Hour*2),
								time.Minute,
							),
						},
					},
				},
				{
					conds: []requestCondition{
						requireRangeQueryPath,
						formCond{key: "query", value: `count(up)`},
					},
					resp: respondWithSingleRangeVector1D(),
				},
			},
		},
		{
			description: "for: 10m + keep_firing_for: 10m",
			content:     "- alert: Foo Is Down\n  for: 10m\n  keep_firing_for: 10m\n  expr: up{job=\"foo\"} == 0\n",
			checker:     newAlertsCheck,
			prometheus:  newSimpleProm,
			problems: func(uri string) []checks.Problem {
				return []checks.Problem{
					{
						Fragment: `up{job="foo"} == 0`,
						Lines:    []int{2, 3, 4},
						Reporter: "alerts/count",
						Text:     alertsText("prom", uri, 1, "1d"),
						Severity: checks.Information,
					},
				}
			},
			mocks: []*prometheusMock{
				{
					conds: []requestCondition{
						requireRangeQueryPath,
						formCond{key: "query", value: `up{job="foo"} == 0`},
					},
					resp: matrixResponse{
						samples: []*model.SampleStream{
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-24),
								time.Now().Add(time.Hour*-24).Add(time.Minute*6),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-23),
								time.Now().Add(time.Hour*-23).Add(time.Minute*6),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-22),
								time.Now().Add(time.Hour*-22).Add(time.Minute),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-21),
								time.Now().Add(time.Hour*-21).Add(time.Minute*16),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-20),
								time.Now().Add(time.Hour*-20).Add(time.Minute*9).Add(time.Second*59),
								time.Minute,
							),
							generateSampleStream(
								map[string]string{"job": "foo"},
								time.Now().Add(time.Hour*-18),
								time.Now().Add(time.Hour*-18).Add(time.Hour*2),
								time.Minute,
							),
						},
					},
				},
				{
					conds: []requestCondition{
						requireRangeQueryPath,
						formCond{key: "query", value: `count(up)`},
					},
					resp: respondWithSingleRangeVector1D(),
				},
			},
		},
	}

	runTests(t, testCases)