mixin: Exclude cache "add" operations from alerting (#9658)
* mixin: Exclude cache "add" operations from alerting

Exclude alerts from firing about cache "add" operations failing since
this is expected during normal operation.
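
To illustrate why failed "add" operations are expected, here is a minimal Python sketch. It is not the Mimir or Thanos cache client: it only assumes memcached-style semantics, where "add" stores a value only if the key is absent, so a second caller trying to take the same "lock" gets a failure by design. `FakeCache`, `failure_percent`, and the key names are hypothetical and exist only for this illustration.

```python
from collections import Counter


class FakeCache:
    """In-memory stand-in for a cache with memcached-style "add" semantics."""

    def __init__(self):
        self._data = {}
        self.operations = Counter()  # operation name -> attempts
        self.failures = Counter()    # operation name -> failed attempts

    def add(self, key, value):
        """Store value only if key is absent; count a failure otherwise."""
        self.operations["add"] += 1
        if key in self._data:
            self.failures["add"] += 1
            return False
        self._data[key] = value
        return True

    def set(self, key, value):
        """Unconditionally store value."""
        self.operations["set"] += 1
        self._data[key] = value
        return True


def failure_percent(cache, exclude=()):
    """Per-operation failure percentage, skipping excluded operations."""
    return {
        op: 100.0 * cache.failures[op] / total
        for op, total in cache.operations.items()
        if op not in exclude and total > 0
    }


cache = FakeCache()
cache.set("block-index", "v1")
first = cache.add("lock:tenant-1", "worker-a")   # lock is free: succeeds
second = cache.add("lock:tenant-1", "worker-b")  # lock is held: fails as designed

print(failure_percent(cache))                    # includes "add"
print(failure_percent(cache, exclude=("add",)))  # "add" filtered out
```

In this sketch one of the two "add" attempts fails, so an unfiltered per-operation ratio reports 50% for "add" even though nothing is wrong. Filtering by operation leaves only the operations whose failures are meaningful.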

Related #9386

Signed-off-by: Nick Pillitteri <[email protected]>

* Build helm tests

Signed-off-by: Nick Pillitteri <[email protected]>

---------

Signed-off-by: Nick Pillitteri <[email protected]>
56quarters authored Oct 17, 2024
1 parent a98e096 commit 16f7f5f
Showing 5 changed files with 19 additions and 16 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -55,6 +55,7 @@
* [ENHANCEMENT] Unify ingester autoscaling panels on 'Mimir / Writes' dashboard to work for both ingest-storage and non-ingest-storage autoscaling. #9617
* [BUGFIX] Dashboards: Fix autoscaling metrics joins when series churn. #9412 #9450 #9432
* [BUGFIX] Alerts: Fix autoscaling metrics joins in `MimirAutoscalerNotActive` when series churn. #9412
+* [BUGFIX] Alerts: Exclude failed cache "add" operations from alerting since failures are expected in normal operation. #9658

### Jsonnet

@@ -119,15 +119,15 @@ spec:
expr: |
(
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operation_failures_total[1m])
+rate(thanos_memcached_operation_failures_total{operation!="add"}[1m])
or
-rate(thanos_cache_operation_failures_total[1m])
+rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
)
/
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operations_total[1m])
+rate(thanos_memcached_operations_total{operation!="add"}[1m])
or
-rate(thanos_cache_operations_total[1m])
+rate(thanos_cache_operations_total{operation!="add"}[1m])
)
) * 100 > 5
for: 5m
8 changes: 4 additions & 4 deletions operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -107,15 +107,15 @@ groups:
expr: |
(
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operation_failures_total[1m])
+rate(thanos_memcached_operation_failures_total{operation!="add"}[1m])
or
-rate(thanos_cache_operation_failures_total[1m])
+rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
)
/
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operations_total[1m])
+rate(thanos_memcached_operations_total{operation!="add"}[1m])
or
-rate(thanos_cache_operations_total[1m])
+rate(thanos_cache_operations_total{operation!="add"}[1m])
)
) * 100 > 5
for: 5m
8 changes: 4 additions & 4 deletions operations/mimir-mixin-compiled/alerts.yaml
@@ -107,15 +107,15 @@ groups:
expr: |
(
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operation_failures_total[1m])
+rate(thanos_memcached_operation_failures_total{operation!="add"}[1m])
or
-rate(thanos_cache_operation_failures_total[1m])
+rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
)
/
sum by(cluster, namespace, name, operation) (
-rate(thanos_memcached_operations_total[1m])
+rate(thanos_memcached_operations_total{operation!="add"}[1m])
or
-rate(thanos_cache_operations_total[1m])
+rate(thanos_cache_operations_total{operation!="add"}[1m])
)
) * 100 > 5
for: 5m
10 changes: 6 additions & 4 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -202,18 +202,20 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
{
alert: $.alertName('CacheRequestErrors'),
+// Specifically exclude "add" operations which are used for cache invalidation and "locking" since
+// they are expected to sometimes fail in normal operation (such as when a "lock" already exists).
expr: |||
(
sum by(%(group_by)s, name, operation) (
-rate(thanos_memcached_operation_failures_total[%(range_interval)s])
+rate(thanos_memcached_operation_failures_total{operation!="add"}[%(range_interval)s])
or
-rate(thanos_cache_operation_failures_total[%(range_interval)s])
+rate(thanos_cache_operation_failures_total{operation!="add"}[%(range_interval)s])
)
/
sum by(%(group_by)s, name, operation) (
-rate(thanos_memcached_operations_total[%(range_interval)s])
+rate(thanos_memcached_operations_total{operation!="add"}[%(range_interval)s])
or
-rate(thanos_cache_operations_total[%(range_interval)s])
+rate(thanos_cache_operations_total{operation!="add"}[%(range_interval)s])
)
) * 100 > 5
||| % {