Merge pull request #42 from qonto/add-80-percent-disk-space-warning
feat(RDS): create RDSLowDiskSpaceCount to display instances with less than 80% disk space
qfritz authored Oct 7, 2024
2 parents c23b3fd + 00ccb5b commit 03272c2
Showing 3 changed files with 58 additions and 0 deletions.
@@ -0,0 +1,26 @@
rule_files:
  - rules.yml

evaluation_interval: 1m

tests:

  - name: RDSLowDiskSpaceCount
    interval: 1m
    input_series:
      - series: 'rds_free_storage_bytes{aws_account_id="111111111111",aws_region="eu-west-3",dbidentifier="db1"}'
        values: '3221225472x10' # 3GB
      - series: 'rds_allocated_storage_bytes{aws_account_id="111111111111",aws_region="eu-west-3",dbidentifier="db1"}'
        values: '21474836480x15' # 20GB
    alert_rule_test:
      - alertname: RDSLowDiskSpaceCount
        eval_time: 15m
        exp_alerts:
          - exp_labels:
              aws_account_id: 111111111111
              aws_region: eu-west-3
              severity: warning
            exp_annotations:
              description: "One or more RDS instances has <20% free disk space"
              summary: "Less than 20% free disk space on at least one instance"
              runbook_url: "https://qonto.github.io/database-monitoring-framework/0.0.0/runbooks/rds/RDSLowDiskSpaceCount"
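
As a sanity check on the fixture above, the arithmetic can be sketched in Python. The byte values are copied from the test's `input_series`; the variable names are illustrative and do not come from the repository. The sketch only shows why this fixture lands inside the window the new rule targets.

```python
# Values copied from the test's input_series.
free_bytes = 3221225472        # 3 GiB free
allocated_bytes = 21474836480  # 20 GiB allocated

# Same ratio the rule computes: free * 100 / allocated.
free_percent = free_bytes * 100 / allocated_bytes
print(free_percent)  # 15.0

# The rule keeps instances strictly between 10% and 20% free.
in_warning_band = 10 < free_percent < 20
print(in_warning_band)  # True
```

At 15% free, the instance is below the 20% warning threshold but still above the 10% threshold used by `RDSDiskSpaceLimit`.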
9 changes: 9 additions & 0 deletions charts/prometheus-rds-alerts/values.yaml
@@ -38,6 +38,15 @@ rules:
      summary: "{{ $labels.instance }} is reporting errors"
      description: "{{ $labels.instance }} is reporting {{ $value }} errors per minute"

  RDSLowDiskSpaceCount:
    expr: count(10 < max by (aws_account_id, aws_region, dbidentifier) (rds_free_storage_bytes{} * 100 / rds_allocated_storage_bytes{}) < 20) by (aws_account_id,aws_region) >= 1
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Less than 20% free disk space on at least one instance"
      description: 'One or more RDS instances has <20% free disk space'

  RDSDiskSpaceLimit:
    expr: max by (aws_account_id, aws_region, dbidentifier) (rds_free_storage_bytes{} * 100 / rds_allocated_storage_bytes{}) < 10
    for: 15m
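
The grouping done by the `RDSLowDiskSpaceCount` expression can be illustrated with a rough Python sketch. This models the PromQL semantics and is not code from the repository: the function name and sample data are hypothetical, and it assumes one free-space percentage per instance (what the inner `max by (...)` yields).

```python
from collections import Counter


def low_disk_space_count(free_percent_by_instance):
    """Count instances per (account, region) with 10 < free% < 20.

    Keys are (aws_account_id, aws_region, dbidentifier) tuples.
    """
    counts = Counter()
    for (account, region, _dbidentifier), pct in free_percent_by_instance.items():
        # Mirrors the `10 < ... < 20` filter in the rule expression.
        if 10 < pct < 20:
            counts[(account, region)] += 1
    # Groups with no matching instance never appear, as in PromQL.
    return dict(counts)


# Hypothetical sample data for illustration only.
samples = {
    ("111111111111", "eu-west-3", "db1"): 15.0,  # in the warning band
    ("111111111111", "eu-west-3", "db2"): 8.0,   # below 10%: RDSDiskSpaceLimit territory
    ("111111111111", "eu-west-3", "db3"): 42.0,  # healthy
    ("222222222222", "eu-west-1", "db4"): 55.0,  # healthy
}
result = low_disk_space_count(samples)
print(result)  # {('111111111111', 'eu-west-3'): 1}
```

In this model, an instance below 10% free is excluded from the count because the stricter `RDSDiskSpaceLimit` rule already covers it, and the result carries one entry per account and region rather than one per instance.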
23 changes: 23 additions & 0 deletions content/runbooks/rds/RDSLowDiskSpaceCount.md
@@ -0,0 +1,23 @@
---
title: Free disk space is under 20 percent for at least one instance
---

# RDSLowDiskSpaceCount

## Meaning

This alert fires when at least one RDS instance has had less than 20% (but more than 10%) of its allocated storage free for 15 minutes.

## Impact

If an instance runs out of disk space, PostgreSQL might stop to prevent data corruption.

## Diagnosis

1. List the affected instances in Prometheus with:

```promql
max by (aws_account_id, aws_region, dbidentifier) (rds_free_storage_bytes{} * 100 / rds_allocated_storage_bytes{}) < 20
```

1. For each affected instance, follow the [RDSDiskSpaceLimit](RDSDiskSpaceLimit.md) runbook: it covers the same condition, and this alert simply fires earlier.
