WIP [main] Implement alerting #769

jbiers · 2025-05-22T19:56:33Z

Issue: #672

diogoasouza · 2025-05-22T20:43:18Z

charts/rancher-backup/templates/alerting-rules.yaml

+  namespace: {{ .Release.Namespace }}
+spec:
+  groups:
+    - name: backup-restore


No need to support interval or limit?
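
As a rough illustration of what that could look like, the rule group might expose both fields as optional values. This is only a sketch: `interval` and `limit` are rule-group fields in the prometheus-operator PrometheusRule spec, but the `.Values.monitoring.alerts.interval` and `.Values.monitoring.alerts.limit` keys are hypothetical and not part of this PR.

```yaml
spec:
  groups:
    - name: backup-restore
      # Hypothetical values keys; each field is omitted when its value is unset.
      {{- with .Values.monitoring.alerts.interval }}
      interval: {{ . | quote }}
      {{- end }}
      {{- with .Values.monitoring.alerts.limit }}
      limit: {{ . }}
      {{- end }}
```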

mallardduck · 2025-05-22T20:45:29Z

charts/rancher-backup/values.yaml

+    ## Define custom alerting rules here.
+    ## The "BackupFailed" alert is included by default when .Values.monitoring.alerts.enabled is set to true and rancher-monitoring is installed.
+    alertingRules:
+    - alert: BackupFailed


Nice.

The only thing I'm wondering about is: are there any alerts that BRO should provide by default that cannot be disabled? In other words, are there any features of this new monitoring/alerting area of BRO that are required once it is enabled? I guess this applies just as much to metrics collection as it does to these alerting rules, and I could imagine it matters more for dashboards than for alerts, TBH. 🤔

alexandreLamarre · 2025-05-22T21:21:42Z

charts/rancher-backup/values.yaml

+
+    ## Define custom alerting rules here.
+    ## The "BackupFailed" alert is included by default when .Values.monitoring.alerts.enabled is set to true and rancher-monitoring is installed.
+    alertingRules:


I would also suggest moving this default alert rule definition into a Helm template, something like templates/default-alerting-rules.yaml, behind a flag like .Values.useDefaultAlerts. The current behavior could then live behind something like .Values.additionalRules.

Edit: if we do put the default alerting rules behind that flag, make sure to pass some form of .defaultAlerts.AdditionalLabels to the alerting rule. Alertmanager uses label matchers to route alerts to specific remote integrations, so users will want to pass in their own labels.
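
A hypothetical values.yaml layout for this suggestion might look like the following. The key names are taken from the placeholders in the comment above (useDefaultAlerts, defaultAlerts.additionalLabels, additionalRules); nesting them under monitoring.alerts is an assumption, and the label values are made up for illustration.

```yaml
monitoring:
  alerts:
    enabled: true
    # Hypothetical flag: render templates/default-alerting-rules.yaml (e.g. BackupFailed).
    useDefaultAlerts: true
    defaultAlerts:
      # Hypothetical: extra labels added to every default alert so that
      # Alertmanager routes can match on them.
      additionalLabels:
        team: platform
        severity: critical
    # Hypothetical: user-defined rules, rendered alongside the defaults.
    additionalRules: []
```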

alexandreLamarre · 2025-05-27T19:27:52Z

pkg/monitoring/metrics.go

Let's clean up the metric names to follow best practices:

https://prometheus.io/docs/concepts/escaping_schemes/

https://prometheus.io/docs/practices/naming/

https://prometheus.io/docs/practices/instrumentation/

Some best practices for rules/alerting:

https://prometheus.io/docs/practices/alerting/

https://prometheus.io/docs/practices/rules/

Initial alerting implementation

7b18cbe

jbiers requested a review from a team as a code owner May 22, 2025 19:56

diogoasouza reviewed May 22, 2025

mallardduck reviewed May 22, 2025

alexandreLamarre requested changes May 22, 2025

alexandreLamarre reviewed May 22, 2025

jbiers changed the title ~~[main] Implement alerting~~ WIP [main] Implement alerting May 26, 2025

jbiers marked this pull request as draft May 26, 2025 19:17

jbiers force-pushed the bro-alerting branch 4 times, most recently from cbc9ab3 to 309f039 Compare May 26, 2025 19:53

alexandreLamarre reviewed May 27, 2025

jbiers force-pushed the bro-alerting branch 2 times, most recently from cd097df to 4294661 Compare May 28, 2025 15:39

Renaming metrics

27a9f27

jbiers force-pushed the bro-alerting branch from 4294661 to 27a9f27 Compare May 28, 2025 15:44

Improvements to alerting template

0c6dd21

jbiers force-pushed the bro-alerting branch from 652b2bf to 0c6dd21 Compare May 28, 2025 19:51

jbiers requested a review from alexandreLamarre May 28, 2025 19:52
