Opsgenie Alias for deduplication of alerts #460
Comments
This would be nice to have. It would also be great if alerts resolved themselves automatically once the problem is no longer happening. |
It would generally be nice to be able to customize the field contents. If you have a large OpsGenie/JSM instance where alerts from multiple systems are processed, you want more info than e.g. "Kustomization/somecomponent" in the title of the alert, as this is not very specific. |
event.Metadata gets injected in the payload sent to OpsGenie so you can add cluster name, region, etc. Can't this be used for deduplication? |
In JSM I see this in the created alert: which according to the Jira API documentation matches this field: And
With some Jira automation rules, deduplication and title manipulation could work (I need to check with an admin on our side). Customizing the fields on the Flux side would be a bit easier in my eyes, but I see it as a feature request 😃. |
For Opsgenie I just set the alias to the description. That works most of the time, as long as the description does not contain a time string that is different on every event. What would be great, however, is a message that could also close the alert, i.e. a "recovery" message. |
Flux is stateless; there is no way to send recovery messages, as notification-controller doesn't know it has sent a previous error alert. |
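To illustrate why an alias built from the description can break on changing text, here is a minimal Python sketch that derives the alias only from fields that stay constant across repeated events. The function name `stable_alias` and the choice of fields (cluster, kind, namespace, name) are assumptions for illustration, not what notification-controller does.

```python
import hashlib


def stable_alias(cluster: str, kind: str, namespace: str, name: str) -> str:
    """Build an alias from fields that do not change between repeated events.

    Hypothetical helper: the event message is deliberately left out, because
    it may contain timestamps or other text that differs on every event.
    """
    key = "/".join([cluster, kind, namespace, name])
    # Hashing keeps the alias short and fixed-length (64 hex characters).
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


first = stable_alias("cluster-a", "Kustomization", "flux-system", "apps")
second = stable_alias("cluster-a", "Kustomization", "flux-system", "apps")
other = stable_alias("cluster-a", "HelmRelease", "flux-system", "apps")

print(first == second)  # same object, same alias
print(first == other)   # different kind, different alias
```

Hashing is optional here; a readable alias such as the joined key itself would deduplicate just as well and is easier to recognize in the Opsgenie UI, at the cost of a variable length.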
That's also possible. So, for me the As the |
@stefanprodan True, guess the only way that would work would be to send messages for every run that was ok, which would be a little noisy. |
@al-lac we do send a success event only once, when the recovery happens, see https://github.com/fluxcd/kustomize-controller/blob/e9f5628eccbfbc722a7637ecbf7f66580e2e4416/internal/controller/kustomization_controller.go#L910-L914 |
@PatrickZeier-SAG how did you manage to enrich it with the cluster name? Did you just add more information to @stefanprodan i guess i would need to set the |
@al-lac Exactly.
And this I can then access like described above with |
@PatrickZeier-SAG thanks that works perfectly! Now i just need to find a way to differentiate between errors, infos and recovery messages 😁 |
@al-lac I would be happy to read about your solution if you find something 😃 . Especially the recovery message (I did not yet get that out of the code Stefan linked). Idea for differentiation between severity types: You could add one Flux alert per severity but with different value in the |
@PatrickZeier-SAG ah yeah, that is one way of handling this. Thanks for the tip! Me neither, I don't see how a recovery message is different from the rest. Maybe @stefanprodan can elaborate further. |
For the same revision, Flux will emit a single info event and will not spam. If, say, the health check fails for some new Git commit and then recovers, you get 2 events: error and info. |
Ok, I thought of doing it the way @PatrickZeier-SAG suggested, i.e. having two alerts, one for info and one for error. But as the info alert also contains the error events, I cannot use it to close the alert, since the two would always get in each other's way. @stefanprodan ok, that is good to know. But how will I be able to differentiate between error and info if this information does not get sent to the provider? If I had the severity level (info / error), I could just match on the revision and resolve the alert once a new info message comes in with the same resource id. So I would set the following as an alias on OpsGenie: -main@ But without knowing whether it is an error or an info event, I cannot do the closing :-( |
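The open-on-error, close-on-info idea from the comment above can be sketched as follows, assuming the severity were available to whatever sends the request. This is only an illustration in Python, not the controller's code: the helper names, the event shape (`involvedObject`, `severity`, `message`, a `revision` key in `metadata`) and the alias format are assumptions. The two endpoints are the Opsgenie Alert API's create and close calls, with the close call addressed by alias via `identifierType=alias`. No request is actually sent.

```python
from urllib.parse import quote

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def alias_for(event: dict) -> str:
    """Hypothetical alias: resource id plus revision, identical for error and info."""
    obj = event["involvedObject"]
    revision = event.get("metadata", {}).get("revision", "")
    return f"{obj['kind']}/{obj['namespace']}/{obj['name']}@{revision}"


def plan_request(event: dict) -> dict:
    """Describe the Opsgenie request for an event without sending it."""
    alias = alias_for(event)
    if event["severity"] == "error":
        # Create (or deduplicate into) the alert identified by this alias.
        return {"method": "POST", "url": OPSGENIE_ALERTS_URL,
                "body": {"message": event["message"], "alias": alias}}
    # Any other severity is treated as recovery: close the alert by alias.
    close_url = f"{OPSGENIE_ALERTS_URL}/{quote(alias, safe='')}/close?identifierType=alias"
    return {"method": "POST", "url": close_url, "body": {"note": event["message"]}}


def make_event(severity: str, message: str) -> dict:
    """Build a sample event; all values are made up for the example."""
    return {
        "involvedObject": {"kind": "Kustomization", "namespace": "cluster-a", "name": "apps"},
        "severity": severity,
        "message": message,
        "metadata": {"revision": "main@sha1:abc123"},
    }


failed = plan_request(make_event("error", "health check failed"))
recovered = plan_request(make_event("info", "reconciliation finished"))
print(failed["body"]["alias"])
print(recovered["url"])
```

Because both events carry the same resource id and revision, the close request targets exactly the alert the error opened. An info event for a revision that never failed would produce a close request with no matching open alert, which a real implementation would need to tolerate.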
Feel free to open a PR, all you need is adding |
So with the changes from #796 i managed to set the However, i seem to not get enough
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: gitops-notifications-opsgenie
  namespace: flux-system
spec:
  summary: Alert from flux for cluster a
  providerRef:
    name: opsgenie
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: '*'
      namespace: cluster-a
    - kind: Kustomization
      name: '*'
      namespace: cluster-a
    - kind: HelmRelease
      name: '*'
      namespace: cluster-a
Should i not also get an alert then for every I let one kustomization fail and repaired it again, but i never got any recovery message or info message in Opsgenie... The only thing i see in the Opsgenie log is this alert coming in every time a sync runs: |
We currently use Opsgenie for paging and have found that the integration with Flux works pretty well. The main point of contention is the missing alias for deduplication. Currently, when an alert is triggered, it fires continuously and creates new pages each time. Ideally we could set an alias, fire once, and let Opsgenie handle additional notifications/triggers.
notification-controller/internal/notifier/opsgenie.go
Line 68 in 505345c
Opsgenie API docs