chore: adding alert manager #46

gabrielmer · 2024-04-29T13:16:54Z

Adding alert manager service in order to eventually send alarms to Discord when defined conditions are met.

Configured two conditions already used in the fleets: high Nim memory usage and a drop in libp2p peers.
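For readers unfamiliar with this kind of rule, here is a minimal sketch of the logic behind the two conditions as plain threshold checks. It is illustrative only and is not the configuration in this PR: the alert names, the `Sample` type, and both threshold values are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observation of a node (hypothetical shape, for illustration)."""
    memory_bytes: float  # memory used by the node process
    peers_before: int    # libp2p peer count at the start of the window
    peers_now: int       # libp2p peer count at the end of the window


# Assumed thresholds; the real values live in the alert rules, not here.
MEMORY_LIMIT_BYTES = 2 * 1024**3  # 2 GiB
PEER_DROP_RATIO = 0.5             # fire if more than half of the peers are lost


def firing_alerts(sample: Sample) -> List[str]:
    """Return the names of the conditions that the sample violates."""
    alerts = []
    if sample.memory_bytes > MEMORY_LIMIT_BYTES:
        alerts.append("HighNimMemoryUsage")
    # Guard against an empty starting window so a node that never had
    # peers does not count as a "drop".
    if sample.peers_before > 0 and (
        sample.peers_now < sample.peers_before * (1 - PEER_DROP_RATIO)
    ):
        alerts.append("Libp2pPeersDrop")
    return alerts
```

In a real deployment these checks would be expressed as alert rules evaluated over the scraped metrics, with the alert manager only handling routing and notification.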

alrevuelta · 2024-04-30T09:06:46Z

Unsure for now whether we want to add more complexity to waku-simulator. Perhaps if we want alerts (which I'm not sure we need), we can keep them separate from waku-simulator.

gabrielmer · 2024-04-30T11:03:01Z

> Unsure for now whether we want to add more complexity to waku-simulator. Perhaps if we want alerts (which I'm not sure we need), we can keep them separate from waku-simulator.

When we created waku-org/nwaku#2342, the idea we agreed on was to add alerts, because there was an issue that could have been caught by looking at the simulator, and we missed it because we weren't properly keeping track of it.

We can avoid adding complexity, but then we need to find another way to keep track of its performance. Otherwise there's not much point in deploying the simulator.

Let me know what you think, whether you have any other ideas for tracking the simulator's performance, or whether it's simply not the right time and I should leave the PR open until it becomes appropriate.

alrevuelta · 2024-04-30T14:05:11Z

For now I would block it until we have waku-simulator stabilized with onchain rln etc. Then we can reassess.

One thought off the top of my head is that alerts are useful when something has to be taken care of immediately. In the case of waku-simulator, immediate action is not needed, so a simpler approach where we look at the dashboard daily might be enough. This also forces us to interpret the data, and having a human in the loop is useful since some things are difficult to express with just a threshold + alert.

gabrielmer · 2024-04-30T14:31:50Z

> For now I would block it until we have waku-simulator stabilized with onchain rln etc. Then we can reassess.

Sounds good, so let's block this for now then :)

> One thought off the top of my head is that alerts are useful when something has to be taken care of immediately. In the case of waku-simulator, immediate action is not needed, so a simpler approach where we look at the dashboard daily might be enough. This also forces us to interpret the data, and having a human in the loop is useful since some things are difficult to express with just a threshold + alert.

Agreed. If we make sure that a human will be regularly analyzing it, then alerts are not needed. Alerts in this case were simply a notification system, because the simulator is not being regularly tracked in an organized way.

We can figure out how to do it efficiently once it's stabilized and ready to be monitored.

gabrielmer added 6 commits April 24, 2024 14:25

adding alert manager service

8551bff

Having first alert working

722c58a

adding libp2p_peers drop alert

8a0c708

improving alertmanager-config

83ae900

adding to do comment

60c9f35

setting up discord receiver

fc02470

adding support for discord webhook in env

59fce1f

gabrielmer mentioned this pull request Apr 30, 2024

chore: review waku-simulator deployment and improve tracking processes waku-org/nwaku#2342

Closed

3 tasks
