chore: adding alert manager #46

Draft: wants to merge 7 commits into base: master

@gabrielmer
Contributor (commented Apr 29, 2024)

Adding alert manager service in order to eventually send alarms to Discord when defined conditions are met.

Configured 2 conditions already used in fleets: high nim memory usage and drop in libp2p peers.
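
For illustration only, the two conditions can be sketched as simple threshold checks. This is a minimal sketch and not the configuration in this PR: the `Sample` shape, the threshold values, and the alert names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One metrics reading for a node (hypothetical shape, not from the PR)."""
    memory_bytes: int
    libp2p_peers: int


# Hypothetical thresholds; the values used in the fleets are not stated here.
MAX_MEMORY_BYTES = 2 * 1024**3   # treat more than 2 GiB as high memory usage
MAX_PEER_DROP_RATIO = 0.5        # treat losing half the peers or more as a drop


def check_alerts(previous: Sample, current: Sample) -> List[str]:
    """Return the names of the conditions that fire for the current reading."""
    alerts = []
    # Condition 1: memory usage above a fixed threshold.
    if current.memory_bytes > MAX_MEMORY_BYTES:
        alerts.append("high_memory_usage")
    # Condition 2: relative drop in connected libp2p peers since the last reading.
    if previous.libp2p_peers > 0:
        drop = (previous.libp2p_peers - current.libp2p_peers) / previous.libp2p_peers
        if drop >= MAX_PEER_DROP_RATIO:
            alerts.append("libp2p_peers_drop")
    return alerts
```

In a real deployment these checks would more likely live in the monitoring stack's own rule files; the function above only shows the logic of the two conditions.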

@alrevuelta
Collaborator

I'm unsure for now whether we want to add more complexity to waku-simulator. If we do want alerts (and I'm not sure we need them), perhaps we can keep them separate from waku-simulator.

@gabrielmer
Contributor Author

> I'm unsure for now whether we want to add more complexity to waku-simulator. If we do want alerts (and I'm not sure we need them), perhaps we can keep them separate from waku-simulator.

When we created waku-org/nwaku#2342, the idea we agreed on was to add alerts, because there was an issue that could have been caught by looking at the simulator, and we missed it because we weren't properly keeping track of it.

We can avoid adding complexity, but then I guess we need to find another way to keep track of its performance. Otherwise there's not much point in deploying the simulator.

Let me know what you think: whether you have other ideas for tracking the simulator's performance, or whether it's simply not the right time, in which case I can leave the PR open until it becomes appropriate.

@alrevuelta
Collaborator

For now I would block it until we have waku-simulator stabilized with on-chain RLN etc. Then we can reassess.

One thought off the top of my head: alerts are useful when something has to be taken care of immediately. In the case of waku-simulator, immediate action is not needed, so a simpler approach where we look at the dashboard daily might be enough. This also forces us to interpret the data, and having a human in the loop is useful since some things are difficult to express with just a threshold + alert.

@gabrielmer
Contributor Author

> For now I would block it until we have waku-simulator stabilized with on-chain RLN etc. Then we can reassess.

Sounds good, so let's block this for now then :)

> One thought off the top of my head: alerts are useful when something has to be taken care of immediately. In the case of waku-simulator, immediate action is not needed, so a simpler approach where we look at the dashboard daily might be enough. This also forces us to interpret the data, and having a human in the loop is useful since some things are difficult to express with just a threshold + alert.

Agreed. If we make sure a human regularly analyzes it, then alerts are not needed. Alerts in this case were simply a notification system, because the simulator is not being regularly tracked in an organized way.

We can figure out how to do it efficiently once it's stabilized and ready to be monitored.
