Gather metrics and configure alerts to find out when installed CHT version is out of date #47

Open
garethbowen opened this issue May 15, 2023 · 1 comment
Labels
Type: Improvement Make something better

Comments

@garethbowen
Member

We have an OGSM metric aimed at reducing the number of instances running versions of the Core Framework that are no longer supported according to the CHT support matrix. One possible reason people don't upgrade is that they don't know an upgrade is available. Other than loading the Admin > Upgrade page, or watching the Forum, it's not that easy to find out.

A couple of measures I can think of are...

  • Alert when a new service pack is released for the current major + minor - i.e. your version has bugs, fix these by upgrading today!
  • Metric and alert when the currently installed version is no longer supported.

Both of these will need a new data ingress point, either to the market, or the docs site somehow.
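
As a rough sketch of the two checks above, assuming the release list and the set of supported minors have already been fetched from whatever ingress point we end up with: the function names, the `major.minor.patch` format and the sample version numbers below are all hypothetical and are not taken from the real support matrix.

```python
from typing import List, Optional, Set, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def newer_patch(installed: str, releases: List[str]) -> Optional[str]:
    """Return the highest release sharing the installed major + minor
    that is newer than the installed version, or None if there is none."""
    current = parse(installed)
    candidates = [
        parse(r) for r in releases
        if parse(r)[:2] == current[:2] and parse(r) > current
    ]
    if not candidates:
        return None
    return ".".join(str(n) for n in max(candidates))


def is_supported(installed: str, supported_minors: Set[str]) -> bool:
    """True if the installed major.minor is in the supported set."""
    major, minor, _ = parse(installed)
    return f"{major}.{minor}" in supported_minors


# Made-up sample data, for illustration only.
releases = ["4.1.5", "4.2.1", "4.2.3", "4.3.0"]
supported = {"4.2", "4.3"}

print(newer_patch("4.2.0", releases))   # a newer service pack on the 4.2 line
print(is_supported("3.9.1", supported))  # False: 3.9 is not in the sample set
```

The first function would drive the "service pack available" alert and the second the "no longer supported" metric. Where the two input lists come from is still the open question.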

@garethbowen garethbowen added the Type: Improvement Make something better label May 15, 2023
@mrjones-plip
Contributor

This is an interesting idea - thanks for the submission @garethbowen!

tl;dr - We have too much alert noise right now and not enough signal. I don't think we should alert on this (yet?). Let's see how we can highlight it in the short term and reconsider in the long term.


We've been in the process of auditing the existing alerts (see #35). The list is here; tl;dr - there are currently 9, counting one that hasn't been created yet and one that will likely be removed. To get a better sense of how important and actionable the alerts are, we enabled all 9 alerts on the 30+ production CHT instances that Medic hosts. So far, of all the alerts we've gotten, only two or three have been actionable.

Also, we already have a feed of the releases we post to the forum, which shows up in this handy panel on every Watchdog instance:

image

Maybe this is enough?

But, short term, no new alerts. Only alerts for things that are going to cause outages, and only ones that are actionable. At a later date, when we have a much better signal-to-noise ratio, we might consider adding a panel that shows both how many versions your current one is behind the latest, and how many versions it is behind the oldest release that is not yet EOL.
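
For what it's worth, the two numbers for such a panel are cheap to compute once the inputs exist. A minimal sketch, assuming we count distance in minor releases and that the full list of minors and the supported subset are available from somewhere (the function names and the version numbers are made up for illustration):

```python
from typing import List, Tuple


def minor_key(version: str) -> Tuple[int, int]:
    """Reduce '4.1.2' or '4.1' to a sortable (major, minor) pair."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def versions_behind(installed: str,
                    all_minors: List[str],
                    supported_minors: List[str]) -> Tuple[int, int]:
    """Return (minors behind the latest release,
               minors behind the oldest supported release).
    The second number is 0 when the installed version is still supported."""
    current = minor_key(installed)
    ordered = sorted({minor_key(m) for m in all_minors})
    oldest_supported = min(minor_key(m) for m in supported_minors)

    behind_latest = sum(1 for m in ordered if m > current)
    behind_supported = sum(1 for m in ordered if current < m <= oldest_supported)
    return behind_latest, behind_supported


# Hypothetical data: five minors released, the newest two still supported.
all_minors = ["4.0", "4.1", "4.2", "4.3", "4.4"]
supported = ["4.3", "4.4"]

print(versions_behind("4.1.2", all_minors, supported))  # (3, 2)
print(versions_behind("4.4.0", all_minors, supported))  # (0, 0)
```

Both values could be exposed as plain gauges and shown in a panel without any alert rule attached, which keeps this in line with the "no new alerts" position above.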
