Error reporting backchannel + auto-monitoring #585
Labels: component:infra, task:enhancement, use-case
Mail does not appear to be set up on most compute nodes, so the MAILTO setting in crontab won't work (it manifestly does not work on Fox). It's also not clear to me what happens, beyond logging, when something goes wrong while sonar is run by systemd.
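MAILTO only helps where local mail delivery works. One way to get failures off a node without mail is a small wrapper that cron or systemd runs in place of sonar, which hands any failure to a reporting hook. The following is a minimal Python sketch under assumptions. `run_and_report`, `FailureReport`, and the `report` callback are hypothetical names and are not part of sonar. Where reports actually go (an HTTP endpoint, a shared file, a syslog server) is left open. The sketch only catches nonzero exits and launch errors. A silent output-format change like the nvidia-smi one would still need separate validation.

```python
import socket
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class FailureReport:
    """Hypothetical record of one failed run, to be shipped off the node."""
    host: str
    command: List[str]
    returncode: int
    stderr_tail: str


def run_and_report(
    command: Sequence[str],
    report: Callable[[FailureReport], None],
    max_stderr_chars: int = 2000,
) -> int:
    """Run `command`; on failure, pass a FailureReport to `report`.

    stdout is inherited (not captured), so the wrapped program's normal
    output still reaches wherever cron/systemd sends it. stderr is captured
    so it can be attached to the report, then echoed to our own stderr so
    local logging still sees it. The exit code is returned unchanged.
    """
    argv = list(command)
    if not argv:
        raise ValueError("command must not be empty")
    host = socket.gethostname()
    try:
        proc = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    except OSError as exc:
        # Binary missing or not executable: report it like a failed run.
        report(FailureReport(host, argv, 127, str(exc)))
        return 127
    if proc.stderr:
        sys.stderr.write(proc.stderr)
    if proc.returncode != 0:
        report(FailureReport(
            host=host,
            command=argv,
            returncode=proc.returncode,
            stderr_tail=proc.stderr[-max_stderr_chars:],
        ))
    return proc.returncode
```

Because the wrapper passes stdout through and returns the wrapped program's exit code, whatever cron or systemd already does on failure keeps working. The report is an additional channel, not a replacement.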
This is a fairly big deal: we need some kind of auto-monitoring for the system. There are too many things that can go wrong; witness what happened when the nvidia-smi output format changed.
I think we need two things: