Service monitoring
Fernando Barreiro edited this page Mar 28, 2019 · 7 revisions
Harvester can push out service metrics. This requires psutil >= 5.4.8 and the following configuration section:
[service_monitor]
active = True
disk_volumes = data,data1
pidfile = /var/log/harvester/panda_harvester.pid
- disk_volumes is optional and accepts a comma-separated list of volumes
- pidfile is only mandatory when using uwsgi
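
As a rough illustration of how these options can be interpreted, the sketch below parses the section with Python's standard `configparser`. It is not Harvester's own loading code: the helper name `read_service_monitor` and the sample text are assumptions made for this example, and the fallback values for missing options are likewise assumed.

```python
import configparser

# Sample text mirroring the [service_monitor] section shown above.
SAMPLE_CFG = """
[service_monitor]
active = True
disk_volumes = data,data1
pidfile = /var/log/harvester/panda_harvester.pid
"""


def read_service_monitor(cfg_text):
    """Return the service_monitor options as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["service_monitor"]
    # disk_volumes is optional: fall back to an empty list when absent.
    raw_volumes = section.get("disk_volumes", "")
    volumes = [v.strip() for v in raw_volumes.split(",") if v.strip()]
    return {
        # Assumption: monitoring is treated as off when "active" is missing.
        "active": section.getboolean("active", fallback=False),
        "disk_volumes": volumes,
        # pidfile is only mandatory with uwsgi, so it may be missing.
        "pidfile": section.get("pidfile", None),
    }


settings = read_service_monitor(SAMPLE_CFG)
print(settings)
```

Splitting on commas and stripping whitespace means that `data, data1` and `data,data1` are read the same way in this sketch.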
Once Harvester is pushing out service metrics, you need to configure the thresholds and alerts (https://github.com/PanDAWMS/harvester_service_monitoring). Add the completed XML file to the configuration directory:
<?xml version="1.0"?>
<instances>
  <instance harvesterid="YOUR HARVESTER ID" instanceisenable="True">
    <hostlist>
      <host hostname="THE HOST RUNNING HARVESTER" hostisenable="True">
        <contacts>
          <email>WHO TO NOTIFY 1</email>
          <email>WHO TO NOTIFY 2</email>
        </contacts>
        <metrics>
          <metric name="lastsubmittedworker" enable="True">
            <value>30</value>
          </metric>
          <metric name="lastheartbeat" enable="True">
            <value>30</value>
          </metric>
          <metric name="memory" enable="True">
            <memory_warning>50</memory_warning>
            <memory_critical>80</memory_critical>
          </metric>
          <metric name="cpu" enable="True">
            <cpu_warning>50</cpu_warning>
            <cpu_critical>80</cpu_critical>
          </metric>
          <metric name="disk" enable="True">
            <disk_warning>75</disk_warning>
            <disk_critical>80</disk_critical>
          </metric>
        </metrics>
      </host>
      <!-- ... YOU CAN ADD MULTIPLE HOSTS -->
    </hostlist>
  </instance>
</instances>
- lastsubmittedworker and lastheartbeat: time thresholds, e.g. 30 (minutes), 60d... You can disable a metric through its enable attribute when you don't expect regular worker submission.
- disk_warning/critical, cpu_warning/critical, memory_warning/critical: thresholds expressed in %, e.g. 50
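
To sanity-check a completed file before deploying it, something like the following sketch may help. It reads the enabled metrics with Python's standard `xml.etree.ElementTree` and compares a sample percentage against the warning and critical thresholds. The helper names `load_thresholds` and `classify`, the sample IDs and host name, and the inclusive (`>=`) comparison are assumptions for this example; the monitoring service itself may evaluate thresholds differently.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same layout as the file above (IDs are made up).
SAMPLE_XML = """<?xml version="1.0"?>
<instances>
  <instance harvesterid="test_harvester" instanceisenable="True">
    <hostlist>
      <host hostname="host1.example.org" hostisenable="True">
        <metrics>
          <metric name="memory" enable="True">
            <memory_warning>50</memory_warning>
            <memory_critical>80</memory_critical>
          </metric>
          <metric name="cpu" enable="False">
            <cpu_warning>50</cpu_warning>
            <cpu_critical>80</cpu_critical>
          </metric>
        </metrics>
      </host>
    </hostlist>
  </instance>
</instances>
"""


def load_thresholds(xml_text):
    """Map (harvesterid, hostname, metric name) to its child values, enabled metrics only."""
    root = ET.fromstring(xml_text)
    thresholds = {}
    for instance in root.iter("instance"):
        for host in instance.iter("host"):
            for metric in host.iter("metric"):
                if metric.get("enable") != "True":
                    continue  # skip metrics that are switched off
                key = (instance.get("harvesterid"), host.get("hostname"), metric.get("name"))
                # Values are kept as text because time thresholds may carry a unit (e.g. 60d).
                thresholds[key] = {child.tag: child.text.strip() for child in metric}
    return thresholds


def classify(value, warning, critical):
    """Return 'critical', 'warning' or 'ok'; assumes inclusive thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


limits = load_thresholds(SAMPLE_XML)
mem = limits[("test_harvester", "host1.example.org", "memory")]
print(classify(65, int(mem["memory_warning"]), int(mem["memory_critical"])))
```

With the sample thresholds of 50 and 80, a memory usage of 65% is classified as `warning`; the disabled cpu metric is not loaded at all.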