Measures the resource utilization of a specific process over time.
Also measures the utilization/saturation of system-wide resources: this helps putting the process-specific metrics into context.
Built for Linux. Windows and Mac OS support might come.
For a list of the currently supported metrics see below.
- Comes with a data plotting tool separate from the data acquisition program.
- High sampling rate: the default sampling interval of
0.5 s
makes narrow spikes visible. - Values measurement correctness highly (see technical notes).
- Values interoperability: data files can be read with any HDF5 reader such as with pandas.read_hdf(), or with PyTables. See tips and tricks.
- Can monitor a program subject to process ID changes (for longevity experiments where the monitored process may occasionally restart).
- Can run indefinitely, with predictable disk space requirements (output file rotation and retention policy).
- Keeps your data organized: the time series data is written into a structured HDF5 file annotated with metadata (including program invocation time, system hostname, a custom label, the Goeffel software version, and others).
The name, Göffel, is German for spork:
Convenient, right?
The latest release can be downloaded and installed from PyPI, via pip:
$ pip install goeffel
pip can also install the latest development version:
$ pip install git+https://github.com/jgehrcke/goeffel
You can invoke Goeffel with the --pid <pid>
argument.
In this mode, goeffel
stops the measurement and terminates itself once the process with the given ID goes away. Example:
$ goeffel --pid 29019
[... snip ...]
190809-15:46:57.914 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01805 s
[... snip ...]
190809-15:56:13.842 INFO: Cannot inspect process: process no longer exists (pid=29019)
190809-15:56:13.843 INFO: Wait for producer buffer to become empty
190809-15:56:13.843 INFO: Wait for consumer process to terminate
190809-15:56:13.854 INFO: Updated HDF5 file: wrote 13 sample(s) in 0.01077 s
190809-15:56:13.856 INFO: Sample consumer process terminated
For measuring beyond the process lifetime use --pid-command <command>
.
In the following example, I use the pgrep utility for discovering a certain process (which is based on stress in this case):
$ goeffel --pid-command 'pgrep stress --newest'
[... snip ...]
190809-15:47:47.337 INFO: New process ID from PID command: 25890
[... snip ...]
190809-15:47:57.863 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01805 s
190809-15:48:06.850 INFO: Cannot inspect process: process no longer exists (pid=25890)
190809-15:48:06.859 INFO: PID command returned non-zero
[... snip ...]
190809-15:48:09.916 INFO: PID command returned non-zero
190809-15:48:10.926 INFO: New process ID from PID command: 28086
190809-15:48:12.438 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01013 s
190809-15:48:22.446 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01062 s
[... snip ...]
In this mode, goeffel
runs forever until manually terminated via SIGINT
or SIGTERM
.
Process ID changes are detected by periodically running the discovery command until it returns a valid process ID on stdout.
This is useful for longevity experiments where the monitored process occasionally restarts, for instance as of fail-over scenarios.
Note: goeffel-analysis
provides an opinionated and limited approach to visualizing data. For advanced and thorough data analysis I recommend building a custom (maybe even ad-hoc) data analysis pipeline using pandas
and matplotlib
, or using the tooling of your choice.
Also note: The command line interface provided by goeffel-analysis
,
especially for the plot commands, might change in the future. Suggestions for
improvement are welcome, of course.
Use goeffel-analysis inspect <path-to-HDF5-file>
for inspecting the contents
of a Goeffel output file. Example:
$ goeffel-analysis inspect mwst18-master1-journal_20190801_111952.hdf5
Measurement metadata:
System hostname: int-master1-mwt18.foo.bar
Invocation time (local): 20190801_111952
PID command: pgrep systemd-journal
PID: None
Sampling interval: 1.0 s
Table properties:
Number of rows: 24981
Number of columns: 38
Number of data points (rows*columns): 9.49E+05
First row's (local) time: 2019-08-01T11:19:53.613377
Last row's (local) time: 2019-08-01T18:52:49.954582
Time span: 7h 32m 56s
Column names:
unixtime
... snip ...
system_mem_inactive
The goeffel-analysis plot <path-to-hdf5-file>
command plots a pre-selected
set of metrics in an opinionated way. More metrics can be added to the plot with
the --metric <metric-name>
option. Example command:
goeffel-analysis plot \
mwst18-master2-mesosmaster_20190801_112136.hdf5 \
--metric proc_num_ip_sockets_open
This command can be used for example for comparing multiple time series.
Say you have monitored the same program across multiple replicas in a distributed system and would like to compare the time evolution of a certain metric across these replicas.
Then the goeffel-analysis flexplot
command is here to help, invoked with multiple --series
arguments:
$ goeffel-analysis flexplot \
--series mwst18-master1-journal_20190801_111952.hdf5 master1 \
--series mwst18-master2-journal_20190801_112136.hdf5 master2 \
--series mwst18-master3-journal_20190801_112141.hdf5 master3 \
--series mwst18-master4-journal_20190801_112151.hdf5 master4 \
--series mwst18-master5-journal_20190801_112157.hdf5 master5 \
--column proc_cpu_util_percent_total \
'CPU util (total) / %' \
'systemd journal CPU utilization ' 15 \
--subtitle 'MWST18, measured with Goeffel' \
--legend-loc 'upper center'
This was born out of a need for solid tooling. We started with pidstat from
sysstat, launched as
pidstat -hud -p $PID 1 1
. We found that it does not properly account for
multiple threads running in the same process and that various issues in that
regard exist in this program across various versions (see
here,
here, and
here).
The program cpustat open-sourced by Uber has a delightful README about the general measurement methodology and overall seems to be a great tool. However, it seems to be optimized for interactive usage (whereas we were looking for a robust measurement program which can be pointed at a process and then be left unattended for a significant while) and there does not seem to be a well-documented approach towards persisting the collected time series data on disk for later inspection.
The program psrecord (which effectively wraps psutil) has a similar fundamental approach as Goeffel; it however only measures few metrics, and it does not have a clear separation of concerns between persisting the data to disk, performing the measurement itself, and analyzing/plotting the data.
-
The core sampling loop does little work besides the measurement itself: it writes each sample to a queue. A separate process consumes this queue and persists the time series data to disk, for later inspection. This keeps the sampling rate predictable upon disk write latency spikes, or generally upon backpressure. This matters especially in cloud environments where we sometimes see fsync latencies of multiple seconds.
-
The sampling loop is (supposed to be, feedback welcome) built so that timing-related systematic measurement errors are minimized.
-
Goeffel tries to not asymmetrically hide measurement uncertainty. For example, you might see it measure a CPU utilization of a single-threaded process slightly larger than 100 %. That's simply the measurement error. In related tooling such as
sysstat
it seems to be common practice to asymmetrically hide measurement uncertainty by capping values when they are known to in theory not exceed a certain threshold (example). -
goeffel
must be run withroot
privileges. -
The value
-1
has a special meaning for some metrics (NaN, which cannot be represented properly in HDF5). Example: A disk write latency of-1 ms
means that no write happened in the corresponding time interval. -
The highest meaningful sampling rate is limited by the kernel's timer and bookkeeping system.
Measurand is a word! This section attempts to describe the individual data columns ("metrics"), their units, and their meaning. There are four main categories:
The timestamp corresponding to the right boundary of the sampled time interval.
-
unixtime
encodes the wall time. It is a canonical Unix timestamp (seconds since epoch, double-precision floating point number); with sub-second precision and no timezone information. This is compatible with a wide range of tooling and therefore the general-purpose timestamp column for time series analysis (also see How to convert theunixtime
column into apandas.DatetimeIndex
). Note: this is subject to system clock drift. In extreme case, this might go backward, have discontinuities, and be a useless metric. In that case, themonotime
metric helps (see below). -
isotime_local
is a human-readable version of the same timestamp as stored inunixtime
. It is a 26 character long text representation of the local time using an ISO 8601 notation (and therefore also machine-readable). Likeunixtime
this metric is subject to system clock drift and might become pretty useless in extreme cases. -
monotime
is based on a so-called monotonic clock source that is not subject to (accidental or well-intended) system clock drift. This column encodes most accurately the relative time difference between any two samples in the time series. The timestamps encoded in this column only make sense relative to each other; the difference between any two values in this column is a wall time difference in seconds, with sub-second precision.
The process ID of the monitored process. It can change if Goeffel was invoked
with the --pid-command
option.
Momentary state at sampling time.
The CPU utilization of the process in percent
.
Mean over the past sampling interval.
If the inspected process is known to contain just a single thread then this can still sometimes be larger than 100 % as of measurement errors. If the process runs more than one thread then this can go far beyond 100 %.
This is based on the sum of the time spent in user space and in kernel space.
For a more fine-grained picture the following two metrics are also available:
proc_cpu_util_percent_user
, and proc_cpu_util_percent_system
.
The ID of the CPU that this process is currently running on.
Momentary state at sampling time.
The rate of (voluntary and
involuntary) context switches in Hz
.
Mean over the past sampling interval.
The number of threads in the process.
Momentary state at sampling time.
The number of sockets currently being open. This includes IPv4 and IPv6 and does not distinguish between TCP and UDP, and the connection state also does not matter.
Momentary state at sampling time.
The number of file descriptors currently opened by this process.
Momentary state at sampling time.
The disk I/O throughput of the inspected process, in MiB/s
.
Based on Linux' /proc/<pid>/io
rchar
and wchar
. Relevant
Linux kernel documentation (emphasis mine):
rchar
: The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty IO and it is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache).
wcar
: The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar.
Mean over the past sampling interval.
The rate of read/write system calls issued by the process as inferred from the
Linux /proc
file system. The relevant syscr
/syscw
counters are as of now
only documented with "read I/O operations, i.e. syscalls like read() and
pread()" and "write I/O operations, i.e. syscalls like write() and pwrite()".
Reference:
Documentation/filesystems/proc.txt
Mean over the past sampling interval.
Fraction of process resident set size
(RSS) relative to the machine's physical memory size in percent
. This is
equivalent to what top
shows in the %MEM
column.
Momentary state at sampling time.
Various memory usage metrics of the monitored process. See the psutil docs for a quick summary of what the values mean. However, note that the values need careful interpretation, as shown by discussions like this and this.
Momentary snapshot at sampling time.
Only collected if Goeffel is invoked with the --diskstats <DEV>
argument. The
resulting data column names contain the device name <DEV>
(note however that
dashes in <DEV>
get removed when building the column names).
Note that the conclusiveness of some of these disk metrics is limited. I believe that this blog post nicely covers a few basic Linux disk I/O concepts that should be known prior to read a meaning into these numbers.
This implements iostat's disk %util
metric.
I like to think of it as the ratio between the actual (wall) time elapsed in the sampled time interval, and the corresponding device's "busy time" in the very same time interval, expressed in percent. The iostat documentation describes this metric in the following words:
Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device).
This is the mean over the sampling interval.
Note: In the case of modern storage systems 100 %
utilization usually does
not mean that the device is saturated. I would like to quote Marc
Brooker:
As a measure of general IO busyness
%util
is fairly handy, but as an indication of how much the system is doing compared to what it can do, it's terrible.
This implements iostat's w_await
which is documented with
The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
On Linux, this is built using /proc/diskstats
documented
here.
Specifically, this uses field 8 ("number of milliseconds spent writing") and
field 5 ("number of writes completed"). Notably, the latter it is not the
merged write count but the user space write count (which seems to be what iostat
uses for calculating w_await
).
This can be a useful metric, but please be aware of its meaning and limitations.
To put this into perspective, in an experiment I have seen that the following
can happen within a second of real-time (observed via iostat -x 1 | grep xvdh
and via direct monitoring of /proc/diskstats
): 3093 userspace write requests
served, merged into 22 device write requests, yielding a total of 120914
milliseconds "spent writing", resulting in a mean write latency of 25 ms. But
what do these 25 ms really mean here? On average, humans have less than two
legs, for sure. The current implementation method reproduces iostat output,
which was the initial goal. Suggestions for improvement are very welcome.
This is the mean over the sampling interval.
The same considerations hold true for r_await
, correspondingly.
The merged read and write request rate.
The Linux kernel attempts to merge individual user space requests before passing them to the storage hardware. For non-random I/O patterns this greatly reduces the rate of individual reads and writes issued to disk.
Built using fields 2 and 6 in /proc/diskstats
documented
here.
This is the mean over the sampling interval.
The read and write request rate issued from user space point of view (before merges).
Built using fields 1 and 5 in /proc/diskstats
documented
here.
This is the mean over the sampling interval.
system_loadavg1
system_loadavg5
system_loadavg15
system_mem_available
system_mem_total
system_mem_used
system_mem_free
system_mem_shared
system_mem_buffers
system_mem_cached
system_mem_active
system_mem_inactive
I recommend to de-serialize and re-serialize using pandas. Example one-liner:
python -c 'import sys; import pandas as pd; df = pd.read_hdf(sys.argv[1], key="goeffel_timeseries"); df.to_csv(sys.argv[2], index=False)' goeffel_20190718_213115.hdf5.0001 /tmp/hdf5-as-csv.csv
Note that this significantly inflates the file size (e.g., from 50 MiB to 300 MiB).
At some point, you might feel inclined to poke around in an HDF5 file created by
Goeffel or to do custom data inspection/processing. In that case, I recommend
using one of the various available open-source HDF5 tools for managing and
viewing HDF5 files. One GUI tool I have frequently used is
ViTables. Install it with pip install vitables
and
then do e.g.
vitables goeffel_20190718_213115.hdf5
This opens a GUI which allows for browsing the tabular time series data, for viewing the metadata in the file, for exporting data as CSV, for querying the data, and various other things.
I recommend to start an IPython REPL:
pip install ipython # if you have not done so yet
ipython
Load the HDF5 file into a pandas
data frame:
In [1]: import pandas as pd
In [2]: df = pd.read_hdf('goeffel_timeseries__20190806_213704.hdf5', key='goeffel_timeseries')
From here you can do anything.
For example, let's have a look at the mean value of the actual sampling interval used in this specific Goeffel time series:
In [3]: df['unixtime'].diff().mean()
Out[3]: 0.5003192476604296
Or, let's see how many threads the monitored process used at most during the entire observation period:
In [4]: df['proc_num_threads'].max()
Out[4]: 1
The HDF5 file contains a unixtime
column which contains canonical Unix
timestamp data ready to be consumed by a plethora of tools. If you are like me
and like to use pandas
then it is good to know how to convert this into a
native pandas.DateTimeIndex
:
In [1]: import pandas as pd
In [2]: df = pd.read_hdf('goeffel_timeseries__20190807_174333.hdf5', key='goeffel_timeseries')
# Now the data frame has an integer index.
In [3]: type(df.index)
Out[3]: pandas.core.indexes.numeric.Int64Index
# Parse unixtime column.
In [4]: timestamps = pd.to_datetime(df['unixtime'], unit='s')
# Replace the index of the data frame.
In [5]: df.index = timestamps
# Now the data frame has a DatetimeIndex.
In [6]: type(df.index)
Out[6]: pandas.core.indexes.datetimes.DatetimeIndex
# Let's look at some values.
In [7]: df.index[:5]
Out[7]:
DatetimeIndex(['2019-08-07 15:43:33.798929930',
'2019-08-07 15:43:34.300590992',
'2019-08-07 15:43:34.801260948',
'2019-08-07 15:43:35.301798105',
'2019-08-07 15:43:35.802226067'],
dtype='datetime64[ns]', name='unixtime', freq=None)
External references on the subject matter that I found useful during development.
About system performance measurement, and kernel time bookkeeping:
- http://www.brendangregg.com/usemethod.html
- https://www.vividcortex.com/blog/monitoring-and-observability-with-use-and-red
- https://github.com/uber-common/cpustat/blob/master/README.md
- https://elinux.org/Kernel_Timer_Systems
- https://github.com/Leo-G/DevopsWiki/wiki/How-Linux-CPU-Usage-Time-and-Percentage-is-calculated
About disk I/O statistics:
- https://www.xaprb.com/blog/2010/01/09/how-linux-iostat-computes-its-results/
- https://www.kernel.org/doc/Documentation/iostats.txt
- https://blog.serverfault.com/2010/07/06/777852755/ (interpreting iostat output)
- https://unix.stackexchange.com/a/462732 (What are merged writes?)
- https://stackoverflow.com/a/8512978 (what is
%util
in iostat?) - https://brooker.co.za/blog/2014/07/04/iostat-pct.html
- https://coderwall.com/p/utc42q/understanding-iostat
- https://www.percona.com/doc/percona-toolkit/LATEST/pt-diskstats.html
Others:
- https://serverfault.com/a/85481/121951 (about system memory statistics)
Musings about HDF5:
- https://cyrille.rossant.net/moving-away-hdf5/
- http://hdf-forum.184993.n3.nabble.com/File-corruption-and-hdf5-design-considerations-td4025305.html
- https://pytables-users.narkive.com/QH2WlyqN/corrupt-hdf5-files
- https://www.hdfgroup.org/2015/05/whats-coming-in-the-hdf5-1-10-0-release/
- https://stackoverflow.com/q/35837243/145400