
Commit

incaseoftrouble committed May 27, 2024
1 parent 9e89fd8 commit 3833b15
Showing 2 changed files with 69 additions and 48 deletions.
9 changes: 5 additions & 4 deletions README.md
@@ -16,10 +16,6 @@ SPDX-License-Identifier: Apache-2.0
[![PyPI version](https://img.shields.io/pypi/v/BenchExec.svg)](https://pypi.python.org/pypi/BenchExec)
[![DOI](https://zenodo.org/badge/30758422.svg)](https://zenodo.org/badge/latestdoi/30758422)

> [!NOTE]
> To get started with reliably benchmarking right away, follow the
> [quickstart guide](doc/quickstart.md).
**News and Updates**:
- Two projects accepted for BenchExec as part of [Google Summer of Code](https://summerofcode.withgoogle.com/)!
We are happy that [Haoran Yang](https://summerofcode.withgoogle.com/programs/2024/projects/UzhlnEel)
@@ -33,6 +29,11 @@ SPDX-License-Identifier: Apache-2.0
you can read [Reliable Benchmarking: Requirements and Solutions](https://doi.org/10.1007/s10009-017-0469-y) online.
We also provide a set of [overview slides](https://www.sosy-lab.org/research/prs/Latest_ReliableBenchmarking.pdf).

> To help new or inexperienced users get started with reliable benchmarking
> right away, we offer a [quickstart guide](doc/quickstart.md) that contains
> a brief explanation of the issues with "standard" setups as well as the (few)
> steps necessary to set up and use BenchExec instead.

BenchExec is a framework for reliable benchmarking and resource measurement
and provides a standalone solution for benchmarking
that takes care of important low-level details for accurate, precise, and reproducible measurements
108 changes: 64 additions & 44 deletions doc/quickstart.md
@@ -1,40 +1,48 @@
# A Quickstart Guide to Proper Benchmarking with BenchExec
# A Beginner's Guide to Reliable Benchmarking

This guide provides a brief summary of instructions to set up reliable
benchmark measurements using BenchExec and important points to consider. It is
meant for users who either want to set up benchmarking from scratch or already
have a simple setup using, e.g., `time`, `timeout`, `taskset`, `ulimit`, etc.
and will show you how to use `runexec` as a simple but much more reliable
"drop-in" replacement for these tools.

## Guiding Example

> [!IMPORTANT]
> If your current setup looks similar to the below example (or you are thinking
> about such a setup), we strongly recommend following this guide for a much
> more reliable process.
As an example, suppose that you want to measure the performance of your
tool `program` with arguments `--foo` and `--bar` on the input files
`input_1.in` to `input_9.in`. To measure the runtime of the tool, one may run
```
$ /usr/bin/time program --foo input_1.in
$ /usr/bin/time program --bar input_1.in
$ /usr/bin/time program --foo input_2.in
...
```
etc. and note the results. In case resource limitations are desired (e.g.
limiting to 1 CPU and 60 sec of wallclock time), the calls might be
```
$ taskset -c 0 timeout 60s /usr/bin/time program ...
```
or similar.
## Audience

## Benchmarking with BenchExec

The following steps guide you to increase the reliability and quality of
measurements drastically by using BenchExec instead of these small
utilities.
This guide provides a brief summary of instructions to set up reliable
benchmark measurements using BenchExec and important points to consider. It is
meant for users who either want to set up benchmarking for a small number of
runs from scratch or already have a simple setup using, e.g., `time`,
`timeout`, `taskset`, `ulimit`, etc. Concretely, this guide will show you how
to use `runexec` as a simple but much more reliable "drop-in" replacement for
these tools. If you want to benchmark a large number of runs or get the most out
of what BenchExec provides as a benchmarking framework, consider using the tool
`benchexec` instead (further details below).

## Why Should I use BenchExec?

As a simple example, suppose that you want to measure the performance of your
newly implemented tool `program` with arguments `--foo` and `--bar` on the
input files `input_1.in` to `input_9.in`. To measure the runtime of the tool,
you might run `$ /usr/bin/time program --foo input_1.in` etc. and note the
results. In case resource limitations are desired (e.g. limiting to 1 CPU and
60 sec of wallclock time), the calls might be
`$ taskset -c 0 timeout 60s /usr/bin/time program ...` or similar.
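
For instance, a manual setup along these lines (a sketch for illustration,
using the commands and limits from the example above) could look like this:
```
# manual approach: one call per configuration and input file
for input in input_*.in; do
    taskset -c 0 timeout 60s /usr/bin/time program --foo "$input"
    taskset -c 0 timeout 60s /usr/bin/time program --bar "$input"
done
```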

While useful, these utilities (i.e. `time`, `ulimit`, etc.) are unfortunately
not suitable for reliable benchmarking, especially when parallelism or
sub-processes are involved, and may give you *completely wrong* results.
BenchExec takes care of most of these problems for you, which is why we recommend
using it instead. By following this guide, you thus significantly increase the
reliability of your results without much effort.

For further details and insights into peculiarities and pitfalls of reliable
benchmarking (as well as how BenchExec is mitigating them where possible), we
recommend the
[overview slides](https://www.sosy-lab.org/research/prs/Latest_ReliableBenchmarking.pdf)
and [the corresponding paper](https://doi.org/10.1007/s10009-017-0469-y).

## Reliable Benchmarking with BenchExec

The following steps show you how to increase the reliability and quality of
measurements by using BenchExec instead of the standard system utilities.

### Step 1. Install BenchExec

@@ -60,24 +68,36 @@ think about which executions you want to measure, what resource limits should
be placed on the benchmarked tool(s), such as CPU time, CPU count, memory, etc.
Also consider how timeouts should be treated.

For more complicated setups, please also refer to the
[benchmarking setup guide](benchmarking.md). For example, in case you want to
execute multiple benchmarks in parallel, think about how to deal with shared
resources (e.g. the memory bus and CPU cache). For such cases, we recommend
using `benchexec` instead, which takes care of managing parallel invocations.

Independently of using BenchExec, we strongly recommend following the
guidelines of the [benchmarking guide](benchmarking.md).

### Step 3. Gather Measurements using runexec

Using the example from above, suppose that we want to limit the process to 60s
wall time, 1 GB of memory, and cpu core 0. Then, simply run
Using the example from above, suppose that we want to measure `program` on
input `input_1.in`. Then, simply run
```
$ runexec --quiet --walltimelimit 60s --memlimit 1GB --cores 0 --output output_1_foo.log -- \
program --foo input_1.in
$ runexec --output output_1_foo.log -- program --foo input_1.in
```
This executes `program --foo input_1.in` and prints measurements to standard
output, such as walltime, cputime, memory, I/O, etc. in a simple to read and
parse format. The output of program is redirected to `output_1_foo.log`.
This executes `program --foo input_1.in`, redirecting the tool's output to
`output_1_foo.log`. Then `runexec` prints relevant measurements, such as
walltime, cputime, memory, I/O, etc., to standard output in a format that is
simple to read and parse, for example:
```
starttime=2000-01-01T00:01:01.000001+00:00
returnvalue=0
walltime=0.0027799380040960386s
cputime=0.002098s
memory=360448B
pressure-cpu-some=0s
pressure-io-some=0s
pressure-memory-some=0s
```
See the [run result](run-results.md) documentation for further details on the
precise meaning of these values.
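
Since each measurement is printed as a `key=value` line, individual values are
easy to extract with standard tools, for example (an illustrative snippet,
assuming the printed lines were saved to a file `measurements.txt`):
```
$ grep '^cputime=' measurements.txt | cut -d= -f2
0.002098s
```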

In case you want to limit the process to 60s wall time, 1 GB of memory, and one CPU core (by
[pinning](https://en.wikipedia.org/wiki/Processor_affinity) the process to CPU
number 0), simply run `$ runexec --walltimelimit 60s --memlimit 1GB --cores 0 ...` instead.
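
For example, combining these limits with the output redirection from the call
above gives:
```
$ runexec --walltimelimit 60s --memlimit 1GB --cores 0 --output output_1_foo.log -- \
    program --foo input_1.in
```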

The tool `runexec` offers several other features; run `runexec --help` or
refer to the [documentation](runexec.md) for further information.
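
Finally, since `runexec` acts as a drop-in replacement for the manual calls
shown above, gathering measurements for the whole guiding example (both
configurations on all inputs) can also be scripted; a possible sketch (the
loop and file naming are illustrative, not part of the guide):
```
# one runexec invocation per run; each run gets its own log and measurement file
for input in input_*.in; do
    for flag in foo bar; do
        runexec --walltimelimit 60s --memlimit 1GB --cores 0 \
            --output "output_${input%.in}_${flag}.log" -- \
            program "--${flag}" "$input" > "measurements_${input%.in}_${flag}.txt"
    done
done
```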
