Skip to content

Commit

Permalink
Fixes
Browse files Browse the repository at this point in the history
  • Loading branch information
incaseoftrouble committed Jun 7, 2024
1 parent 474a08f commit d6ff92b
Show file tree
Hide file tree
Showing 2 changed files with 84 additions and 66 deletions.
145 changes: 81 additions & 64 deletions doc/benchexec-in-container.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,11 @@ containerized environments (such as Docker or Podman) and shows how you can
create your own interactive Docker image (or adapt existing ones) to use
BenchExec within it.

There is a difference between executing only BenchExec inside a
(non-interactive) container and a fully interactive one which contains
multiple processes, with the former being significantly simpler. Both are
discussed here.

Note: It is possible to use BenchExec inside other container environments, e.g.
Docker, but we strongly recommend to use [Podman](https://podman.io/)
(compatible with Docker), as it provides "rootless" containers, i.e. its
Expand All @@ -22,17 +27,31 @@ containers). However, as the setup is largely the same, we provide it for
either case. In case you do want to use Docker, please also consider following
the (easy to implement) [security recommendations](#securing-docker-execution).

If you only require a non-interactive setup, i.e. only run a single BenchExec
process inside a container, you can avoid most of the complications. See
[below](#non-interactive-setups) for more details.

## Executing BenchExec in Docker and Podman
## Non-interactive Setups

In case you intend to only execute one BenchExec process and nothing else, the
setup is quite simple. It is sufficient to execute the BenchExec process as
entry point in a container with all required dependencies installed. For
example, if BenchExec is installed in a container with tag `my-container`, you
can simply execute
```
podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -t my-benchexec my-container <arguments>
```
or
```
docker run --privileged --cap-drop=all -t my-container benchexec <arguments>
```

Below follows a step-by-step guide to create a Docker / Podman image with

## BenchExec in Interactive Containers

Next follows a step-by-step guide to create a Docker / Podman image with
BenchExec (assuming cgroups v2, the standard nowadays). Some further background
and reasoning is provided later. Summarized, the main reason why BenchExec
needs a "custom" setup for containers is due to how cgroups work in combination
with containers; we need to "manually" set up a separate cgroup for BenchExec.
A brief guidline for the outdated cgroups v1 is provided [below](#cgroups-v1).

While this setup should work on most recent system, we cannot guarantee this,
since there simply are too many variables. In some cases, you may need to
Expand Down Expand Up @@ -68,15 +87,16 @@ are working with a standard Docker image, create a file `Dockerfile`
FROM debian:bookworm-slim
# Or whichever base image you are using

# Install python -- add further dependencies of your application here
RUN apt-get update && apt-get -y install \
python3-minimal \
&& rm -rf /var/lib/apt/lists/*
# Install all dependencies of your tool and BenchExec
# Example:
# RUN apt-get update && apt-get -y install \
# python3-minimal \
# && rm -rf /var/lib/apt/lists/*

# TODO Install BenchExec with any method (apt install, pip install, or just copy the .whl)
# Install BenchExec with any method (apt install, pip install, or just copy the .whl)
# Examples:
# RUN pip install benchexec
# RUN wget https://github.com/sosy-lab/benchexec/releases/download/<current release>.whl -O /opt/benchexec.whl
# ... or any other method

# Copy the created script
COPY init.sh /init.sh
Expand All @@ -97,19 +117,19 @@ Dockerfile is located to build the container.

Start the image with
```
podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
```
or
```
docker run --privileged --cap-drop=all -it <tag>
docker run --privileged --cap-drop=all -it <tag>
```
<!--
In principle, `--security-opt systempaths=unconfined --security-opt seccomp=unconfined`
should also be sufficient as Docker arguments,
but then mounting within the container still fails.
-->

> **IMPORTANT**: The `--privileged` argument, gives the Docker container *full
> **IMPORTANT**: The `--privileged` argument gives the Docker container *full
> root access* to the host, so make sure to include the `--cap-drop=all` flag,
> use this command only with trusted images, and configure your Docker
> container such that everything in it is executed under a different user
Expand All @@ -123,8 +143,8 @@ need to execute
`PYTHONPATH=/opt/benchexec.whl python3 -m benchexec.runexecutor <program>`
instead.)

In case you want to modify this setup, please consider the notes mentioned
[below](#background-and-technical-details).
In case you want to modify this setup, please consider the
[background information below](#background-and-technical-details).

### Securing Docker Execution

Expand All @@ -136,10 +156,10 @@ As such, a small mistake in, e.g., an evaluation script that deletes temporary
files, could easily wreak havoc on your system. Using Podman directly mitigates
this issue, as the Podman container runs with the permissions of the current
user, not `root`, and thus cannot mess with system files (in particular, the
"`root`" user in a Podman container is "mapped" to your current user).
`root` user in a Podman container is mapped to your current user).

To significantly increase security of your Docker execution, add
```docker
```dockerfile
# Create non-root user and set it as default
RUN useradd -ms /bin/bash user
WORKDIR /home/user
Expand All @@ -150,19 +170,26 @@ at the end of your Dockerfile. Note: This is still worse than using Podman
permissions on the host) but drastically reduces the chance for accidental
problems.

### Non-interactive Setups

In case you intend to only execute one BenchExec process inside the container,
the setup simplifies notably. In particular, then, it is sufficient to execute
the BenchExec process as entry point, without requiring the `init.sh` script.
For example, if BenchExec is installed in a container with tag `my-benchexec`,
you can simply execute
`podman run <same flags as above> -t my-benchexec benchexec <arguments>`
(analogous with Docker).
## Cgroups v1

With cgroups v1, use the following command line to start your container
```
podman run -v /sys/fs/cgroup:/sys/fs/cgroup:rw --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
```
or
```
docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw --privileged --cap-drop=all -it <tag>
```
Note that this will not work with cgroups v2.

The special setup for interactive containers is not necessary in this case,
since cgroup v1 is designed differently. However, be advised that in either
case, the container unavoidably has access to the root cgroup of your entire
system, and, in case of Docker, full write access to this cgroup. Thus,
following the [security advice](#securing-docker-execution) is especially
recommended here.

This is mostly useful for executing a large benchmark suite with `benchexec`
inside a container with the benchmarked tools installed, but processing the
results separately.

## Background and Technical Details

Expand All @@ -171,19 +198,29 @@ isolate processes. Thus, in order to get them working nicely together, some
additional tricks are required, especially in an interactive container (e.g.
one with a shell, where multiple commands can be executed).

The main complication of using BenchExec with Docker is that benchexec needs a
"clean" cgroup together with enabled controllers. On a standard system, cgroups
are usually managed by, for example, systemd, from which BenchExec can
transparently obtain its cgroup. In contrast, in an interactive container, the
root process usually is just a shell, which is placed in the "root" cgroup of
the container. Due to the "no internal processes" rule (see the
[cgroup documentation](https://man.archlinux.org/man/cgroups.7) for more
information) any (non-root) cgroup can *either* contain processes *or* delegate
controllers to a child group. Thus, the shell in the root cgroup of the
container (which is not the root cgroup of the overall system) prevents child
cgroups with controllers being created. So, we need to move the init process
(and thus all subsequent ones) into a separte cgroup, which is the purpose of
the `init.sh` script.
BenchExec needs a dedicated cgroup together with enabled controllers to cleanly
isolate and precisely measure each tool execution. On a standard system,
cgroups are usually managed by systemd, from which BenchExec can automatically
obtain such a cgroup. In contrast, in an interactive container, the init
process usually is just a shell, which is placed in the "root" cgroup of
the container, without dedicated cgroup management. Thus, the cgroup needs to
be managed manually.

One might assume that BenchExec should be able to do this on its own.
Unfortunately, due to the "no internal processes" rule (see the
[cgroup documentation](https://www.man7.org/linux/man-pages/man7/cgroups.7.html)
for more information) any (non-root) cgroup can *either* contain processes *or*
delegate controllers to a child group. Thus, the shell in the root cgroup of
the container (which is not the root cgroup of the overall system) prevents
child cgroups with controllers being created. So, we need to move the init
process (and thus all subsequent ones) into a separate cgroup, which is the
purpose of the `init.sh` script. Only then can we create a separate, empty
cgroup dedicated to BenchExec.

Note that this also is the reason why non-interactive setups do not this. Then,
BenchExec is the sole root process, so it can use the root cgroup of the
container.


### Adaptation

Expand All @@ -192,8 +229,8 @@ Docker may choose to adapt the above procedure. There are a few peculiarities
to be aware of.

The goal is to create a cgroup called `/benchexec` with relevant controllers
enabled. here, note that it is important that the cgroup has exactly this path,
as benchexec uses this as a "fallback" on non-systemd setups.
enabled. Note that it is important that the cgroup has exactly this path, as
BenchExec uses this as a "fallback" on non-systemd setups.

Additionally, you should be aware of the implications of the "no internal
process"-rule. Effectively, this means that there cannot be any processes in
Expand All @@ -211,23 +248,3 @@ prohibited for cgroups by design. For debugging, it is useful to inspect the
`cgroup.procs` and `cgroup.subtree_control` of the cgroup in question. Here,
note that calls like `cat` do also create a separate process.

Chowning the cgroup and only then switching to the created user is only
required for Podman. In Docker, it is sufficient to add `USER user` to the
Dockerfile and at the end of `init.sh` simply add `exec "$@"`. This is due to
the docker image being executed with `--privileged`, which essentially
disables proper permissions handling on, e.g., the cgroups, and any user in the
image can modify the cgroups.

<!--
## Cgroups v1
For other cases, e.g., with cgroups v1,
please use the following command line argument
to mount the cgroup hierarchy within the container when starting it
(same for Podman):
docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw ...
Note that you additionally need some flags for container mode,
which are explained in the [container documentation](container.md#using-benchexec-in-a-dockerpodman-container).
-->
5 changes: 3 additions & 2 deletions doc/quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,9 +78,10 @@ You should be able to execute the command
`python3 -m benchexec.check_cgroups` without issues. See
[the installation guide](INSTALL.md) for further details and troubleshooting.

> Note: To run inside a docker (or similar) container, unfortunately some
> Note: To run inside a Docker (or similar) container, unfortunately some
> more tinkering might be required due to how process isolation with cgroups
> works. A step-by-step guide for docker follows at the end of the document.
> works. We provide a separate
> [step-by-step guide](#benchexec-in-container.md).
### Step 2. Consider the Setup of Your Benchmark

Expand Down

0 comments on commit d6ff92b

Please sign in to comment.