Fixes

sosy-lab · Jun 7, 2024 · d6ff92b · d6ff92b
1 parent 474a08f
commit d6ff92b
Show file tree

Hide file tree

Showing 2 changed files with 84 additions and 66 deletions.
diff --git a/doc/benchexec-in-container.md b/doc/benchexec-in-container.md
@@ -14,6 +14,11 @@ containerized environments (such as Docker or Podman) and shows how you can
 create your own interactive Docker image (or adapt existing ones) to use
 BenchExec within it.
 
+There is a difference between executing only BenchExec inside a
+(non-interactive) container and a fully interactive one which contains
+multiple processes, with the former being significantly simpler. Both are
+discussed here.
+
 Note: It is possible to use BenchExec inside other container environments, e.g.
 Docker, but we strongly recommend to use [Podman](https://podman.io/)
 (compatible with Docker), as it provides "rootless" containers, i.e. its
@@ -22,17 +27,31 @@ containers). However, as the setup is largely the same, we provide it for
 either case. In case you do want to use Docker, please also consider following
 the (easy to implement) [security recommendations](#securing-docker-execution).
 
-If you only require a non-interactive setup, i.e. only run a single BenchExec
-process inside a container, you can avoid most of the complications. See
-[below](#non-interactive-setups) for more details.
 
-## Executing BenchExec in Docker and Podman
+## Non-interactive Setups
+
+In case you intend to only execute one BenchExec process and nothing else, the
+setup is quite simple. It is sufficient to execute the BenchExec process as
+entry point in a container with all required dependencies installed. For
+example, if BenchExec is installed in a container with tag `my-container`, you
+can simply execute
+```
+podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -t my-benchexec my-container <arguments>
+```
+or
+```
+  docker run --privileged --cap-drop=all -t my-container benchexec <arguments>
+```
 
-Below follows a step-by-step guide to create a Docker / Podman image with
+
+## BenchExec in Interactive Containers
+
+Next follows a step-by-step guide to create a Docker / Podman image with
 BenchExec (assuming cgroups v2, the standard nowadays). Some further background
 and reasoning is provided later. Summarized, the main reason why BenchExec
 needs a "custom" setup for containers is due to how cgroups work in combination
 with containers; we need to "manually" set up a separate cgroup for BenchExec.
+A brief guidline for the outdated cgroups v1 is provided [below](#cgroups-v1).
 
 While this setup should work on most recent system, we cannot guarantee this,
 since there simply are too many variables. In some cases, you may need to
@@ -68,15 +87,16 @@ are working with a standard Docker image, create a file `Dockerfile`
 FROM debian:bookworm-slim
 # Or whichever base image you are using
 
-# Install python -- add further dependencies of your application here
-RUN apt-get update && apt-get -y install \
-    python3-minimal \
-  && rm -rf /var/lib/apt/lists/*
+# Install all dependencies of your tool and BenchExec
+# Example:
+# RUN apt-get update && apt-get -y install \
+#     python3-minimal \
+#   && rm -rf /var/lib/apt/lists/*
 
-# TODO Install BenchExec with any method (apt install, pip install, or just copy the .whl)
+# Install BenchExec with any method (apt install, pip install, or just copy the .whl)
+# Examples:
 # RUN pip install benchexec
 # RUN wget https://github.com/sosy-lab/benchexec/releases/download/<current release>.whl -O /opt/benchexec.whl
-# ... or any other method
 
 # Copy the created script
 COPY init.sh /init.sh
@@ -97,19 +117,19 @@ Dockerfile is located to build the container.
 
 Start the image with
 ```
-  podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
+podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
 ```
 or
 ```
-  docker run --privileged --cap-drop=all -it <tag>
+docker run --privileged --cap-drop=all -it <tag>
 ```
 <!--
 In principle, `--security-opt systempaths=unconfined --security-opt seccomp=unconfined`
 should also be sufficient as Docker arguments,
 but then mounting within the container still fails.
 -->
 
-> **IMPORTANT**: The `--privileged` argument, gives the Docker container *full
+> **IMPORTANT**: The `--privileged` argument gives the Docker container *full
 > root access* to the host, so make sure to include the `--cap-drop=all` flag,
 > use this command only with trusted images, and configure your Docker
 > container such that everything in it is executed under a different user
@@ -123,8 +143,8 @@ need to execute
 `PYTHONPATH=/opt/benchexec.whl python3 -m benchexec.runexecutor <program>`
 instead.)
 
-In case you want to modify this setup, please consider the notes mentioned
-[below](#background-and-technical-details).
+In case you want to modify this setup, please consider the
+[background information below](#background-and-technical-details).
 
 ### Securing Docker Execution
 
@@ -136,10 +156,10 @@ As such, a small mistake in, e.g., an evaluation script that deletes temporary
 files, could easily wreak havoc on your system. Using Podman directly mitigates
 this issue, as the Podman container runs with the permissions of the current
 user, not `root`, and thus cannot mess with system files (in particular, the
-"`root`" user in a Podman container is "mapped" to your current user).
+`root` user in a Podman container is mapped to your current user).
 
 To significantly increase security of your Docker execution, add
-```docker
+```dockerfile
 # Create non-root user and set it as default
 RUN useradd -ms /bin/bash user
 WORKDIR /home/user
@@ -150,19 +170,26 @@ at the end of your Dockerfile. Note: This is still worse than using Podman
 permissions on the host) but drastically reduces the chance for accidental
 problems.
 
-### Non-interactive Setups
 
-In case you intend to only execute one BenchExec process inside the container,
-the setup simplifies notably. In particular, then, it is sufficient to execute
-the BenchExec process as entry point, without requiring the `init.sh` script.
-For example, if BenchExec is installed in a container with tag `my-benchexec`,
-you can simply execute
-`podman run <same flags as above> -t my-benchexec benchexec <arguments>`
-(analogous with Docker).
+## Cgroups v1
+
+With cgroups v1, use the following command line to start your container
+```
+podman run -v /sys/fs/cgroup:/sys/fs/cgroup:rw --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
+```
+or
+```
+docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw --privileged --cap-drop=all -it <tag>
+```
+Note that this will not work with cgroups v2.
+
+The special setup for interactive containers is not necessary in this case,
+since cgroup v1 is designed differently. However, be advised that in either
+case, the container unavoidably has access to the root cgroup of your entire
+system, and, in case of Docker, full write access to this cgroup. Thus,
+following the [security advice](#securing-docker-execution) is especially
+recommended here.
 
-This is mostly useful for executing a large benchmark suite with `benchexec`
-inside a container with the benchmarked tools installed, but processing the
-results separately.
 
 ## Background and Technical Details
 
@@ -171,19 +198,29 @@ isolate processes. Thus, in order to get them working nicely together, some
 additional tricks are required, especially in an interactive container (e.g.
 one with a shell, where multiple commands can be executed).
 
-The main complication of using BenchExec with Docker is that benchexec needs a
-"clean" cgroup together with enabled controllers. On a standard system, cgroups
-are usually managed by, for example, systemd, from which BenchExec can
-transparently obtain its cgroup. In contrast, in an interactive container, the
-root process usually is just a shell, which is placed in the "root" cgroup of
-the container. Due to the "no internal processes" rule (see the
-[cgroup documentation](https://man.archlinux.org/man/cgroups.7) for more
-information) any (non-root) cgroup can *either* contain processes *or* delegate
-controllers to a child group. Thus, the shell in the root cgroup of the
-container (which is not the root cgroup of the overall system) prevents child
-cgroups with controllers being created. So, we need to move the init process
-(and thus all subsequent ones) into a separte cgroup, which is the purpose of
-the `init.sh` script.
+BenchExec needs a dedicated cgroup together with enabled controllers to cleanly
+isolate and precisely measure each tool execution. On a standard system,
+cgroups are usually managed by systemd, from which BenchExec can automatically
+obtain such a cgroup. In contrast, in an interactive container, the init
+process usually is just a shell, which is placed in the "root" cgroup of
+the container, without dedicated cgroup management. Thus, the cgroup needs to
+be managed manually.
+
+One might assume that BenchExec should be able to do this on its own.
+Unfortunately, due to the "no internal processes" rule (see the
+[cgroup documentation](https://www.man7.org/linux/man-pages/man7/cgroups.7.html)
+for more information) any (non-root) cgroup can *either* contain processes *or*
+delegate controllers to a child group. Thus, the shell in the root cgroup of
+the container (which is not the root cgroup of the overall system) prevents
+child cgroups with controllers being created. So, we need to move the init
+process (and thus all subsequent ones) into a separate cgroup, which is the
+purpose of the `init.sh` script. Only then can we create a separate, empty
+cgroup dedicated to BenchExec.
+
+Note that this also is the reason why non-interactive setups do not this. Then,
+BenchExec is the sole root process, so it can use the root cgroup of the
+container.
+
 
 ### Adaptation
 
@@ -192,8 +229,8 @@ Docker may choose to adapt the above procedure. There are a few peculiarities
 to be aware of.
 
 The goal is to create a cgroup called `/benchexec` with relevant controllers
-enabled. here, note that it is important that the cgroup has exactly this path,
-as benchexec uses this as a "fallback" on non-systemd setups.
+enabled. Note that it is important that the cgroup has exactly this path, as
+BenchExec uses this as a "fallback" on non-systemd setups.
 
 Additionally, you should be aware of the implications of the "no internal
 process"-rule. Effectively, this means that there cannot be any processes in
@@ -211,23 +248,3 @@ prohibited for cgroups by design. For debugging, it is useful to inspect the
 `cgroup.procs` and `cgroup.subtree_control` of the cgroup in question. Here,
 note that calls like `cat` do also create a separate process.
 
-Chowning the cgroup and only then switching to the created user is only
-required for Podman. In Docker, it is sufficient to add `USER user` to the
-Dockerfile and at the end of `init.sh` simply add `exec "$@"`. This is due to
-the docker image being executed with `--privileged`, which essentially
-disables proper permissions handling on, e.g., the cgroups, and any user in the
-image can modify the cgroups.
-
-<!--
-## Cgroups v1
-
-For other cases, e.g., with cgroups v1,
-please use the following command line argument
-to mount the cgroup hierarchy within the container when starting it
-(same for Podman):
-
-    docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw ...
-
-Note that you additionally need some flags for container mode,
-which are explained in the [container documentation](container.md#using-benchexec-in-a-dockerpodman-container).
--->
diff --git a/doc/quickstart.md b/doc/quickstart.md
@@ -78,9 +78,10 @@ You should be able to execute the command
 `python3 -m benchexec.check_cgroups` without issues. See
 [the installation guide](INSTALL.md) for further details and troubleshooting.
 
-> Note: To run inside a docker (or similar) container, unfortunately some
+> Note: To run inside a Docker (or similar) container, unfortunately some
 > more tinkering might be required due to how process isolation with cgroups
-> works. A step-by-step guide for docker follows at the end of the document.
+> works. We provide a separate
+> [step-by-step guide](#benchexec-in-container.md).
 
 ### Step 2. Consider the Setup of Your Benchmark