
Running inside an interactive container #1048

Closed
incaseoftrouble opened this issue May 29, 2024 · 19 comments

@incaseoftrouble
Contributor

incaseoftrouble commented May 29, 2024

Here is a (hopefully somewhat reliable) way of getting BenchExec running inside an interactive container.
The main trick is to fuse the "move init into a sub-cgroup" step directly into the entrypoint.
The other two scripts are just there to move benchexec into a fresh cgroup and would be superfluous if benchexec gets a --cgroup /xy (or --create-cgroup) switch.
With the latter, it would look kinda neat, I think.

EDIT: Such a behavior has been added


Setup in Containerized Environments

This guide explains the complications of using Docker with benchexec and shows
how you can create your own interactive Docker image (or adapt existing ones)
to use BenchExec within it (with cgroups v2, the standard nowadays).

Note that the following setup is not guaranteed to work in every environment,
since there are too many variables, and you may need to slightly adapt some parts.

Background

Just like BenchExec, tools like Docker and Podman make heavy use of cgroups to
isolate processes. Thus, in order to get them working nicely together, some
additional tricks are required, especially in an interactive container (e.g.
one with a shell, where multiple commands can be executed).

The main complication of using BenchExec with Docker is that benchexec needs a
"clean" cgroup of its own; however, an interactive Docker container has the
shell inside the "root" cgroup, which prevents child cgroups with controllers
from being created. So, we need to move the init process (and thus all
subsequent ones) into a separate cgroup. (See the "no internal processes" rule
in the cgroup documentation for more information.)

Also, note that there is a difference between cgroups versions 1 and 2, and
between Docker and Podman. We recommend using Podman (compatible with Docker)
as it provides "rootless" containers, but we provide the setup for both cases.
We assume cgroups v2, since it is the standard for recent Linux distributions
(you are running on cgroups v2 if the file /sys/fs/cgroup/cgroup.controllers
exists).
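The version check mentioned above can be done with a quick shell test (a small sketch; the file path is the one used by cgroups v2):

```shell
# Detect the cgroup version in use on this system
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "cgroups v2"
else
    echo "cgroups v1 (or hybrid)"
fi
```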

Creating the Image

First, create the script init.sh with the following content:

#!/bin/sh
set -eu

# Create new sub-cgroups
# Note: While "init" can be renamed, the name "benchexec" is important
mkdir -p /sys/fs/cgroup/init /sys/fs/cgroup/benchexec
# Move the init process to that cgroup
echo $$ > /sys/fs/cgroup/init/cgroup.procs

# Enable controllers in subtrees for benchexec to use
for controller in $(cat /sys/fs/cgroup/cgroup.controllers); do
  echo "+$controller" > /sys/fs/cgroup/cgroup.subtree_control
  echo "+$controller" > /sys/fs/cgroup/benchexec/cgroup.subtree_control
done

# ... or whatever your init process should be
exec bash "$@"

and set it executable (chmod +x init.sh).

Now, pack this script into your Docker image and set it as entry point. If you
are working with a standard Docker image, create a file Dockerfile

FROM debian:bookworm-slim
# Or whichever base image you are using

# Install python and wget -- add further dependencies of your application here
RUN apt-get update && apt-get -y install wget python3-minimal \
  && rm -rf /var/lib/apt/lists/*

# Download BenchExec -- Instead, you can also use, for example, the pip-based setup or COPY the .whl
RUN wget https://github.com/sosy-lab/benchexec/releases/download/3.21/BenchExec-3.21-py3-none-any.whl -O /opt/benchexec.whl

# Copy the created scripts
COPY init.sh /init.sh
# Set init.sh as the entrypoint -- It is important to use brackets here so that init.sh is the first and only process
ENTRYPOINT [ "/init.sh" ]

If you already have a Dockerfile, you only need to get the .whl into it and
add the last two commands (i.e. copying the script and setting the entrypoint).

In case you want to modify this setup, please consider the notes mentioned
below.

Running BenchExec in the Container

With this setup finished, run docker build -t <tag> . or
podman build -t <tag> . in the directory where the Dockerfile is located.

Then, start the image with docker run --privileged --cap-drop=all -it <tag>
or
podman run --security-opt unmask=/sys/fs/cgroup --cgroups=split --security-opt unmask=/proc/* --security-opt seccomp=unconfined -it <tag>
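Once the container is running, you can check that init.sh did its job (a diagnostic sketch; the expected cgroup names come from the init.sh script above):

```shell
# Show which cgroup this shell is in (inside the container, expect "0::/init")
cat /proc/self/cgroup

# Check that the benchexec cgroup exists and which controllers it offers
if [ -d /sys/fs/cgroup/benchexec ]; then
    cat /sys/fs/cgroup/benchexec/cgroup.controllers
else
    echo "benchexec cgroup missing -- did init.sh run as the entrypoint?"
fi
```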

With this, you should be able to run, for example,
PYTHONPATH=/opt/benchexec.whl python3 -m benchexec.runexecutor <program>
inside the Docker container -- this is runexec.
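For example, a run with resource limits could look like this (a sketch: the flags are standard runexec options, but check `runexec --help` for your version; the .whl path matches the Dockerfile above):

```shell
# Measure a program with a CPU time limit and a memory limit
PYTHONPATH=/opt/benchexec.whl python3 -m benchexec.runexecutor \
    --timelimit 60s \
    --memlimit 1GB \
    --output run.log \
    echo "hello from runexec"
```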

Technical Details for Adaptation

There are many ways to achieve the required setup and users familiar with
Docker may choose to adapt the above procedure. There are a few peculiarities
to be aware of.

The goal is to create a cgroup called /benchexec with the relevant controllers
enabled. Here, note that it is important that the cgroup has exactly this path,
as BenchExec uses it as a "fallback" on non-systemd setups.

Additionally, you should be aware of the implications of the "no internal
processes" rule. Effectively, this means that there cannot be any processes in
the root cgroup of the container (which, notably, is not the root cgroup of
the system). So, for the provided init.sh to work, it needs to be directly
executed (with ENTRYPOINT [ "/init.sh" ]). In particular, using, for example,
ENTRYPOINT "/init.sh" would not work, since this creates a shell that then
runs init.sh as a sub-process, meaning that the root cgroup would not be
empty when we try to enable subtree control.
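In Dockerfile terms, the difference between the two forms is (a minimal illustration of the point above):

```dockerfile
# Exec form: init.sh becomes PID 1, the only process in the container's root cgroup
ENTRYPOINT [ "/init.sh" ]

# Shell form: equivalent to "/bin/sh -c /init.sh"; the extra shell stays in the
# root cgroup, so enabling subtree control fails -- do NOT use this
# ENTRYPOINT "/init.sh"
```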

In case you run into Device or resource busy errors, the problem is likely
that you are trying to enable subtree control in a non-empty cgroup or, vice
versa, move a process into a cgroup with subtree control enabled; both are
prohibited for cgroups by design. For debugging, it is useful to inspect the
cgroup.procs and cgroup.subtree_control files of the cgroup in question. Here,
note that commands like cat also create a separate process.
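A minimal inspection sketch (the cgroup path is an example; adjust it to the cgroup that reports the error):

```shell
CG=/sys/fs/cgroup/benchexec  # example path; use the cgroup in question
if [ -d "$CG" ]; then
    # must be empty before subtree control can be enabled in this cgroup:
    cat "$CG/cgroup.procs"
    # a cgroup with controllers listed here cannot take processes itself:
    cat "$CG/cgroup.subtree_control"
fi
# Remember that every command you run (even cat) briefly adds a process
# to your current cgroup.
```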


I'm "archiving" the cgexec and cleanup script here

cgexec.sh

#!/bin/sh
# Usage: cgexec.sh <cgroup-name> <command> [args...]

[ "$#" -ge 2 ] || exit 1
# Move this process into the given cgroup
echo $$ > "/sys/fs/cgroup/$1/cgroup.procs" || exit 1
shift 1
# Replace this shell with the given command, now inside the cgroup
exec "$@"

benchexec.sh

#!/bin/sh
set -eu

name="benchexec_$(date +%s)"
# Create a fresh cgroup
mkdir -p "/sys/fs/cgroup/${name}"
# Run benchexec in this cgroup (the leading ! keeps set -e from aborting
# before the cleanup below runs, even if benchexec fails)
! PYTHONPATH=/opt/benchexec.whl /usr/local/bin/cgexec.sh "${name}" "$@"
# Clean up the cgroup (sub-cgroups first; ignore errors if none are left)
! rmdir "/sys/fs/cgroup/${name}"/benchexec_process_* 2>/dev/null
rmdir "/sys/fs/cgroup/${name}"
@PhilippWendler
Member

PhilippWendler commented May 29, 2024

@Po-Chun-Chien You were recently interested in this use case as well, right? How did you solve it? Would you mind sharing information and looking at the above? Would be interested to hear what you think about this as our artifact expert. :-)

I will look at this next week.

@Po-Chun-Chien
Member

My use case was a bit different -- I did not need an interactive container.
(Running BenchExec inside a non-interactive container is covered by the existing docs.)
But this does seem like an interesting tutorial,
and having an interactive container would make it easier for testing some tools.

@PhilippWendler
Member

@Po-Chun-Chien Ok, I thought you were trying to get BenchExec running as a subprocess of some script inside the container, and this is the same case as an interactive container. Thanks for reading it anyway!

The other two scripts are just there to move benchexec into a fresh cgroup and would be superfluous if benchexec gets a --cgroup /xy (or --create-cgroup) switch.
With the latter, it would look kinda neat I think.

For cgroups v1, we provide a systemd service (either for manual installation or as part of our Debian package) that configures a cgroup with a specific name (/system.slice/benchexec-cgroup.service). And if BenchExec has no other way to get a usable cgroup, it uses a hard-coded fallback and tries that cgroup. How about doing something like this for the current use case? I.e., if BenchExec cannot use the current cgroup and pystemd is not available, then try to use /benchexec or so? That would avoid the need to always remember passing this parameter.

I think we will definitely either add this logic or a parameter so that this tutorial can be simplified.

If we go with the parameter, I envision something like --cgroupParent /foo which would tell BenchExec to use /foo as the parent cgroup for all the cgroup business it does, i.e., it would only create cgroups within that one (and not use that cgroup for anything else beyond child-cgroup creation). If /foo doesn't exist yet it could even try to create it recursively. This would mean that for the use case here both --cgroupParent / and --cgroupParent /benchexec would work, and that users do not need to bother with ensuring a unique name.

@incaseoftrouble
Contributor Author

How about doing something like this for the current use case?

I am torn on this. "Automagical" things are nice until they aren't. But I think there is little to no harm in creating a cgroup if it doesn't exist. I think in either case, you should have a parameter -- either it is required or it can change the default behavior.

then try to use /benchexec or so? That would avoid the need to always remember passing this parameter.

I would maybe rather try /benchexec_<random string>?

But then again, if for some obscure reason the "hardcoded" /benchexec is not usable either, benchexec could then (and only then) terminate with an error like "I couldn't find any cgroup to use, please specify a usable one with --cgroupParent /foo" (paraphrasing). Just magically using /benchexec should work in 99.99% of the cases I assume.

@PhilippWendler
Member

How about doing something like this for the current use case?

I am torn on this. "Automagical" things are nice until they aren't. But I think there is little to no harm in creating a cgroup if it doesn't exist.

Ok, I'll implement what I proposed.

I think in either case, you should have a parameter -- either it is required or it can change the default behavior.

I will implement it once we have a concrete feature request.

then try to use /benchexec or so? That would avoid the need to always remember passing this parameter.

I would maybe rather try /benchexec_<random string>?

Inside that /benchexec BenchExec would create one with a unique name. But I think it is nicer to group all of them in /benchexec.

Actually, there is also a stronger argument: I always found it important that BenchExec "plays nicely" with cgroups and does not disturb other uses and tools of cgroups (for example, following rules like this). A part of this is that subprocesses should only modify the cgroup hierarchy below the cgroup they are in, i.e., not move processes to some other unrelated part of the hierarchy, in order to not escape limits and measurements placed on their original cgroup. So far we always follow this, with two exceptions:

  • The /system.slice/benchexec-cgroup.service for cgroups v1. Here we assume that if somebody configured their system such that this is present, it is intended that BenchExec uses it.
  • When we ask systemd to create and delegate a cgroup for us for cgroups v2. We assume that this is expected behavior of applications on systems with systemd. Plus, systemd does not allow us to create something below our current cgroup.

So to follow this principle, we should not blindly assume that / is intended for BenchExec to use if BenchExec itself is in a different cgroup. But if /benchexec exists, then we can make that assumption (similar to the v1 case).

And in the case discussed in this issue it is trivial to provide /benchexec.

But then again, if for some obscure reason the "hardcoded" /benchexec is not usable either, benchexec could then (and only then) terminate with an error like "I couldn't find any cgroup to use, please specify a usable one with --cgroupParent /foo" (paraphrasing).

If neither the current cgroup nor /benchexec is working, telling the user to specify some other cgroup won't be useful: They will not have another cgroup that is ready to use and BenchExec just needs to be told about it. Instead they will need to add something to their setup such that there is a usable cgroup. And we cannot tell them how to do this in the general case (where we can make an educated guess, we try to do so).

PhilippWendler added a commit that referenced this issue Jun 3, 2024
For cgroups v1 we already use /system.slice/benchexec-cgroup.service
as a fallback cgroup if it exists and we cannot use our current cgroup.
For cgroups v2 we do not yet have such a fallback.
However, implementing this makes it much easier for people
on systems without systemd (e.g., in a container)
to set up and configure cgroups for BenchExec, so let's do this.

Background discussion is in #1048,
from which also the suggested commands in the documentation are taken.

Thanks to @incaseoftrouble for reviewing the design of this.
@PhilippWendler
Member

Done in 29480b3. Would you mind testing this and updating the above container tutorial?

@incaseoftrouble
Contributor Author

incaseoftrouble commented Jun 3, 2024

Ok, I'll implement what I proposed.

Thanks a lot! I will integrate this into the tutorial once it is ready.

Just to be clear: The proposal is to try to configure the /benchexec cgroup (and create it if it doesn't exist)?

EDIT: Ah! So the proposal is to "just" use the cgroup if it exists. Let me try it.

If neither the current cgroup nor /benchexec is working, telling the user to specify some other cgroup won't be useful

The case I am imagining is that the benchexec group already exists and is in use for something else (for whatever reason). Then I could just create the cgroup benchexec2 (or tell BenchExec to create it for me). But, again, I admit that this is niche.

@PhilippWendler
Member

Just to be clear: The proposal is to try to configure the /benchexec cgroup (and create it if it doesn't exist)?

EDIT: Ah! So the proposal is to "just" use the cgroup if it exists. Let me try it.

Indeed, for the reason explained above. If we ever add a parameter --cgroupParent, we could try to create and configure the passed cgroup, but not for the automagic case. (This was a little bit mixed maybe in the discussion because I noticed the important distinction in this regard late.)

@incaseoftrouble
Contributor Author

incaseoftrouble commented Jun 3, 2024

Two observations:

  1. /benchexec doesn't exist
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/benchexec.whl/benchexec/runexecutor.py", line 1282, in <module>
  File "/opt/benchexec.whl/benchexec/runexecutor.py", line 228, in main
  File "/opt/benchexec.whl/benchexec/runexecutor.py", line 335, in __init__
  File "/opt/benchexec.whl/benchexec/runexecutor.py", line 341, in _init_cgroups
  File "/opt/benchexec.whl/benchexec/cgroups.py", line 89, in initialize
  File "/opt/benchexec.whl/benchexec/cgroupsv2.py", line 112, in initialize
  File "/opt/benchexec.whl/benchexec/cgroupsv2.py", line 206, in _try_fallback_cgroup
  File "/opt/benchexec.whl/benchexec/cgroupsv2.py", line 518, in get_all_tasks
FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/benchexec/cgroup.procs'
  2. It does exist: works, but the sub-cgroups are not cleaned up. I think this would be nice to add.

@PhilippWendler
Member

Thanks, the crash is fixed.

Cleanup of created cgroups is too tricky as explained before and I won't attempt this.

@incaseoftrouble
Contributor Author

incaseoftrouble commented Jun 4, 2024

First, cgroup cleanup is not of the highest priority. Cleaning up the cgroups from the outside is not too difficult and can be scripted. Let's move on with the tutorial. I adapted the above text.

A few words on the cleanup: I could imagine this to work reasonably in that concrete case (being given a fresh cgroup to move to). Effectively, it should "only" be reversal of all actions. I see two options:

  1. move the benchexec process back to the initial cgroup and then remove the created sub-groups
  2. When benchexec detects that it isn't running in the cgroup it wants to run in (which is always the case, right? It always moves itself to a new cgroup), instead of creating and moving itself into the cgroup, we could create the cgroup and then spawn a child-process inside that cgroup which does the heavy lifting and the parent simply cleans up all the cgroups it created, similar to what the two scripts above were doing (this seems quite clean TBH)

I can have a go at this if you want (I like option 2)! (just no promises when, but it seems like a train-ride-activity ;) ) -- If you want to, just mention it, then I will create an issue that gathers relevant points.

Reason: I am worried about runexec being used for a few hundred calls, which would create several hundred cgroups, and I am not sure about the overhead. Also, I do believe whatever the process creates it should also destroy.

@PhilippWendler
Member

A few words on the cleanup: I could imagine this to work reasonably in that concrete case (being given a fresh cgroup to move to). Effectively, it should "only" be reversal of all actions. I see two options:

  1. move the benchexec process back to the initial cgroup and then remove the created sub-groups

This is tricky, because we need to reliably know when we should do the move back. If runexec is used from within a Python program we cannot even know this. And then there are lots of failure cases like the original cgroup being no longer available etc. I don't want to risk this.

  2. When benchexec detects that it isn't running in the cgroup it wants to run in (which is always the case, right? It always moves itself to a new cgroup), instead of creating and moving itself into the cgroup, we could create the cgroup and then spawn a child-process inside that cgroup which does the heavy lifting and the parent simply cleans up all the cgroups it created (this seems quite clean TBH)

This won't work because the parent process cannot stay in its original cgroup, because then we cannot create the new cgroup. This is the whole reason we move the BenchExec process itself in the first place. (The new automagic fallback is the only situation where this could work, but I want to keep it as similar to the standard case.)

Reason: I am worried about runexec being used for a few hundred calls, which would create several hundred cgroups, and I am not sure about the overhead.

If this turns out to be a problem, we may reconsider.

Also, I do believe whatever the process creates it should also destroy.

Sure, but this is nice-to-have and the added complexity and failure potential does not seem worth it.

@incaseoftrouble
Contributor Author

Fair points, cgroups are complicated :)

If this turns out to be a problem, we may reconsider.

One idea for that case: Inside the cgroup that benchexec "owns" (say /benchexec) there could be one dedicated "cleanup" cgroup (/benchexec/cleanup) that every concrete execution moves to (from /benchexec/benchexec_process_<id> to /benchexec/cleanup) and then deletes the unique sub-group it created. This would at least upper-bound the leftovers.
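A hypothetical sketch of that scheme (not implemented in BenchExec; the cgroup names /benchexec/cleanup and the per-run group are assumptions for illustration, and this only makes sense inside the configured container):

```shell
# Hypothetical: a run's last process parks itself in a shared "cleanup" cgroup
# and then removes the unique per-run sub-cgroup it came from.
run_cgroup="/sys/fs/cgroup/benchexec/benchexec_process_12345"  # example name
echo $$ > /sys/fs/cgroup/benchexec/cleanup/cgroup.procs
rmdir "$run_cgroup"
```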

@incaseoftrouble
Contributor Author

I adapted the draft again; I think it is now in a good shape. Should this be incorporated into #1044 or a separate MR? If so, where should it go? I think you mentioned that this should probably not just be part of the quickstart but in the general container setup, right?

@PhilippWendler
Member

Great, thank you very much! I personally would have not added that much cgroups background info, but I guess it might be nice to have for some readers.

My biggest suggestion would be to skip the installation part of BenchExec in the Dockerfile. Some might prefer pip, some wget. Hardcoding a version number in the example will inevitably lead people to copy-and-paste this without thinking and use an old version in the future. So how about something like # install BenchExec (e.g., with pip or by copying the .whl file) and all other requirements for your container here instead of the two RUN lines?

Apart from this I would probably only have minor wording stuff, which would be easier to handle in a PR. Can be separate or in #1044, I don't have a strong opinion here. Feel free to use #1044 for simplicity, especially because I think you want to link the new docs from the tutorial.

As to where to put it: Yes, I think it would be too hidden in the tutorial, because it is also relevant for people with knowledge about BenchExec. I am wondering whether we should have a new Markdown file grouping everything for running BenchExec in a container, and also move the descriptions from INSTALL.md there? First describe how to run the container in the easy case and then your new text? doc/container.md is unfortunately already taken and docker.md too specific, maybe doc/benchexec-in-container.md? Suggestions welcome.

@incaseoftrouble
Contributor Author

incaseoftrouble commented Jun 5, 2024

I personally would have not added that much cgroups background info, but I guess it might be nice to have for some readers.

Do you mean the "background" in the beginning or "technical details"?

The former I am also unsure about. I personally like it when there is some understandable reason why things have to be "complicated", something along the lines of "this is the issue, trust me, if you want to know more, here are some links", so that I can understand why I am guided to do certain things. But I think this could be cut down. I will think about it.

The technical details I want to strongly separate from the rest; it's not part of the tutorial but rather "for advanced users". But I think it should be there.

My biggest suggestion would be to skip the installation part of BenchExec in the Dockerfile.

Agree! I just like the fact that you can put a simple .whl in there and be done with it. I was looking for a way to point to "latest" release (but this isn't ideal for containers, either). I will rephrase when moving to the PR.

Feel free to use #1044 for simplicity, especially because I think you want to link the new docs from the tutorial.

Yes, agree.

doc/container.md is unfortunately already taken and docker.md too specific, maybe doc/benchexec-in-container.md

Well, container.md currently is a mix of container mode and running inside a container. I think it would be fine to split this up into isolation-with-container-mode.md (or so) and benchexec-in-container.md? In the context of BenchExec, container.md will be ambiguous either way (maybe this could be left in and say "if you are looking for (a) then go here, if you are looking for (b) go there").

(I think this is the only open point, but should be discussed before I integrate this into the PR; I would then check that all links are updated correctly.)

@PhilippWendler
Member

I personally would have not added that much cgroups background info, but I guess it might be nice to have for some readers.

Do you mean the "background" in the beginning or "technical details"?

Both.

doc/container.md is unfortunately already taken and docker.md too specific, maybe doc/benchexec-in-container.md

Well, container.md currently is a mix of container mode and running inside a container.

For the latter part, it is only about what to consider with respect to BenchExec's container mode when running inside a container. So it is specifically about BenchExec's container mode.

I think it would be fine to split this up into isolation-with-container-mode.md (or so) and benchexec-in-container.md? In the context of BenchExec, container.md will be ambiguous either way (maybe this could be left in and say "if you are looking for (a) then go here, if you are looking for (b) go there").

I don't want to rename the existing file, it is fine that container.md is about BenchExec's container feature. And there are probably lots of links out there that would break. But we can add a note "If you are looking for information about how to use BenchExec in a container, ...".

@incaseoftrouble
Contributor Author

So it is specifically about BenchExec's container mode.

Ah, I see; I think I misunderstood that when skimming through it.

But we can add a note "If you are looking for information about how to use BenchExec in a container, ...".

Fine with me; will do. I'll put the rest into benchexec-in-container.md as a working title. We can still rename that file.

@incaseoftrouble
Contributor Author

Further discussion in #1044
