From b1f733b8ed2813ff65fd6bb357315fcaf4f90fc4 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:41:21 +1100
Subject: [PATCH 1/4] dmz: overlay: set xino=off to disable dmesg spam

If /run/runc and /usr/bin are on different filesystems, overlayfs may
enable the xino feature, which results in the following log message
being emitted each time we have to protect /proc/self/exe:

  kernel: overlayfs: "xino" feature enabled using 3 upper inode bits.

So disable xino to remove the log message (we don't care about the
inode numbers of the files anyway).

Signed-off-by: Aleksa Sarai
(cherry picked from commit 9bc42d61bb6ce280c48a6b491357ec773b2adf45)
Signed-off-by: lfbzhm
---
 libcontainer/dmz/overlayfs_linux.go | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libcontainer/dmz/overlayfs_linux.go b/libcontainer/dmz/overlayfs_linux.go
index 92cb1944e59..b81b7025895 100644
--- a/libcontainer/dmz/overlayfs_linux.go
+++ b/libcontainer/dmz/overlayfs_linux.go
@@ -84,6 +84,13 @@ func sealedOverlayfs(binPath, tmpDir string) (_ *os.File, Err error) {
 		return nil, fmt.Errorf("fsconfig set overlayfs lowerdir=%s: %w", lowerDirStr, err)
 	}
 
+	// We don't care about xino (Linux 4.17) but it will be auto-enabled on
+	// some systems (if /run/runc and /usr/bin are on different filesystems)
+	// and this produces spurious dmesg log entries. We can safely ignore
+	// errors when disabling this because we don't actually care about the
+	// setting and we're just opportunistically disabling it.
+	_ = unix.FsconfigSetString(int(overlayCtx.Fd()), "xino", "off")
+
 	// Get an actual handle to the overlayfs.
 	if err := unix.FsconfigCreate(int(overlayCtx.Fd())); err != nil {
 		return nil, os.NewSyscallError("fsconfig create overlayfs", err)

From 2421b592675b3c5b4cc13d9c80b9b50552d16407 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:47:07 +1100
Subject: [PATCH 2/4] memfd-bind: mention that overlayfs obviates the need for it

Signed-off-by: Aleksa Sarai
(cherry picked from commit aa505bfa89a6feaae5378a7a6e5166886f8bc0fc)
Signed-off-by: lfbzhm
---
 contrib/cmd/memfd-bind/README.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/contrib/cmd/memfd-bind/README.md b/contrib/cmd/memfd-bind/README.md
index a83cc78208c..e529eacfeaf 100644
--- a/contrib/cmd/memfd-bind/README.md
+++ b/contrib/cmd/memfd-bind/README.md
@@ -1,6 +1,15 @@
 ## memfd-bind ##
 
-`runc` normally has to make a binary copy of itself when constructing a
+> **NOTE**: Since runc 1.2.0, runc will now use a private overlayfs mount to
+> protect the runc binary. This protection is far more light-weight than
+> memfd-bind, and for most users this should obviate the need for `memfd-bind`
+> entirely. Rootless containers will still make a memfd copy (unless you are
+> using `runc` itself inside a user namespace -- a-la
+> [`rootlesskit`][rootlesskit]), but `memfd-bind` is not particularly useful
+> for rootless container users anyway (see [Caveats](#Caveats) for more
+> details).
+
+`runc` sometimes has to make a binary copy of itself when constructing a
 container process in order to defend against certain container runtime attacks
 such as CVE-2019-5736.
 
@@ -38,6 +47,8 @@ much memory usage they can use:
   container process setup takes up about 10MB per process spawned inside the
   container by runc (both pid1 and `runc exec`).
 
+[rootlesskit]: https://github.com/rootless-containers/rootlesskit
+
 ### Caveats ###
 
 There are several downsides with using `memfd-bind` on the `runc` binary:

From 82f3af85538d07a901af00fc5953c4faf7960859 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:49:42 +1100
Subject: [PATCH 3/4] readme: drop unused memfd-bind reference

Fixes: 871057d863e8 ("drop runc-dmz solution according to overlay solution")
Signed-off-by: Aleksa Sarai
(cherry picked from commit b9dfb22dbfefe0b211adfe634d5348cd1ad39266)
Signed-off-by: lfbzhm
---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 8cbe1fe6878..50fcd4e9222 100644
--- a/README.md
+++ b/README.md
@@ -113,8 +113,6 @@ The following build tags were used earlier, but are now obsoleted:
  - **apparmor** (since runc v1.0.0-rc93 the feature is always enabled)
  - **selinux** (since runc v1.0.0-rc93 the feature is always enabled)
 
- [contrib-memfd-bind]: /contrib/cmd/memfd-bind/README.md
-
 ### Running the test suite
 
 `runc` currently supports running its test suite via Docker.

From eb676de15170c02da1bca0ed00ff1c844a8e5bb0 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Wed, 13 Nov 2024 01:19:46 +1100
Subject: [PATCH 4/4] memfd-bind: elaborate kernel requirements for overlayfs protection

Arguably these docs should live elsewhere (especially if we plan to
remove memfd-bind in the future), but for now this is the only place
that fully explains this issue.

Suggested-by: Rodrigo Campos
Signed-off-by: Aleksa Sarai
(cherry picked from commit ac435895b909edba7c7fbca6e88a53ca11a3cb95)
Signed-off-by: lfbzhm
---
 contrib/cmd/memfd-bind/README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/cmd/memfd-bind/README.md b/contrib/cmd/memfd-bind/README.md
index e529eacfeaf..93229250259 100644
--- a/contrib/cmd/memfd-bind/README.md
+++ b/contrib/cmd/memfd-bind/README.md
@@ -1,13 +1,13 @@
 ## memfd-bind ##
 
 > **NOTE**: Since runc 1.2.0, runc will now use a private overlayfs mount to
-> protect the runc binary. This protection is far more light-weight than
-> memfd-bind, and for most users this should obviate the need for `memfd-bind`
-> entirely. Rootless containers will still make a memfd copy (unless you are
-> using `runc` itself inside a user namespace -- a-la
-> [`rootlesskit`][rootlesskit]), but `memfd-bind` is not particularly useful
-> for rootless container users anyway (see [Caveats](#Caveats) for more
-> details).
+> protect the runc binary (if you are on Linux 5.1 or later). This protection
+> is far more light-weight than memfd-bind, and for most users this should
+> obviate the need for `memfd-bind` entirely. Rootless containers will still
+> make a memfd copy (unless you are using `runc` itself inside a user namespace
+> -- a-la [`rootlesskit`][rootlesskit] -- and are on Linux 5.11 or later), but
+> `memfd-bind` is not particularly useful for rootless container users anyway
+> (see [Caveats](#Caveats) for more details).
 
 `runc` sometimes has to make a binary copy of itself when constructing a
 container process in order to defend against certain container runtime attacks