MPI hook: added switch for execution under rootless runtime
Madeeks committed Sep 4, 2024
1 parent ede25d1 commit 49cbcf2
Showing 9 changed files with 63 additions and 20 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- MPI hook: added support for the environment variable `MPI_COMPATIBILITY_TYPE` that defines the behaviour of the compatibility check of the libraries
that the hook mounts. Valid values are `major`, `full` and `strict`. Default value is `major`.
- MPI hook: added support for the `HOOK_ROOTLESS` environment variable, which enables using the hook under rootless container runtimes
- SSH Hook: added a poststop functionality that kills the Dropbear process in case the hook does not join the container's PID namespace.
- Added the `sarus ps` command to list running containers
- Added the `sarus kill` command to terminate (and subsequently remove) containers
2 changes: 1 addition & 1 deletion CI/installation/install_packages_opensuseleap:15.5.sh
@@ -3,7 +3,7 @@
set -ex

# Install packages
sudo zypper install -y gcc-c++ glibc-static wget which git gzip bzip2 tar \
sudo zypper install -y gcc-c++ glibc-static wget which git gzip bzip2 tar procps \
make autoconf automake squashfs cmake zlib-devel zlib-devel-static \
runc tini-static skopeo umoci \
libboost_filesystem1_75_0-devel \
32 changes: 24 additions & 8 deletions doc/config/mpi-hook.rst
@@ -32,21 +32,23 @@ Hook configuration
==================

The program is meant to be run as a **createContainer** hook and does not accept
arguments, but its actions are controlled through a few environment variables:
arguments. The following environment variables must be defined:

* ``LDCONFIG_PATH`` (REQUIRED): Absolute path to a trusted ``ldconfig``
* ``LDCONFIG_PATH``: Absolute path to a trusted ``ldconfig``
program **on the host**.

* ``MPI_LIBS`` (REQUIRED): Colon separated list of full paths to the host's
* ``MPI_LIBS``: Colon separated list of full paths to the host's
libraries that will substitute the container's libraries. The ABI
compatibility is checked by comparing the version numbers specified in
the libraries' file names according to the specifications selected with the
variable ``MPI_COMPATIBILITY_TYPE``.

* ``MPI_COMPATIBILITY_TYPE`` (OPTIONAL): String determining the logic adopted
The following optional environment variables are also supported:

* ``MPI_COMPATIBILITY_TYPE``: String determining the logic adopted
to check the ABI compatibility of MPI libraries.
Must be one of ``major``, ``full``, or ``strict``.
If not defined, defaults to ``major``.
If defined, must be one of ``major``, ``full``, ``strict``.
If unset or set to an unexpected value, defaults to ``major``.
The checks performed for compatibility in the different cases are as follows:

* ``major``
@@ -76,17 +78,31 @@ arguments, but its actions are controlled through a few environment variables:
This compatibility check is in agreement with the MPICH ABI version number
schema.

* ``MPI_DEPENDENCY_LIBS`` (OPTIONAL): Colon separated list of absolute paths to
* ``MPI_DEPENDENCY_LIBS``: Colon separated list of absolute paths to
libraries that are dependencies of the ``MPI_LIBS``. These libraries
are always bind mounted in the container under ``/usr/lib``.

* ``BIND_MOUNTS`` (OPTIONAL): Colon separated list of absolute paths to generic
* ``BIND_MOUNTS``: Colon separated list of absolute paths to generic
files or directories that are required for the correct functionality of the
host MPI implementation (e.g. specific device files). These resources will
be bind mounted inside the container with the same path they have on the host.
If a path corresponds to a device file, that file will be whitelisted for
read/write access in the container's devices cgroup.

* ``HOOK_ROOTLESS``: String indicating whether the hook is being run under
a rootless container runtime. It determines some of the actions undertaken by
the hook before performing its bind mounts, for example if identity switches
are required to validate the mounts or to work with "root squashed"
filesystems.
By default, the hook operates in fully privileged mode, assuming "real root"
capabilities. This is the way the hook is run under Sarus, and in such a case
it is recommended to leave this environment variable unset.

If this variable is set to ``True`` (case-insensitive), the hook assumes
rootless execution.
This setting is intended to enable using the hook under unprivileged tools
like rootless Podman or Enroot.
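
The `HOOK_ROOTLESS` rule described above can be sketched with the standard library alone. This is a minimal illustration rather than the hook's actual code (the commit uses `boost::algorithm::to_upper_copy` for the comparison), and the helper names `isRootlessValue` and `rootlessFromEnvironment` are hypothetical. As in the commit, only the string `True` in any letter case enables rootless mode; values such as `1` or `yes` do not.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Sarus): returns true only when `value`
// is non-null and equals "true" ignoring letter case.
bool isRootlessValue(const char* value) {
    if (value == nullptr) {
        return false;  // variable unset: keep the default privileged mode
    }
    std::string s(value);
    // Upper-case a copy so the comparison is case-insensitive.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s == "TRUE";
}

// Hypothetical wrapper: reads the variable from the process environment.
bool rootlessFromEnvironment() {
    return isRootlessValue(std::getenv("HOOK_ROOTLESS"));
}
```

Keeping the string check separate from the `getenv` call makes the rule easy to exercise without modifying the environment.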

The following is an example of `OCI hook JSON configuration file
<https://github.com/containers/common/blob/main/pkg/hooks/docs/oci-hooks.5.md>`_
enabling the MPI hook:
2 changes: 1 addition & 1 deletion doc/config/ssh-hook.rst
@@ -30,7 +30,7 @@ In the prestart stage the hook sets up the container to accept connections and s
In the poststop stage, cleanup of the SSH daemon process takes place.
One OCI hook JSON configuration file is sufficient, provided it defines ``"stages": ["prestart", "poststop"]``.

The configuration of the ssh hook expects to receive its own name/location as the first argument,
The hook expects to receive its own name/location as the first argument,
and the string ``start-ssh-daemon`` as positional argument. In addition, the following
environment variables must be defined:

12 changes: 8 additions & 4 deletions src/hooks/mpi/MpiHook.cpp
@@ -129,6 +129,10 @@ void MpiHook::parseEnvironmentVariables() {
abiCompatibilityCheckerType = std::string(p);
}

if ((p = getenv("HOOK_ROOTLESS")) != nullptr) {
rootless = (boost::algorithm::to_upper_copy(std::string(p)) == std::string("TRUE"));
}

log("Successfully parsed environment variables", libsarus::LogLevel::INFO);
}

@@ -252,7 +256,7 @@ void MpiHook::injectHostLibrary(const SharedLibrary& hostLib,
if (it == hostToContainerLibs.cend()) {
log(boost::format{"no corresponding libs in container => bind mount (%s) into /lib"} % hostLib.getPath(), libsarus::LogLevel::DEBUG);
auto containerLib = "/lib" / hostLib.getPath().filename();
libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir);
libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir, 0, rootless);
createSymlinksInDynamicLinkerDefaultSearchDirs(containerLib, hostLib.getPath().filename(), false);
return;
}
@@ -265,14 +269,14 @@ void MpiHook::injectHostLibrary(const SharedLibrary& hostLib,
auto areCompatible{abiCompatibilityChecker->check(hostLib, bestCandidateLib)};
if(areCompatible.second == boost::none) {
log(boost::format{"abi-compatible => bind mount host lib (%s) on top of container lib (%s) (i.e. override)"} % hostLib.getPath() % bestCandidateLib.getPath(), libsarus::LogLevel::DEBUG);
libsarus::mount::validatedBindMount(hostLib.getPath(), bestCandidateLib.getPath(), userIdentity, rootfsDir);
libsarus::mount::validatedBindMount(hostLib.getPath(), bestCandidateLib.getPath(), userIdentity, rootfsDir, 0, rootless);
createSymlinksInDynamicLinkerDefaultSearchDirs(bestCandidateLib.getPath(), hostLib.getPath().filename(), containerHasLibsWithIncompatibleVersion);
log("Successfully injected host's shared lib", libsarus::LogLevel::DEBUG);
return;
}
log(areCompatible.second.get(), libsarus::LogLevel::INFO);
auto containerLib = "/lib" / hostLib.getPath().filename();
libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir);
libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir, 0, rootless);
if (areCompatible.first) {
createSymlinksInDynamicLinkerDefaultSearchDirs(containerLib, hostLib.getPath().filename(), containerHasLibsWithIncompatibleVersion);
} else {
@@ -296,7 +300,7 @@ void MpiHook::performBindMounts() const {
auto devicesCgroupPath = boost::filesystem::path{};

for(const auto& mount : bindMounts) {
libsarus::mount::validatedBindMount(mount, mount, userIdentity, rootfsDir);
libsarus::mount::validatedBindMount(mount, mount, userIdentity, rootfsDir, 0, rootless);

if (libsarus::filesystem::isDeviceFile(mount)) {
if (devicesCgroupPath.empty()) {
1 change: 1 addition & 0 deletions src/hooks/mpi/MpiHook.hpp
@@ -61,6 +61,7 @@ class MpiHook {
void log(const boost::format& message, libsarus::LogLevel level) const;

private:
bool rootless = false;
libsarus::hook::ContainerState containerState;
boost::filesystem::path rootfsDir;
libsarus::UserIdentity userIdentity;
24 changes: 19 additions & 5 deletions src/libsarus/utility/mount.cpp
@@ -171,12 +171,22 @@ void validatedBindMount(const boost::filesystem::path& source,
const boost::filesystem::path& destination,
const UserIdentity& userIdentity,
const boost::filesystem::path& rootfsDir,
const unsigned long flags) {
const unsigned long flags,
const bool rootless) {
// This function assumes it is run as the root user, since it needs privileges to perform bind mounts.
// However, it is necessary to distinguish whether we are running within a fully privileged context
// (i.e. real root, e.g. within an suid program) or within a rootless/unprivileged context (i.e. under a user namespace).
// In the first case, we need to switch to the user identity to avoid complications due to root_squashed network
// filesystems; in the second case, identity switching is not necessary, since the kernel will always resolve
// to our true (unprivileged) identity in the topmost user namespace.
// To solve this somewhat elegantly, if we run as rootless, we set the target identity for the switches
// to the id which we already have. process::switchIdentity() is a no-op when attempting to switch to the same identity.
auto rootIdentity = UserIdentity{};
auto targetIdentity = rootless ? rootIdentity : userIdentity;

try {
// switch to user identity to make sure user has access to the mount source
process::switchIdentity(userIdentity);
process::switchIdentity(targetIdentity);
auto sourceReal = getValidatedMountSource(source);
auto destinationReal = getValidatedMountDestination(destination, rootfsDir);

@@ -199,10 +209,14 @@ void validatedBindMount(const boost::filesystem::path& source,
filesystem::createFileIfNecessary(destinationReal, userIdentity.uid, userIdentity.gid);
}

// switch to user filesystem identity to make sure we can access paths as root even on root_squashed filesystems
process::setFilesystemUid(userIdentity);
// If we are real root, switch to user filesystem identity (fsuid) to make sure we can access paths as root
// even on root_squashed filesystems.
// There is no dedicated way of retrieving the current fsuid, and calling setfsuid(-1) will immediately error out
// if we are rootless, because we have no CAP_SETUID.
// So in the end we have no way of telling whether we can no-op the fsuid switch, hence we have to use an explicit condition.
if (!rootless) process::setFilesystemUid(userIdentity);
bindMount(sourceReal, destinationReal, flags);
process::setFilesystemUid(rootIdentity);
if (!rootless) process::setFilesystemUid(rootIdentity);
}
catch(Error& e) {
// Restore root identity in case the exception happened while having a non-privileged id.
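
The decision logic added to `validatedBindMount()` comes down to two choices: which identity to switch to before validating the mount paths, and whether to touch the filesystem uid around the bind mount. The sketch below isolates those two choices so they can be read without the surrounding mount code. `Identity` is a simplified stand-in for `libsarus::UserIdentity`, and the helper names are hypothetical. It assumes that a default-constructed identity means root (0:0), which is what the name `rootIdentity` in the diff suggests.

```cpp
#include <cassert>
#include <sys/types.h>

// Simplified stand-in for libsarus::UserIdentity (assumption: only uid and
// gid matter here, and a default-constructed value represents root, 0:0).
struct Identity {
    uid_t uid = 0;
    gid_t gid = 0;
};

// Mirrors `rootless ? rootIdentity : userIdentity` from the diff: under a
// rootless runtime the process already appears as root inside its user
// namespace, so the switch target is the identity it already has.
Identity selectTargetIdentity(bool rootless, const Identity& user) {
    return rootless ? Identity{} : user;
}

// Mirrors the explicit `if (!rootless)` guards around setFilesystemUid():
// the fsuid switch is only attempted when running as real root.
bool shouldSwitchFilesystemUid(bool rootless) {
    return !rootless;
}
```

The asymmetry follows the comments in the diff: the identity switch can be turned into a no-op by choosing the target carefully, whereas the fsuid switch needs an explicit condition because the current fsuid cannot be queried without the privilege to set it.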
3 changes: 2 additions & 1 deletion src/libsarus/utility/mount.hpp
@@ -37,7 +37,8 @@ void validatedBindMount(const boost::filesystem::path& source,
const boost::filesystem::path& destination,
const UserIdentity& userIdentity,
const boost::filesystem::path& rootfsDir,
const unsigned long flags=0);
const unsigned long flags=0,
const bool rootless=false);
void bindMount(const boost::filesystem::path& from, const boost::filesystem::path& to, unsigned long flags=0);
void loopMountSquashfs(const boost::filesystem::path& image, const boost::filesystem::path& mountPoint);
void mountOverlayfs(const boost::filesystem::path& lowerDir,
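
Because `rootless` is appended after the defaulted `flags` parameter, a caller that wants to set it has to pass `flags` explicitly, which is why the call sites in `MpiHook.cpp` read `..., rootfsDir, 0, rootless`. The sketch below illustrates this with a hypothetical stand-in, `describeMount`, which copies only the two trailing parameters of the real declaration and performs no mount.

```cpp
#include <cassert>
#include <string>

// Records which trailing arguments a call ended up with.
struct MountCall {
    unsigned long flags;
    bool rootless;
};

// Hypothetical stand-in for validatedBindMount(): same trailing defaults
// (flags=0, rootless=false), but it only reports the values it received.
MountCall describeMount(const std::string& source,
                        const std::string& destination,
                        unsigned long flags = 0,
                        bool rootless = false) {
    (void)source;       // unused in this sketch
    (void)destination;  // unused in this sketch
    return MountCall{flags, rootless};
}
```

Note that `describeMount("/a", "/b", true)` compiles, but the `true` is converted to `flags == 1` and `rootless` stays `false`. Passing the flags value explicitly, as the commit does, avoids that pitfall.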
6 changes: 6 additions & 0 deletions src/libsarus/utility/process.cpp
@@ -58,6 +58,12 @@ void switchIdentity(const libsarus::UserIdentity& identity) {
uid_t euid = geteuid();
uid_t egid = getegid();

if (euid == identity.uid && egid == identity.gid) {
logMessage(boost::format{"Switching to the same identity. Ignoring"},
LogLevel::DEBUG);
return;
}

if (euid == 0){
// unprivileged processes cannot call setgroups
if (setgroups(identity.supplementaryGids.size(), identity.supplementaryGids.data()) != 0) {
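
The early return added to `switchIdentity()` depends on a plain comparison of the effective ids with the requested ones. A minimal sketch of that check follows, using only POSIX calls; the helper name `isCurrentIdentity` is hypothetical and not part of libsarus.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true when switching to (uid, gid) would change
// nothing, i.e. the process already runs with these effective ids.
bool isCurrentIdentity(uid_t uid, gid_t gid) {
    return geteuid() == uid && getegid() == gid;
}
```

Under a rootless runtime the hook appears as uid 0 and gid 0 inside its user namespace. Combined with the root target identity chosen in `validatedBindMount()`, this check then presumably lets the function return before it reaches `setgroups()` and the calls that follow it.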
