Skip to content

Latest commit

 

History

History
1233 lines (923 loc) · 44.4 KB

configfiles.rst

File metadata and controls

1233 lines (923 loc) · 44.4 KB

{Project} Configuration Files

As {aProject} Administrator, you will have access to various configuration files, that will let you manage container resources, set security restrictions and configure network options etc, when installing {Project} across the system. All of these files can be found in /usr/local/etc/{command} by default for installations from source (though the location may differ based on options passed during the installation). For installations from RPM or Deb packages you will find the configuration files in /etc/{command}. This section will describe the configuration files and the various parameters contained by them.

{command}.conf

Most of the configuration options are set using the file {command}.conf that defines the global configuration for {Project} across the entire system. Using this file, system administrators can influence the behavior of {Project} and restrict the functionality that users can access. As a security measure, for setuid installations of {Project}, {command}.conf must be owned by root and must not be writable by users or {Project} will refuse to run. This is not the case for non-setuid installations that will only ever execute with user privilege and thus do not require such limitations.

The options set via {command}.conf are listed below. Options are grouped together based on relevance. The actual order of options within {command}.conf may differ.

Setuid and Capabilities

allow setuid: To use all features of {Project} containers, {Project} will need to have access to some privileged system calls. {Project} achieves this by using a helper binary with the setuid bit enabled. The allow-setuid option lets you enable/disable users ability to utilize these binaries within {Project}. By default, it is set to "yes", but that only makes a difference if the suid helper binary is installed, which is not the case by default (see the :ref:`Installation section <installation>`).

root default capabilities: {Project} allows the specification of capabilities kept by the root user when running a container in setuid (also known as SUID) mode. Options include:

  • full: all capabilities are maintained, this gives the same behavior as the --keep-privs option.
  • file: only capabilities granted for root in etc/{command}/capability.json are maintained.
  • no: no capabilities are maintained, this gives the same behavior as the --no-privs option.

Note

The root user can manage the capabilities granted to individual containers when they are launched through the --add-caps and drop-caps flags. Please see Linux Capabilities in the user guide for more information.

Loop Devices

{Project} in SUID mode uses loop devices to facilitate the mounting of container file systems from SIF and other images.

max loop devices: This option allows an admin to limit the total number of loop devices {Project} will consume at a given time.

shared loop devices: This allows containers running the same image to share a single loop device. This minimizes loop device usage and helps optimize kernel cache usage. Enabling this feature can be particularly useful for MPI jobs.

Namespace Options

allow pid ns: This option determines if users can leverage the PID namespace when running their containers through the --pid flag.

Note

Using the PID namespace can confuse the process tracking of some resource managers, as well as some MPI implementations.

Configuration Files

{Project} can automatically create or modify several system files within containers to ease usage.

Note

These options will have no effect if the file does not exist within the container and overlay and underlay support are not enabled.

config passwd: This option determines if {Project} should automatically append an entry to /etc/passwd for the user running the container.

config group: This option determines if {Project} should automatically append the calling user's group entries to the containers /etc/group.

config resolv_conf: This option determines if {Project} should automatically bind the host's /etc/resolv.conf within the container.

Session Directory and System Mounts

sessiondir max size: In order for the {Project} runtime to run a container it needs to create a temporary in-memory sessiondir as a location to assemble various components of the container, including mounting filesystems over the base image. In addition, this sessiondir will hold files that are written to the container when --writable-tmpfs is used, plus isolated temporary filesystems in --contain mode. The default value is 64MiB. If users commonly run containers with --writable-tmpfs or --contain, this value should be increased to accommodate their workflows. The tmpfs will grow to the specified maximum size, as required. Memory is not allocated ahead of usage.

mount proc: This option determines if {Project} should automatically bind mount /proc within the container.

mount sys: This option determines if {Project} should automatically bind mount /sys within the container.

mount dev: Should be set to "YES", if you want {Project} to automatically bind mount a complete /dev tree within the container. If set to minimal, then only /dev/null, /dev/zero, /dev/random, /dev/urandom, and /dev/shm will be included.

mount devpts: This option determines if {Project} will mount a new instance of devpts when there is a minimal /dev directory as explained above, or when the --contain option is passed.

Note

This requires either a kernel configured with CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, or a kernel version at or newer than 4.7.

mount home: When this option is enabled, {Project} will automatically determine the calling user's home directory and attempt to mount it into the container.

mount tmp: When this option is enabled, {Project} will automatically bind mount /tmp and /var/tmp into the container from the host. If the --contain option is passed, {Project} will create both locations within the sessiondir or within the directory specified by the --workdir option if that is passed as well.

mount hostfs: This option will cause {Project} to probe the host for all mounted filesystems and bind those into containers at runtime.

mount slave: {Project} automatically mounts a handful host system directories to the container by default. This option determines if filesystem changes on the host should automatically be propagated to those directories in the container.

Note

This should be set to yes when autofs mounts occurring on the host system should be reflected up in the container.

memory fs type: This option allows admins to choose the temporary filesystem used by {Project}. Temporary filesystems are primarily used for system directories like /dev when the host system directory is not mounted within the container.

Note

For Cray CLE 5 and 6, up to CLE 6.0.UP05, there is an issue (kernel panic) when {Project} uses tmpfs, so on affected systems it's recommended to set this value to ramfs to avoid a kernel panic.

Bind Mount Management

bind path: This option is used to define a list of files or directories to automatically be made available when {Project} runs a container. In order to successfully mount listed paths the file or directory must exist within the container, or {Project} must be configured with either overlay or underlay support enabled.

Note

This option is ignored when containers are invoked with the --contain option.

You can define the a bind point where the source and destination are identical:

bind path = /etc/localtime

Or you can specify different source and destination locations using a colon:

bind path = /etc/{command}/default-nsswitch.conf:/etc/nsswitch.conf

user bind control: This allows admins to decide if users can define bind points at runtime. By Default, this option is set to YES, which means users can specify bind points, scratch and tmp locations.

Limiting Container Execution

By default {Project} allows all users on a system to execute any container, but there may be reasons that a system administrator desires to limit who can do that. The primary motivation of system administrators for this in the past was to prevent untrusted users from potentially attacking the kernel via setuid-mode mounting of containers using kernel drivers. However this is no longer the default behavior of {Project}; user namespace mode never uses kernel drivers, and setuid-mode by default does not use them if no container limits have been defined (see allow setuid-mount squashfs below). But there may be other reasons to limit execution, so {Project} provides configuration options for this purpose, described here and in the :ref:`Execution Control List <execution_control_list>` section below.

Note

The 'limit container' and 'allow container' directives are not effective if unprivileged user namespaces are enabled. They are only effectively applied when {Project} is running in setuid mode and unprivileged container execution is not possible on the host.

You must disable unprivileged user namespace creation on the host if you rely on the these directives to limit container execution. This will disable {Project}'s user namespace mode and most of its fakeroot modes.

There are several ways to limit container execution as an admin listed below. If stricter controls are required, check out the :ref:`Execution Control List <execution_control_list>`.

limit container owners: This restricts container execution to only allow containers that are owned by the specified user.

Note

This feature will only apply when {Project} is running in SUID mode and the user is non-root. By default this is set to NULL.

limit container groups: This restricts container execution to only allow containers that are owned by the specified group.

Note

This feature will only apply when {Project} is running in SUID mode and the user is non-root. By default this is set to NULL.

limit container paths: This restricts container execution to only allow containers that are located within the specified path prefix.

Note

This feature will only apply when {Project} is running in SUID mode and the user is non-root. By default this is set to NULL.

allow container ${type}: This set of options allows admins to limit the types of image formats that can be leveraged by users with {Project}, with the following types:

  • allow container sif permits / denies execution of unencrypted SIF containers.
  • allow container encrypted permits / denies execution of SIF containers with an encrypted root filesystem.
  • allow container squashfs permits / denies execution of bare SquashFS image files. E.g. Singularity 2.x images.
  • allow container extfs permits / denies execution of bare extfs image files.
  • allow container dir permits / denies execution of sandbox directory containers.

Note

These limitations do not apply to the root user.

allow setuid-mount ${type}: This set of options allows admins to limit the types of image formats that can be mounted using kernel drivers in SUID mode, with the following types:

  • allow setuid-mount encrypted permits/denies execution of encrypted SIF files in SUID mode using the kernel device-mapper. When set to no, gocryptfs FUSE-based encryption will be used instead, with the same format used in user namespace mode. This defaults to yes.
  • allow setuid-mount squashfs permits/denies execution of squashfs filesystems in SUID mode, both inside and outside of SIF files. When set to no, squashfuse_ll is used instead of the kernel squashfs driver. When set to iflimited, then if either a limit container option is used or the Execution Control List feature is activated, it will be treated as yes, and otherwise it will be treated as no. This defaults to iflimited.
  • allow setuid-mount extfs permits/denies execution of ext3 filesystems in SUID mode using the kernel ext4 driver, both inside and outside of SIF files. When set to no, fuse2fs will be used instead. For security reasons this defaults to no.

Networking Options

The --network option can be used to specify a CNI networking configuration that will be used when running a container with network virtualization. Unrestricted use of CNI network configurations requires root privilege, as certain configurations may disrupt the host networking environment.

{Project} allows specific users or groups to be granted the ability to run containers with administrator specified CNI configurations. These features only have an effect when {Project} is running in SUID mode and the user is non-root.

allow net users: Allow specified root administered CNI network configurations to be used by the specified list of users. By default only root may use CNI configuration, except in the case of a fakeroot execution where only 40_fakeroot.conflist is used.

allow net groups: Allow specified root administered CNI network configurations to be used by the specified list of users. By default only root may use CNI configuration, except in the case of a fakeroot execution where only 40_fakeroot.conflist is used.

allow net networks: Specify the names of CNI network configurations that may be used by users and groups listed in the allow net users / allow net groups directives.

GPU Options

{Project} provides integration with GPUs in order to facilitate GPU based workloads seamlessly. Both options listed below are particularly useful in GPU only environments. For more information on using GPUs with {Project} checkout :ref:`GPU Library Configuration <gpu_library_configuration>`.

always use nv: Enabling this option will cause every action command (exec/shell/run/instance) to be executed with the --nv option implicitly added.

always use rocm: Enabling this option will cause every action command (exec/shell/run/instance) to be executed with the --rocm option implicitly added.

Supplemental Filesystems

enable fusemount: This will allow users to mount fuse filesystems inside containers using the --fusemount flag.

enable overlay: This option will allow {Project} to create bind mounts at paths that do not exist within the container image. If set to yes (the default), the kernel overlay driver will be tried, but if it doesn't work then fuse-overlayfs will be used instead. A value of try is obsolete and is equivalent to yes. If set to driver, then fuse-overlayfs will always be used. If set to no, then no overlay will be used for missing bind mount paths, nor for any other purpose. Note that enable underlay = preferred below overrides this option.

enable underlay: This option will allow {Project} to create bind mounts at paths that do not currently exist within the container, without using any overlay feature. The underlay feature works by creating a scratch space made up of only bind mounts, either from the host or from the container image, and using that as the container's root filesystem. When set to yes (the default), then the underlay feature will be used either when the --underlay action option is given by the user or when the enable overlay option above is set to no. When set to preferred, then the underlay feature will always be used instead of the overlay feature for creating bind mount paths. When set to no, then the underlay feature will never be used. This option is deprecated and will be removed in a future release, because the implementation is complicated and the performance is similar to the kernel overlay driver and to fuse-overlayfs.

CNI Configuration and Plugins

cni configuration path: This option allows admins to specify a custom path for the CNI configuration that {Project} will use for Network Virtualization.

cni plugin path: This option allows admins to specify a custom path for {Project} to access CNI plugin executables. Check out the Network Virtualization section of the user guide for more information.

External Binaries

{Project} calls a number of external binaries for full functionality. They are found using the path defined by the binary path option in {command}.conf. If that option includes $PATH: (as it does by default) then that is replaced by the user's $PATH whenever it isn't some very basic system command or a command that can be run as root by the SUID flow.

Concurrent Downloads

{Project} will pull library:// container images using multiple concurrent downloads of parts of the image. This speeds up downloads vs using a single stream. The defaults are generally appropriate for Library API Registries, but may be tuned for your network conditions, or if you are pulling from a different library server.

download concurrency: specifies how many concurrent streams when downloading (pulling) an image from cloud library.

download part size: specifies the size of each part (bytes) when concurrent downloads are enabled.

download buffer size: specifies the transfer buffer size (bytes) when concurrent downloads are enabled.

Cgroups Options

systemd cgroups: specifies whether to use systemd to manage container cgroups. Required (with cgroups v2) for unprivileged users to apply resource limits on containers. If set to no, {Project} will directly manage cgroups via the cgroupfs.

Updating Configuration Options

In order to manage this configuration file, {Project} has a config global command group that allows you to get, set, reset, and unset values through the CLI. It's important to note that these commands must be run with elevated privileges because the {command}.conf can only be modified by an administrator.

Example

In this example we will changing the bind path option described above. First we can see the current list of bind paths set within our system configuration:

$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts

Now we can add a new path and verify it was successfully added:

$ sudo {command} config global --set "bind path" /etc/resolv.conf
$ sudo {command} config global --get "bind path"
/etc/resolv.conf,/etc/localtime,/etc/hosts

From here we can remove a path with:

$ sudo {command} config global --unset "bind path" /etc/localtime
$ sudo {command} config global --get "bind path"
/etc/resolv.conf,/etc/hosts

If we want to reset the option to the default at installation, then we can reset it with:

$ sudo {command} config global --reset "bind path"
$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts

And now we are back to our original option settings. You can also test what a change would look like by using the --dry-run option in conjunction with the above commands. Instead of writing to the configuration file, it will output what would have been written to the configuration file if the command had been run without the --dry-run option:

$ sudo {command} config global --dry-run --set "bind path" /etc/resolv.conf
# {ENVPREFIX}.CONF
# This is the global configuration file for {Project}. This file controls
[...]
# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
# NOTE: these are ignored if {command} is invoked with --contain.
bind path = /etc/resolv.conf
bind path = /etc/localtime
bind path = /etc/hosts
[...]
$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts

Above we can see that /etc/resolv.conf is listed as a bind path in the output of the --dry-run command, but did not affect the actual bind paths of the system.

cgroups.toml

The cgroups (control groups) functionality of the Linux kernel allows you to limit and meter the resources used by a process, or group of processes. Using cgroups you can limit memory and CPU usage. You can also rate limit block IO, network IO, and control access to device nodes.

There are two versions of cgroups in common use. Cgroups v1 sets resource limits for a process within separate hierarchies per resource class. Cgroups v2, the default in newer Linux distributions, implements a unified hierarchy, simplifying the structure of resource limits on processes.

{Project} can apply resource limitations to systems configured for both cgroups v1 and the v2 unified hierarchy. Resource limits are specified using a TOML file that represents the resources section of the OCI runtime-spec: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#control-groups

On a cgroups v1 system the resources configuration is applied directly. On a cgroups v2 system the configuration is translated and applied to the unified hierarchy.

Under cgroups v1, access restrictions for device nodes are managed directly. Under cgroups v2, the restrictions are applied by attaching eBPF programs that implement the requested access controls.

Examples

To apply resource limits to a container, use the --apply-cgroups flag, which takes a path to a TOML file specifying the cgroups configuration to be applied:

$ {command} shell --apply-cgroups /path/to/cgroups.toml my_container.sif

Note

The --apply-cgroups option requires cgroups v2 to be used without root privileges.

Limiting memory

To limit the amount of memory that your container uses to 500MB (524288000 bytes), set a limit value inside the [memory] section of your cgroups TOML file:

[memory]
    limit = 524288000

Start your container, applying the toml file, e.g.:

$ {command} run --apply-cgroups path/to/cgroups.toml library://alpine

Limiting CPU

CPU usage can be limited using different strategies, with limits specified in the [cpu] section of the TOML file.

shares

This corresponds to a ratio versus other cgroups with cpu shares. Usually the default value is 1024. That means if you want to allow to use 50% of a single CPU, you will set 512 as value.

[cpu]
    shares = 512

A cgroup can get more than its share of CPU if there are enough idle CPU cycles available in the system, due to the work conserving nature of the scheduler, so a contained process can consume all CPU cycles even with a ratio of 50%. The ratio is only applied when two or more processes conflicts with their needs of CPU cycles.

quota/period

You can enforce hard limits on the CPU cycles a cgroup can consume, so contained processes can't use more than the amount of CPU time set for the cgroup. quota allows you to configure the amount of CPU time that a cgroup can use per period. The default is 100ms (100000us). So if you want to limit amount of CPU time to 20ms during period of 100ms:

[cpu]
    period = 100000
    quota = 20000

cpus/mems

You can also restrict access to specific CPUs (cores) and associated memory nodes by using cpus/mems fields:

[cpu]
    cpus = "0-1"
    mems = "0-1"

Where container has limited access to CPU 0 and CPU 1.

Note

It's important to set identical values for both cpus and mems.

Limiting IO

To control block device I/O, applying limits to competing container, use the [blockIO] section of the TOML file:

[blockIO]
    weight = 1000
    leafWeight = 1000

weight and leafWeight accept values between 10 and 1000.

weight is the default weight of the group on all the devices until and unless overridden by a per device rule.

leafWeight relates to weight for the purpose of deciding how heavily to weigh tasks in the given cgroup while competing with the cgroup's child cgroups.

To apply limits to specific block devices, you must set configuration for specific device major/minor numbers. For example, to override weight/leafWeight for /dev/loop0 and /dev/loop1 block devices, set limits for device major 7, minor 0 and 1:

[blockIO]
    [[blockIO.weightDevice]]
        major = 7
        minor = 0
        weight = 100
        leafWeight = 50
    [[blockIO.weightDevice]]
        major = 7
        minor = 1
        weight = 100
        leafWeight = 50

You can also limit the IO read/write rate to a specific absolute value, e.g. 16MB per second for the /dev/loop0 block device. The rate is specified in bytes per second.

[blockIO]
    [[blockIO.throttleReadBpsDevice]]
        major = 7
        minor = 0
        rate = 16777216
    [[blockIO.throttleWriteBpsDevice]]
        major = 7
        minor = 0
        rate = 16777216

Other limits

{Project} can apply all resource limits that are valid in the OCI runtime-spec resources section, including unified cgroups v2 constraints. It is most compatible, however, to use the cgroups v1 limits, which will be translated to v2 format when applied on a cgroups v2 system.

See https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#control-groups for information about the available limits. Note that {Project} uses TOML format for the configuration file, rather than JSON.

ecl.toml

The execution control list that can be used to restrict the execution of SIF files by signing key is defined here. You can authorize the containers by validating both the location of the SIF file in the filesystem and by checking against a list of signing entities.

Note

The ECL is not effective if unprivileged user namespaces are enabled. It is only effectively applied when {Project} is running in setuid mode, and unprivileged container execution is not possible on the host.

You must disable unprivileged user namespace creation on the host if you rely on the ECL limit container execution. This will disable {Project}'s user namespace mode and most of its fakeroot modes.

Warning

The ECL configuration applies to SIF container images only. To lock down execution fully you should disable execution of other container types (squashfs/extfs/dir) via the {command}.conf file allow container settings.

[[execgroup]]
  tagname = "group2"
  mode = "whitelist"
  dirpath = "/tmp/containers"
  keyfp = ["7064B1D6EFF01B1262FED3F03581D99FE87EAFD1"]

Only the containers running from and signed with above-mentioned path and keys will be authorized to run.

Three possible list modes you can choose from:

Whitestrict: The SIF must be signed by all of the keys mentioned.

Whitelist: As long as the SIF is signed by one or more of the keys, the container is allowed to run.

Blacklist: Only the containers whose keys are not mentioned in the group are allowed to run.

Note

Containers signed by Singularity versions older than 3.6.0 will not pass ECL checks because they are insecure.

To temporarily permit the use of legacy insecure signatures, set legacyinsecure = true in ecl.toml.

Managing ECL public keys

{Project} uses a global keyring for ECL signature verification. This keyring can be administered using the --global flag for the following commands:

  • {command} key import (root user only)
  • {command} key pull (root user only)
  • {command} key remove (root user only)
  • {command} key export
  • {command} key list

Note

For security reasons, it is not possible to import private keys into this global keyring because it must be accessible by users and is stored in the file SYSCONFDIR/{command}/global-pgp-public.

GPU Library Configuration

When a container includes a GPU enabled application, {Project} (with the --nv or --rocm options) can properly inject the required Nvidia or AMD GPU driver libraries into the container, to match the host's kernel. The GPU /dev entries are provided in containers run with --nv or --rocm even if the --contain option is used to restrict the in-container device tree.

Compatibility between containerized CUDA/ROCm/OpenCL applications and host drivers/libraries is dependent on the versions of the GPU compute frameworks that were used to build the applications. Compatibility and usage information is discussed in the GPU Support section of the user guide.

NVIDIA GPUs / CUDA

The nvliblist.conf configuration file is used to specify libraries and executables that need to be injected into the container when running {Project} with the --nv Nvidia GPU support option. The provided nvliblist.conf is suitable for CUDA 11, but may need to be modified if you need to include additional libraries, or further libraries are added to newer versions of the Nvidia driver/CUDA distribution.

When adding new entries to nvliblist.conf use the bare filename of executables, and the xxxx.so form of libraries. Libraries are resolved via ldconfig -p, and exectuables are found by searching $PATH.

Experimental nvidia-container-cli Support

The nvidia-container-cli tool is Nvidia's officially support method for configuring containers to use a GPU. It is targeted at OCI container runtimes.

{Project} has an experimental --nvccli option, which will call out to nvidia-container-cli for container GPU setup, rather than use the nvliblist.conf approach.

For security reasons, nvidia-container-cli cannot be used with privileged mode in a SUID installation of {Project}, it can only be used unprivileged. The operations performed by nvidia-container-cli are broadly similar to those which {Project} carries out when setting up a GPU container from nvliblist.conf.

AMD Radeon GPUs / ROCm

The rocmliblist.conf file is used to specify libraries and executables that need to be injected into the container when running {Project} with the --rocm Radeon GPU support option. The provided rocmliblist.conf is suitable for ROCm 4.0, but may need to modified if you need to include additional libraries, or further libraries are added to newer versions of the ROCm distribution.

When adding new entries to rocmlist.conf use the bare filename of executables, and the xxxx.so form of libraries. Libraries are resolved via ldconfig -p, and exectuables are found by searching $PATH.

GPU liblist format

The nvliblist.conf and rocmliblist files list the basename of executables and libraries to be bound into the container, without path information.

Binaries are found by searching $PATH:

# put binaries here
# In shared environments you should ensure that permissions on these files
# exclude writing by non-privileged users.
rocm-smi
rocminfo

Libraries should be specified without version information, i.e. libname.so, and are resolved using ldconfig.

# put libs here (must end in .so)
libamd_comgr.so
libcomgr.so
libCXLActivityLogger.so

If you receive warnings that binaries or libraries are not found, ensure that they are in a system path (binaries), or available in paths configured in /etc/ld.so.conf (libraries).

capability.json

Warning

It is extremely important to recognize that granting users Linux capabilities with the capability command group is usually identical to granting those users root level access on the host system. Most if not all capabilities will allow users to "break out" of the container and become root on the host. This feature is targeted toward special use cases (like cloud-native architectures) where an admin/developer might want to limit the attack surface within a container that normally runs as root. This is not a good option in multi-tenant HPC environments where an admin wants to grant a user special privileges within a container. For that and similar use cases, the :ref:`fakeroot feature <fakeroot>` is a better option.

{Project} in SUID mode provides full support for admins to grant and revoke Linux capabilities on a user or group basis. The capability.json file is maintained by {Project} in order to manage these capabilities. The capability command group allows you to add, drop, and list capabilities for users and groups.

For example, let us suppose that we have decided to grant a user (named pinger) capabilities to open raw sockets so that they can use ping in a container where the binary is controlled via capabilities.

To do so, we would issue a command such as this:

$ sudo {command} capability add --user pinger CAP_NET_RAW

This means the user pinger has just been granted permissions (through Linux capabilities) to open raw sockets within {Project} containers.

We can check that this change is in effect with the capability list command.

$ sudo {command} capability list --user pinger
CAP_NET_RAW

To take advantage of this new capability, the user pinger must also request the capability when executing a container with the --add-caps flag. pinger would need to run a command like this:

$ {command} exec --add-caps CAP_NET_RAW \
  library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=73.1 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 73.178/73.178/73.178/0.000 ms

If we decide that it is no longer necessary to allow the user pinger to open raw sockets within {Project} containers, we can revoke the appropriate Linux capability like so:

$ sudo {command} capability drop --user pinger CAP_NET_RAW

Now if pinger tries to use CAP_NET_RAW, {Project} will not give the capability to the container and ping will fail to create a socket:

$ {command} exec --add-caps CAP_NET_RAW \
  library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
WARNING: not authorized to add capability: CAP_NET_RAW
ping: socket: Operation not permitted

The capability add and drop subcommands will also accept the case insensitive keyword all to grant or revoke all Linux capabilities to a user or group.

For more information about individual Linux capabilities check out the man pages or use the capability avail command to output available capabilities with a description of their behaviors.

seccomp-profiles

Secure Computing (seccomp) Mode is a feature of the Linux kernel that allows an administrator to filter system calls being made from a container. Profiles made up of allowed and restricted calls can be passed to different containers. Seccomp provides more control than capabilities alone, giving a smaller attack surface for an attacker to work from within a container.

You can set the default action with defaultAction for a non-listed system call. Example: SCMP_ACT_ALLOW filter will allow all the system calls if it matches the filter rule and you can set it to SCMP_ACT_ERRNO which will have the thread receive a return value of errno if it calls a system call that matches the filter rule. The file is formatted in a way that it can take a list of additional system calls for different architecture and {Project} will automatically take syscalls related to the current architecture where it's been executed. The include/exclude-> caps section will include/exclude the listed system calls if the user has the associated capability.

Use the --security option to invoke the container like:

$ sudo {command} shell --security seccomp:/home/david/my.json my_container.sif

For more insight into security options, network options, cgroups, capabilities, etc, please check the Userdocs and it's Appendix.

remote.yaml

System-wide remote endpoints are defined in a configuration file typically located at /usr/local/etc/{command}/remote.yaml (this location may vary depending on installation parameters) and can be managed by administrators with the remote command group.

Remote Endpoints

{Project} allows users to login to an account and authenticate with a Library API Registry. Whether that registry exists on premise or in the cloud.

Note

A fresh installation of {Project} is configured with the DefaultRemote, which does not support the Library API as it is only configured with a functioning key server, https://keys.openpgp.org. Users or administrators should configure one of the Library API implementations listed here if they would like to use a Library API registry.

Examples

Use the remote command group with the --global flag to create a system-wide remote endpoint:

$ sudo {command} remote add --global company-remote https://enterprise.example.com
INFO:    Remote "company-remote" added.
INFO:    Global option detected. Will not automatically log into remote.

Conversely, to remove a system-wide endpoint:

$ sudo {command} remote remove --global company-remote
INFO:    Remote "company-remote" removed.

Note

Once users log in to a system wide endpoint, a copy of the endpoint will be listed in a their ~/.{command}/remote.yaml file. This means modifications or removal of the system-wide endpoint will not be reflected in the users configuration unless they remove the endpoint themselves.

Exclusive Endpoint

{Project} has the ability for an administrator to make a remote the only usable remote for the system by using the --exclusive flag:

$ sudo {command} remote use --exclusive company-remote
INFO:    Remote "company-remote" now in use.
$ {command} remote list
Cloud Services Endpoints
========================

NAME            URI                     ACTIVE  GLOBAL  EXCLUSIVE  INSECURE
DefaultRemote   cloud.apptainer.org     NO      YES     NO         NO
company-remote  enterprise.example.com  YES     YES     YES        NO
myremote        enterprise.example.com  NO      NO      NO         NO

Keyservers
==========

URI                       GLOBAL  INSECURE  ORDER
https://keys.example.com  YES     NO        1*

* Active cloud services keyserver

Insecure (HTTP) Endpoints

If you are using a endpoint that exposes its service discovery file over an insecure HTTP connection only, it can be added by specifying the --insecure flag:

$ sudo {command} remote add --global --insecure test http://test.example.com
INFO:    Remote "test" added.
INFO:    Global option detected. Will not automatically log into remote.

This flag controls HTTP vs HTTPS for service discovery only. The protocol used to access individual library, build and keyserver URLs is set by the service discovery file.

Restoring pre-{Project} library behavior

{Project}'s default remote endpoint configures only a public key server, it does not support the library:// protocol. Formerly the default was set to point to Sylabs servers, but the read/write support of the oras:// protocol by for example the GitHub Container Registry makes it unnecessary. The remote endpoint was also formerly used for builds using the build --remote option, but {Project} does not support that. Instead, it supports unprivileged local builds.

If you would still like to have the previous default for all users, these are the commands to restore the library behavior from before {Project}, where using the library:// URI would download from the Sylabs Cloud anonymously:

$ sudo {command} remote add --global SylabsCloud cloud.sycloud.io
INFO:    Remote "SylabsCloud" added.
INFO:    Global option detected. Will not automatically log into remote.
$ sudo {command} remote use --global SylabsCloud
INFO:    Remote "SylabsCloud" now in use.
$ {command} remote list
Cloud Services Endpoints
========================

NAME           URI                  ACTIVE  GLOBAL  EXCLUSIVE
DefaultRemote  cloud.apptainer.org  NO      YES     NO
SylabsCloud    cloud.sycloud.io     YES     YES     NO

Keyservers
==========

URI                                 GLOBAL  INSECURE  ORDER
https://keys.production.sycloud.io  YES     NO        1*

* Active cloud services keyserver

Additional Information

For more details on the remote command group and managing remote endpoints, please check the Remote Userdocs.

Keyserver Configuration

By default, {Project} will use the keyserver correlated to the active cloud service endpoint. This behavior can be changed or supplemented via the add-keyserver and remove-keyserver commands. These commands allow an administrator to create a global list of key servers used to verify container signatures by default.

For more details on the remote command group and managing keyservers, please check the Remote Userdocs.

dmtcp-conf.yaml

The dmtcp-conf.yaml is a YAML configuration file that is used to specify the libraries and executables that need to be injected into the container, without path information. This configuration is used when running {Project} instances with application checkpointing ( e.g. --dmctp-launch, dmtcp-restart).

Note

This feature is marked as experimental to allow flexibility as community feedback may warrant breaking changes to improve the overall usability for this feature set as it matures.

Binaries are specified as an array under the bins key and are found by searching $PATH:

# List binaries to bind into the container here
# In shared environments you should ensure that permissions on these files
# exclude writing by non-privileged users.
bins:
  - "dmtcp_command"
  - "dmtcp_discover_rm"
  - "dmtcp_launch"
[...]

Libraries are specified as an array under the libs key and should be specified without version information, i.e. libname.so, and are resolved using ldconfig.

# List libraries to bind into the container here. Library names must end in ".so"
libs:
  - "libdmtcp_alloc.so"
  - "libdmtcp_dl.so"
  - "libdmtcp_modify-env.so"
[...]

If you receive warnings that binaries or libraries are not found, ensure that they are in a system path (binaries), or available in paths configured in /etc/ld.so.conf (libraries).

For more details on the checkpointing features in {Project}, please check the Checkpoint Userdocs.