As {aProject} administrator, you have access to several
configuration files that let you manage container resources, set
security restrictions, and configure network options when
installing {Project} across the system. For installations from source,
these files are found in /usr/local/etc/{command} by default
(though the location may differ based on options passed
during the installation). For installations from RPM or Deb packages,
the configuration files are in /etc/{command}. This
section describes the configuration files and the
parameters they contain.
Most of the configuration options are set using the file
{command}.conf
that defines the global configuration for
{Project} across the entire system. Using this file, system
administrators can influence the behavior of {Project} and
restrict the functionality that users can access. As a security
measure, for setuid
installations of {Project},
{command}.conf
must be owned by root and must not be writable by
users or {Project} will refuse to run. This is not the case for
non-setuid
installations that will only ever execute with user
privilege and thus do not require such limitations.
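The ownership rule above can be checked before deploying a setuid installation. The following is a minimal sketch in Python, not the check that {Project} itself performs: it assumes that "not writable by users" means the group and other write bits are cleared, and both function names are hypothetical. The path passed to `check_conf` is whatever location applies to your installation.

```python
import os
import stat


def is_root_owned_and_protected(uid: int, mode: int) -> bool:
    """Return True if the owner is root and neither group nor others can write.

    Assumption: this only approximates the documented rule; the exact test
    performed by the runtime is not shown in this section.
    """
    writable_by_non_owner = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return uid == 0 and not writable_by_non_owner


def check_conf(path: str) -> bool:
    """Apply the rule to a file on disk (the path is supplied by the caller)."""
    st = os.stat(path)
    return is_root_owned_and_protected(st.st_uid, st.st_mode)
```

Splitting the pure permission test from the `os.stat` call keeps the rule easy to exercise without needing a root-owned file.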
The options set via {command}.conf
are listed below. Options are
grouped together based on relevance. The actual order of options within
{command}.conf
may differ.
allow setuid
: To use all features of {Project} containers,
{Project} will need to have access to some privileged system calls.
{Project} achieves this by using a helper binary with the setuid
bit enabled. The allow setuid
option lets you enable or disable users'
ability to utilize these binaries within {Project}. By default, it
is set to "yes", but that only makes a difference if the suid helper
binary is installed, which is not the case by default (see the
:ref:`Installation section <installation>`).
root default capabilities
: {Project} allows the specification of
capabilities kept by the root user when running a container in
setuid (also known as SUID) mode.
Options include:
- full: all capabilities are maintained. This gives the same behavior as the --keep-privs option.
- file: only capabilities granted for root in etc/{command}/capability.json are maintained.
- no: no capabilities are maintained. This gives the same behavior as the --no-privs option.
Note
The root user can manage the capabilities granted to individual
containers when they are launched through the --add-caps
and
--drop-caps
flags. Please see Linux Capabilities
in the user guide for more information.
{Project} in SUID mode uses loop devices to facilitate the mounting of container file systems from SIF and other images.
max loop devices
: This option allows an admin to limit the total
number of loop devices {Project} will consume at a given time.
shared loop devices
: This allows containers running the same image
to share a single loop device. This minimizes loop device usage and
helps optimize kernel cache usage. Enabling this feature can be
particularly useful for MPI jobs.
allow pid ns
: This option determines if users can leverage the PID
namespace when running their containers through the --pid
flag.
Note
Using the PID namespace can confuse the process tracking of some resource managers, as well as some MPI implementations.
{Project} can automatically create or modify several system files within containers to ease usage.
Note
These options will have no effect if the file does not exist within the container and overlay and underlay support are not enabled.
config passwd
: This option determines if {Project} should
automatically append an entry to /etc/passwd
for the user running
the container.
config group
: This option determines if {Project} should
automatically append the calling user's group entries to the container's
/etc/group
.
config resolv_conf
: This option determines if {Project} should
automatically bind the host's /etc/resolv.conf
within the container.
sessiondir max size
: In order for the {Project} runtime to run
a container it needs to create a temporary in-memory sessiondir
as
a location to assemble various components of the container, including
mounting filesystems over the base image. In addition, this sessiondir
will hold files that are written to the container when
--writable-tmpfs
is used, plus isolated temporary filesystems in
--contain
mode. The default value is
64MiB. If users commonly run containers with --writable-tmpfs
or
--contain
, this value should be increased to
accommodate their workflows. The tmpfs will grow to the specified
maximum size, as required. Memory is not allocated ahead of usage.
mount proc
: This option determines if {Project} should
automatically bind mount /proc
within the container.
mount sys
: This option determines if {Project} should
automatically bind mount /sys
within the container.
mount dev
: Should be set to "YES", if you want {Project} to
automatically bind mount a complete /dev
tree within the container.
If set to minimal
, then only /dev/null
, /dev/zero
,
/dev/random
, /dev/urandom
, and /dev/shm
will be included.
mount devpts
: This option determines if {Project} will mount a
new instance of devpts
when there is a minimal
/dev
directory as explained above, or when the --contain
option is
passed.
Note
This requires either a kernel configured with
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
, or a kernel version at or
newer than 4.7
.
mount home
: When this option is enabled, {Project} will
automatically determine the calling user's home directory and attempt to
mount it into the container.
mount tmp
: When this option is enabled, {Project} will
automatically bind mount /tmp
and /var/tmp
into the container
from the host. If the --contain
option is passed, {Project} will
create both locations within the sessiondir
or within the directory
specified by the --workdir
option if that is passed as well.
mount hostfs
: This option will cause {Project} to probe the host
for all mounted filesystems and bind those into containers at runtime.
mount slave
: {Project} automatically mounts a handful of host
system directories to the container by default. This option determines
if filesystem changes on the host should automatically be propagated to
those directories in the container.
Note
This should be set to yes
when autofs mounts occurring on the host
system should be reflected in the container.
memory fs type
: This option allows admins to choose the temporary
filesystem used by {Project}. Temporary filesystems are primarily
used for system directories like /dev
when the host system directory
is not mounted within the container.
Note
For Cray CLE 5 and 6, up to CLE 6.0.UP05, there is an issue (kernel
panic) when {Project} uses tmpfs, so on affected systems it's
recommended to set this value to ramfs
to avoid a kernel panic.
bind path
: This option is used to define a list of files or
directories to automatically be made available when {Project} runs a
container. In order to successfully mount listed paths the file or
directory must exist within the container, or {Project} must be
configured with either overlay or underlay support enabled.
Note
This option is ignored when containers are invoked with the
--contain
option.
You can define a bind point where the source and destination are identical:
bind path = /etc/localtime
Or you can specify different source and destination locations using a colon:
bind path = /etc/{command}/default-nsswitch.conf:/etc/nsswitch.conf
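The two forms of a bind path value can be told apart with a small parser. This is an illustrative sketch in Python under the assumption that only the `path` and `source:destination` forms described above are in play. The function name `parse_bind_path` is hypothetical and is not part of {Project}.

```python
from typing import Tuple


def parse_bind_path(value: str) -> Tuple[str, str]:
    """Split a 'bind path' value into (source, destination).

    Only the two forms described above are handled: a single path, where
    source and destination are identical, and 'source:destination'.
    """
    source, sep, destination = value.strip().partition(":")
    if not source:
        raise ValueError(f"empty source in bind path: {value!r}")
    if sep and not destination:
        raise ValueError(f"empty destination in bind path: {value!r}")
    # Without a colon the same path is used on both sides.
    return source, (destination if sep else source)
```

For example, `parse_bind_path("/etc/localtime")` yields the same path for both source and destination.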
user bind control
: This allows admins to decide if users can define
bind points at runtime. By default, this option is set to YES
, which
means users can specify bind points, scratch and tmp locations.
By default {Project} allows all users on a system to execute any container,
but there may be reasons that a system administrator desires to limit who
can do that.
The primary motivation of system administrators for this in the past was
to prevent untrusted users from potentially attacking the kernel via
setuid-mode mounting of containers using kernel drivers.
However this is no longer the default behavior of {Project};
user namespace mode never uses kernel drivers, and setuid-mode by
default does not use them if no container limits have been defined
(see allow setuid-mount squashfs
below).
But there may be other reasons to limit execution, so {Project} provides
configuration options for this purpose, described here and in the
:ref:`Execution Control List <execution_control_list>` section below.
Note
The 'limit container' and 'allow container' directives are not effective if unprivileged user namespaces are enabled. They are only effectively applied when {Project} is running in setuid mode and unprivileged container execution is not possible on the host.
You must disable unprivileged user namespace creation on the host if you rely on these directives to limit container execution. This will disable {Project}'s user namespace mode and most of its fakeroot modes.
There are several ways to limit container execution as an admin listed below. If stricter controls are required, check out the :ref:`Execution Control List <execution_control_list>`.
limit container owners
: This restricts container execution to only
allow containers that are owned by the specified user.
Note
This feature will only apply when {Project} is running in SUID
mode and the user is non-root. By default this is set to NULL
.
limit container groups
: This restricts container execution to only
allow containers that are owned by the specified group.
Note
This feature will only apply when {Project} is running in SUID
mode and the user is non-root. By default this is set to NULL
.
limit container paths
: This restricts container execution to only
allow containers that are located within the specified path prefix.
Note
This feature will only apply when {Project} is running in SUID
mode and the user is non-root. By default this is set to NULL
.
allow container ${type}
: This set of options allows admins to limit the
types of image formats that can be leveraged by users with
{Project}, with the following types:
- allow container sif: permits/denies execution of unencrypted SIF containers.
- allow container encrypted: permits/denies execution of SIF containers with an encrypted root filesystem.
- allow container squashfs: permits/denies execution of bare SquashFS image files, e.g. Singularity 2.x images.
- allow container extfs: permits/denies execution of bare extfs image files.
- allow container dir: permits/denies execution of sandbox directory containers.
Note
These limitations do not apply to the root user.
allow setuid-mount ${type}
: This set of options allows admins to limit the
types of image formats that can be mounted using kernel drivers in SUID
mode, with the following types:
- allow setuid-mount encrypted: permits/denies execution of encrypted SIF files in SUID mode using the kernel device-mapper. When set to no, gocryptfs FUSE-based encryption will be used instead, with the same format used in user namespace mode. This defaults to yes.
- allow setuid-mount squashfs: permits/denies execution of squashfs filesystems in SUID mode, both inside and outside of SIF files. When set to no, squashfuse_ll is used instead of the kernel squashfs driver. When set to iflimited, then if either a limit container option is used or the Execution Control List feature is activated, it will be treated as yes, and otherwise it will be treated as no. This defaults to iflimited.
- allow setuid-mount extfs: permits/denies execution of ext3 filesystems in SUID mode using the kernel ext4 driver, both inside and outside of SIF files. When set to no, fuse2fs will be used instead. For security reasons this defaults to no.
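The way the iflimited value of allow setuid-mount squashfs resolves to yes or no can be written as a small decision function. This is only a sketch of the rule as described in this section, written in Python. The function and parameter names are hypothetical, and the real implementation may take further conditions into account.

```python
def effective_squashfs_mount(setting: str,
                             limit_container_used: bool,
                             ecl_active: bool) -> str:
    """Resolve 'allow setuid-mount squashfs' to an effective 'yes' or 'no'.

    limit_container_used: True if any 'limit container' option is set.
    ecl_active: True if the Execution Control List feature is activated.
    """
    value = setting.strip().lower()
    if value in ("yes", "no"):
        return value
    if value == "iflimited":
        # Kernel squashfs mounts are only used when execution is restricted.
        return "yes" if (limit_container_used or ecl_active) else "no"
    raise ValueError(f"unexpected setting: {setting!r}")
```

With the default of iflimited and no limits configured, the function returns "no", which corresponds to squashfuse_ll being used instead of the kernel driver.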
The --network
option can be used to specify a CNI networking
configuration that will be used when running a container with network
virtualization.
Unrestricted use of CNI network configurations requires root privilege,
as certain configurations may disrupt the host networking environment.
{Project} allows specific users or groups to be granted the ability to run containers with administrator specified CNI configurations. These features only have an effect when {Project} is running in SUID mode and the user is non-root.
allow net users
: Allow specified root administered CNI network
configurations to be used by the specified list of users. By default
only root may use CNI configuration, except in the case of a fakeroot
execution where only 40_fakeroot.conflist is used.
allow net groups
: Allow specified root administered CNI network
configurations to be used by the specified list of groups. By default
only root may use CNI configuration, except in the case of a fakeroot
execution where only 40_fakeroot.conflist is used.
allow net networks
: Specify the names of CNI network configurations
that may be used by users and groups listed in the allow net users /
allow net groups directives.
{Project} provides integration with GPUs in order to facilitate GPU-based workloads seamlessly. Both options listed below are particularly useful in GPU-only environments. For more information on using GPUs with {Project}, check out :ref:`GPU Library Configuration <gpu_library_configuration>`.
always use nv
: Enabling this option will cause every action command
(exec/shell/run/instance
) to be executed with the --nv
option
implicitly added.
always use rocm
: Enabling this option will cause every action
command (exec/shell/run/instance
) to be executed with the --rocm
option implicitly added.
enable fusemount
: This will allow users to mount fuse filesystems
inside containers using the --fusemount
flag.
enable overlay
: This option will allow {Project} to create bind
mounts at paths that do not exist within the container image.
If set to yes
(the default), the kernel overlay driver will be tried,
but if it doesn't work then fuse-overlayfs
will be used instead.
A value of try
is obsolete and is equivalent to yes
.
If set to driver
, then fuse-overlayfs
will always be used.
If set to no
, then no overlay will be used for missing bind
mount paths, nor for any other purpose.
Note that enable underlay = preferred
below overrides this option.
enable underlay
: This option will allow {Project} to create bind
mounts at paths that do not currently exist within the container,
without using any overlay feature.
The underlay feature works by creating a scratch space made up of only
bind mounts, either from the host or from the container image,
and using that as the container's root filesystem.
When set to yes
(the default), then the underlay feature will be used
either when the --underlay
action option is given by the user or when
the enable overlay
option above is set to no
.
When set to preferred
, then the underlay feature will always be used
instead of the overlay feature for creating bind mount paths.
When set to no
, then the underlay feature will never be used.
This option is deprecated and will be removed in a future release,
because the implementation is complicated and the performance is
similar to the kernel overlay driver and to fuse-overlayfs.
cni configuration path
: This option allows admins to specify a
custom path for the CNI configuration that {Project} will use for
Network Virtualization.
cni plugin path
: This option allows admins to specify a custom path
for {Project} to access CNI plugin executables. Check out the
Network Virtualization
section of the user guide for more information.
{Project} calls a number of external binaries for full functionality.
They are found using the path defined by the binary path
option in
{command}.conf
.
If that option includes $PATH:
(as it does by default) then
that is replaced by the user's $PATH
whenever it isn't some
very basic system command or a command that can be run as root
by the SUID flow.
{Project} will pull library://
container images
using multiple concurrent downloads of parts of the image. This speeds
up downloads vs using a single stream. The defaults are generally
appropriate for Library API Registries,
but may be tuned for your network conditions, or if you are pulling
from a different library server.
download concurrency
: specifies how many concurrent streams are used when
downloading (pulling) an image from the cloud library.
download part size
: specifies the size of each part (bytes) when
concurrent downloads are enabled.
download buffer size
: specifies the transfer buffer size (bytes)
when concurrent downloads are enabled.
systemd cgroups
: specifies whether to use systemd to manage container
cgroups. Required (with cgroups v2) for unprivileged users to apply resource
limits on containers. If set to no
, {Project} will directly manage
cgroups via the cgroupfs.
In order to manage this configuration file, {Project} has a config
global
command group that allows you to get, set, reset, and unset
values through the CLI. It's important to note that these commands must
be run with elevated privileges because the {command}.conf
can
only be modified by an administrator.
In this example we will change the bind path
option described
above. First we can see the current list of bind paths set within our
system configuration:
$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts
Now we can add a new path and verify it was successfully added:
$ sudo {command} config global --set "bind path" /etc/resolv.conf
$ sudo {command} config global --get "bind path"
/etc/resolv.conf,/etc/localtime,/etc/hosts
From here we can remove a path with:
$ sudo {command} config global --unset "bind path" /etc/localtime
$ sudo {command} config global --get "bind path"
/etc/resolv.conf,/etc/hosts
If we want to reset the option to the default at installation, then we can reset it with:
$ sudo {command} config global --reset "bind path"
$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts
And now we are back to our original option settings. You can also test
what a change would look like by using the --dry-run
option in
conjunction with the above commands. Instead of writing to the
configuration file, it will output what would have been written to the
configuration file if the command had been run without the --dry-run
option:
$ sudo {command} config global --dry-run --set "bind path" /etc/resolv.conf
# {ENVPREFIX}.CONF
# This is the global configuration file for {Project}. This file controls
[...]
# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
# NOTE: these are ignored if {command} is invoked with --contain.
bind path = /etc/resolv.conf
bind path = /etc/localtime
bind path = /etc/hosts
[...]
$ sudo {command} config global --get "bind path"
/etc/localtime,/etc/hosts
Above we can see that /etc/resolv.conf
is listed as a bind path in
the output of the --dry-run
command, but did not affect the actual
bind paths of the system.
The cgroups (control groups) functionality of the Linux kernel allows you to limit and meter the resources used by a process, or group of processes. Using cgroups you can limit memory and CPU usage. You can also rate limit block IO, network IO, and control access to device nodes.
There are two versions of cgroups in common use. Cgroups v1 sets resource limits for a process within separate hierarchies per resource class. Cgroups v2, the default in newer Linux distributions, implements a unified hierarchy, simplifying the structure of resource limits on processes.
- v1 documentation: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
- v2 documentation: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
{Project} can apply resource limitations to systems
configured for both cgroups v1 and the v2 unified hierarchy. Resource
limits are specified using a TOML file that represents the resources
section of the OCI runtime-spec:
https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#control-groups
On a cgroups v1 system the resources configuration is applied directly. On a cgroups v2 system the configuration is translated and applied to the unified hierarchy.
Under cgroups v1, access restrictions for device nodes are managed directly. Under cgroups v2, the restrictions are applied by attaching eBPF programs that implement the requested access controls.
To apply resource limits to a container, use the --apply-cgroups
flag, which takes a path to a TOML file specifying the cgroups
configuration to be applied:
$ {command} shell --apply-cgroups /path/to/cgroups.toml my_container.sif
Note
Using the --apply-cgroups
option without root
privileges requires cgroups v2.
To limit the amount of memory that your container uses to 500 MiB
(524288000 bytes), set a limit
value inside the [memory]
section
of your cgroups TOML file:
[memory]
limit = 524288000
Start your container, applying the TOML file, e.g.:
$ {command} run --apply-cgroups path/to/cgroups.toml library://alpine
CPU usage can be limited using different strategies, with limits
specified in the [cpu]
section of the TOML file.
shares
This corresponds to a ratio versus other cgroups with cpu shares.
Usually the default value is 1024
. For example, to give a container a 50% share relative to the default,
set the value to 512
.
[cpu]
shares = 512
A cgroup can get more than its share of CPU if there are enough idle CPU cycles available in the system, due to the work-conserving nature of the scheduler, so a contained process can consume all CPU cycles even with a ratio of 50%. The ratio is only applied when two or more processes compete for CPU cycles.
quota/period
You can enforce hard limits on the CPU cycles a cgroup can consume, so
contained processes can't use more than the amount of CPU time set for
the cgroup. quota
allows you to configure the amount of CPU time
that a cgroup can use per period. The default period is 100ms (100000us). So to
limit the amount of CPU time to 20ms during each period of 100ms:
[cpu]
period = 100000
quota = 20000
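The relationship between quota and period is a simple ratio, which the following Python sketch makes explicit. The helper names are hypothetical and the code only does the arithmetic. It does not write or apply any cgroups configuration.

```python
def cpu_fraction(quota_us: int, period_us: int = 100_000) -> float:
    """Return the share of one CPU that quota/period allows."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    return quota_us / period_us


def quota_for_fraction(fraction: float, period_us: int = 100_000) -> int:
    """Return the quota (in microseconds) that grants `fraction` of one CPU."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return round(fraction * period_us)
```

With the values from the example above, a quota of 20000 in a period of 100000 corresponds to a fifth of one CPU.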
cpus/mems
You can also restrict access to specific CPUs (cores) and associated
memory nodes by using cpus/mems
fields:
[cpu]
cpus = "0-1"
mems = "0-1"
Here the container is restricted to CPU 0 and CPU 1.
Note
It's important to set identical values for both cpus
and
mems
.
To control block device I/O, applying limits to competing containers, use
the [blockIO]
section of the TOML file:
[blockIO]
weight = 1000
leafWeight = 1000
weight
and leafWeight
accept values between 10
and 1000
.
weight
is the default weight of the group on all the devices until
and unless overridden by a per device rule.
leafWeight
relates to weight for the purpose of deciding how heavily
to weigh tasks in the given cgroup while competing with the cgroup's
child cgroups.
To apply limits to specific block devices, you must set configuration
for specific device major/minor numbers. For example, to override
weight/leafWeight
for /dev/loop0
and /dev/loop1
block
devices, set limits for device major 7, minor 0 and 1:
[blockIO]
    [[blockIO.weightDevice]]
        major = 7
        minor = 0
        weight = 100
        leafWeight = 50
    [[blockIO.weightDevice]]
        major = 7
        minor = 1
        weight = 100
        leafWeight = 50
You can also limit the IO read/write rate to a specific absolute value,
e.g. 16 MiB per second for the /dev/loop0
block device. The rate
is specified in bytes per second.
[blockIO]
    [[blockIO.throttleReadBpsDevice]]
        major = 7
        minor = 0
        rate = 16777216
    [[blockIO.throttleWriteBpsDevice]]
        major = 7
        minor = 0
        rate = 16777216
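The major/minor numbers and byte rates used in the per-device rules can be derived rather than looked up by hand. The sketch below, in Python, uses the standard `os.stat`, `os.major`, and `os.minor` calls. The helper names are hypothetical, and the device path you pass in (for example a loop device) is assumed to exist on your host.

```python
import os
from typing import Tuple

MIB = 1024 * 1024  # bytes in one mebibyte


def split_device_number(rdev: int) -> Tuple[int, int]:
    """Split a raw device number into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path: str) -> Tuple[int, int]:
    """Return (major, minor) for the device node at `path`."""
    return split_device_number(os.stat(path).st_rdev)


def mib_per_second(mib: int) -> int:
    """Convert a rate in MiB/s to the bytes-per-second value used for `rate`."""
    return mib * MIB
```

For instance, `mib_per_second(16)` gives the 16777216 used in the throttle rules above.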
{Project} can apply all resource limits that are valid in the OCI
runtime-spec resources
section, including unified
cgroups v2
constraints. It is most compatible, however, to use the cgroups v1 limits,
which will be translated to v2 format when applied on a cgroups v2 system.
See https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#control-groups for information about the available limits. Note that {Project} uses TOML format for the configuration file, rather than JSON.
The execution control list that can be used to restrict the execution of SIF files by signing key is defined here. You can authorize the containers by validating both the location of the SIF file in the filesystem and by checking against a list of signing entities.
Note
The ECL is not effective if unprivileged user namespaces are enabled. It is only effectively applied when {Project} is running in setuid mode, and unprivileged container execution is not possible on the host.
You must disable unprivileged user namespace creation on the host if you rely on the ECL to limit container execution. This will disable {Project}'s user namespace mode and most of its fakeroot modes.
Warning
The ECL configuration applies to SIF container images only. To lock
down execution fully you should disable execution of other container
types (squashfs/extfs/dir) via the {command}.conf
file allow
container
settings.
[[execgroup]]
tagname = "group2"
mode = "whitelist"
dirpath = "/tmp/containers"
keyfp = ["7064B1D6EFF01B1262FED3F03581D99FE87EAFD1"]
Only containers that are run from the path above and signed with the listed keys will be authorized to run.
You can choose from three list modes:
- Whitestrict: The SIF must be signed by all of the keys mentioned.
- Whitelist: As long as the SIF is signed by one or more of the keys, the container is allowed to run.
- Blacklist: Only the containers whose keys are not mentioned in the group are allowed to run.
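The three list modes differ only in how the container's signing keys are compared with the keys listed in the group. The following Python sketch models that comparison as set operations. It is a simplified illustration and not the ECL implementation: the function name is hypothetical, the dirpath check and actual signature verification are left out, and rejecting a container with no signatures in every mode is an assumption of this sketch.

```python
from typing import Iterable


def ecl_allows(mode: str, group_keys: Iterable[str],
               signer_keys: Iterable[str]) -> bool:
    """Decide whether a container passes a group's key check.

    group_keys: fingerprints listed in the group's keyfp entry.
    signer_keys: fingerprints of the keys that signed the container.
    """
    listed = {key.upper() for key in group_keys}
    signers = {key.upper() for key in signer_keys}
    if not signers:
        return False  # assumption of this sketch: unsigned is never allowed
    mode = mode.lower()
    if mode == "whitestrict":
        return listed <= signers          # signed by all listed keys
    if mode == "whitelist":
        return bool(listed & signers)     # signed by at least one listed key
    if mode == "blacklist":
        return not (listed & signers)     # signed by none of the listed keys
    raise ValueError(f"unknown mode: {mode!r}")
```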
Note
Containers signed by Singularity versions older than 3.6.0 will not pass ECL checks because they are insecure.
To temporarily permit the use of legacy insecure signatures, set
legacyinsecure = true
in ecl.toml
.
{Project} uses a global keyring for ECL signature
verification. This keyring can be administered using the --global
flag for the following commands:
- {command} key import (root user only)
- {command} key pull (root user only)
- {command} key remove (root user only)
- {command} key export
- {command} key list
Note
For security reasons, it is not possible to import private keys into
this global keyring because it must be accessible by users and is
stored in the file SYSCONFDIR/{command}/global-pgp-public
.
When a container includes a GPU enabled application, {Project} (with
the --nv
or --rocm
options) can properly inject the required
Nvidia or AMD GPU driver libraries into the container, to match the
host's kernel. The GPU /dev
entries are provided in containers run
with --nv
or --rocm
even if the --contain
option is used to
restrict the in-container device tree.
Compatibility between containerized CUDA/ROCm/OpenCL applications and host drivers/libraries is dependent on the versions of the GPU compute frameworks that were used to build the applications. Compatibility and usage information is discussed in the GPU Support section of the user guide.
The nvliblist.conf
configuration file is used to specify libraries
and executables that need to be injected into the container when running
{Project} with the --nv
Nvidia GPU support option. The provided
nvliblist.conf
is suitable for CUDA 11, but may need to be modified
if you need to include additional libraries, or further libraries are
added to newer versions of the Nvidia driver/CUDA distribution.
When adding new entries to nvliblist.conf
use the bare filename of
executables, and the xxxx.so
form of libraries. Libraries are
resolved via ldconfig -p
, and executables are found by searching
$PATH
.
The nvidia-container-cli tool is Nvidia's officially supported method for configuring containers to use a GPU. It is targeted at OCI container runtimes.
{Project} has an experimental --nvccli
option, which
will call out to nvidia-container-cli
for container GPU setup,
rather than use the nvliblist.conf
approach.
For security reasons, nvidia-container-cli
cannot be used in privileged
mode in a SUID installation of {Project}; it can only be used unprivileged.
The operations performed by nvidia-container-cli
are broadly similar to
those which {Project} carries out when setting up a GPU container
from nvliblist.conf
.
The rocmliblist.conf
file is used to specify libraries and
executables that need to be injected into the container when running
{Project} with the --rocm
Radeon GPU support option. The
provided rocmliblist.conf
is suitable for ROCm 4.0, but may need to
be modified if you need to include additional libraries, or further
libraries are added to newer versions of the ROCm distribution.
When adding new entries to rocmliblist.conf
use the bare filename of
executables, and the xxxx.so
form of libraries. Libraries are
resolved via ldconfig -p
, and executables are found by searching
$PATH
.
The nvliblist.conf
and rocmliblist.conf
files list the basename of
executables and libraries to be bound into the container, without path
information.
Binaries are found by searching $PATH
:
# put binaries here
# In shared environments you should ensure that permissions on these files
# exclude writing by non-privileged users.
rocm-smi
rocminfo
Libraries should be specified without version information, i.e.
libname.so
, and are resolved using ldconfig
.
# put libs here (must end in .so)
libamd_comgr.so
libcomgr.so
libCXLActivityLogger.so
If you receive warnings that binaries or libraries are not found, ensure
that they are in a system path (binaries), or available in paths
configured in /etc/ld.so.conf
(libraries).
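To see which files a bare library name would resolve to, you can inspect the output of ldconfig -p yourself. The Python sketch below parses that output and matches a bare libname.so entry against it. The matching rule (exact name, or name followed by a version suffix) is an assumption made for illustration, the function names are hypothetical, and the sample text is made up rather than taken from a real host.

```python
from typing import Dict, List

# Illustrative sample in the format printed by `ldconfig -p` (not real output).
SAMPLE = """3 libs found in cache `/etc/ld.so.cache'
\tlibcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
\tlibcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
\tlibcudart.so.11.0 (libc6,x86-64) => /usr/lib/libcudart.so.11.0
"""


def parse_ldconfig(output: str) -> Dict[str, str]:
    """Map each library file name in `ldconfig -p` output to its path."""
    libs: Dict[str, str] = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip the header line
        left, path = line.split("=>", 1)
        name = left.split()[0]
        libs.setdefault(name, path.strip())
    return libs


def resolve(bare_name: str, libs: Dict[str, str]) -> List[str]:
    """Return paths whose file name is bare_name or bare_name plus a version."""
    return sorted(
        path for name, path in libs.items()
        if name == bare_name or name.startswith(bare_name + ".")
    )
```

An empty result from `resolve` for an entry in the list file would correspond to the "not found" warnings mentioned above. On a real host the input would come from running ldconfig -p, for example through `subprocess.run`.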
Warning
It is extremely important to recognize that granting users Linux
capabilities with the capability
command group is usually
identical to granting those users root level access on the host
system. Most if not all capabilities will allow users to "break
out" of the container and become root on the host. This feature is
targeted toward special use cases (like cloud-native architectures)
where an admin/developer might want to limit the attack surface
within a container that normally runs as root. This is not a good
option in multi-tenant HPC environments where an admin wants to grant
a user special privileges within a container. For that and similar
use cases, the :ref:`fakeroot feature <fakeroot>` is a better option.
{Project} in SUID mode provides full support for admins to grant and revoke
Linux capabilities on a user or group basis. The capability.json
file is
maintained by {Project} in order to manage these capabilities. The
capability
command group allows you to add
, drop
, and
list
capabilities for users and groups.
For example, let us suppose that we have decided to grant a user (named
pinger
) capabilities to open raw sockets so that they can use
ping
in a container where the binary is controlled via capabilities.
To do so, we would issue a command such as this:
$ sudo {command} capability add --user pinger CAP_NET_RAW
This means the user pinger
has just been granted permissions
(through Linux capabilities) to open raw sockets within {Project}
containers.
We can check that this change is in effect with the capability list
command.
$ sudo {command} capability list --user pinger
CAP_NET_RAW
To take advantage of this new capability, the user pinger
must also
request the capability when executing a container with the
--add-caps
flag. pinger
would need to run a command like this:
$ {command} exec --add-caps CAP_NET_RAW \
    library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=73.1 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 73.178/73.178/73.178/0.000 ms
If we decide that it is no longer necessary to allow the user pinger
to open raw sockets within {Project} containers, we can revoke the
appropriate Linux capability like so:
$ sudo {command} capability drop --user pinger CAP_NET_RAW
Now if pinger
tries to use CAP_NET_RAW
, {Project} will not
give the capability to the container and ping
will fail to create a
socket:
$ {command} exec --add-caps CAP_NET_RAW \
    library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
WARNING: not authorized to add capability: CAP_NET_RAW
ping: socket: Operation not permitted
The capability add and drop subcommands also accept the
case-insensitive keyword all, which grants all Linux capabilities to,
or revokes them from, a user or group.
For more information about individual Linux capabilities check out the
man pages or use the capability avail command to output available
capabilities with a description of their behaviors.
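When the same capability has to be granted to several users, it can help to generate the commands first and review them before running anything as root. The following is a minimal dry-run sketch, not part of {Project}: print_grants is a hypothetical helper, and CONTAINER_CMD is an assumed variable standing in for the {command} binary name (defaulting here to apptainer). It only prints the capability add --user form shown above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that would grant one
# capability to several users. CONTAINER_CMD is an assumed variable that
# stands in for the {command} binary name.
CONTAINER_CMD="${CONTAINER_CMD:-apptainer}"

print_grants() {
    cap="$1"
    shift
    for user in "$@"; do
        printf 'sudo %s capability add --user %s %s\n' \
            "$CONTAINER_CMD" "$user" "$cap"
    done
}

# Example: list the commands for two users (user names are illustrative).
print_grants CAP_NET_RAW pinger alice
```

Printing instead of executing keeps the privileged step explicit; the output can be inspected and then pasted or piped to a shell once it looks right.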
Secure Computing (seccomp) Mode is a feature of the Linux kernel that allows an administrator to filter system calls being made from a container. Profiles made up of allowed and restricted calls can be passed to different containers. Seccomp provides more control than capabilities alone, giving a smaller attack surface for an attacker to work from within a container.
The defaultAction key sets the action taken for any system call that is
not listed in the profile. An action of SCMP_ACT_ALLOW lets a matching
system call proceed, while SCMP_ACT_ERRNO causes the calling thread to
receive an errno return value instead. The profile format can also list
additional system calls for different architectures, and {Project}
automatically applies the entries for the architecture on which it is
executed.
The include/exclude -> caps section will include/exclude the listed
system calls if the user has the associated capability.
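As an illustration of the fields described above, the sketch below writes a small profile to a scratch file. It is not the profile shipped with {Project}: the layout assumes the Docker/OCI-style JSON format (a syscalls array of rules with names, action, args, includes and excludes), the output path is arbitrary, and the key names should be checked against the default profile installed on your system.

```shell
#!/bin/sh
# Write an illustrative seccomp profile to a scratch location.
# Assumption: Docker/OCI-style JSON layout; verify key names against the
# seccomp profile shipped with your installation before relying on it.
profile="${TMPDIR:-/tmp}/example-seccomp.json"

cat > "$profile" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["mkdir", "mkdirat"],
      "action": "SCMP_ACT_ERRNO",
      "args": [],
      "includes": {},
      "excludes": {}
    },
    {
      "names": ["ptrace"],
      "action": "SCMP_ACT_ERRNO",
      "args": [],
      "includes": {},
      "excludes": { "caps": ["CAP_SYS_PTRACE"] }
    }
  ]
}
EOF

echo "profile written to $profile"
```

Here every unlisted system call is allowed, mkdir and mkdirat always fail with an errno, and the ptrace rule is intended to be skipped for users who hold CAP_SYS_PTRACE.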
To apply a profile, pass it with the --security option when invoking
the container:
$ sudo {command} shell --security seccomp:/home/david/my.json my_container.sif
For more insight into security options, network options, cgroups, capabilities, etc., please check the Userdocs and its Appendix.
System-wide remote endpoints are defined in a configuration file
typically located at /usr/local/etc/{command}/remote.yaml (this
location may vary depending on installation parameters) and can be
managed by administrators with the remote command group.
{Project} allows users to log in to an account and authenticate with a Library API registry, whether that registry exists on premises or in the cloud.
Note
A fresh installation of {Project} is configured with the DefaultRemote,
which does not support the Library API as it is only configured with a
functioning key server, https://keys.openpgp.org. Users or
administrators should configure one of the Library API implementations
listed here if they would like to use a Library API registry.
Examples
Use the remote command group with the --global flag to create a
system-wide remote endpoint:
$ sudo {command} remote add --global company-remote https://enterprise.example.com
INFO: Remote "company-remote" added.
INFO: Global option detected. Will not automatically log into remote.
Conversely, to remove a system-wide endpoint:
$ sudo {command} remote remove --global company-remote
INFO: Remote "company-remote" removed.
Note
Once users log in to a system-wide endpoint, a copy of the endpoint
will be listed in their ~/.{command}/remote.yaml file. This means that
modifications to, or removal of, the system-wide endpoint will not be
reflected in a user's configuration unless they remove the endpoint
themselves.
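Because each user keeps their own copy, an administrator who retires a system-wide endpoint may want to check whether a given per-user file still mentions it. The following is a rough sketch only. It assumes the endpoint name appears as a YAML key (name followed by a colon) in the per-user remote.yaml, which should be confirmed against a real file; remote_in_user_config is a hypothetical helper name, and the .apptainer directory in the example stands in for ~/.{command}.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the remote NAME appears as a YAML key in FILE.
# Assumption: the endpoint name is stored as "NAME:" at the start of a line
# (after any indentation); confirm this against a real remote.yaml.
remote_in_user_config() {
    name="$1"
    file="$2"
    [ -f "$file" ] && grep -Eq "^[[:space:]]*${name}:" "$file"
}

# Example: report a leftover copy in one user's configuration.
# The path is illustrative and assumes the command name "apptainer".
if remote_in_user_config company-remote "$HOME/.apptainer/remote.yaml"; then
    echo "company-remote is still present in the user configuration"
fi
```

The check is deliberately read-only: removing the entry remains the user's own decision, as described in the note above.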
An administrator can make one remote the only usable remote for the
system by using the --exclusive flag:
$ sudo {command} remote use --exclusive company-remote
INFO: Remote "company-remote" now in use.
$ {command} remote list
Cloud Services Endpoints
========================

NAME            URI                     ACTIVE  GLOBAL  EXCLUSIVE  INSECURE
DefaultRemote   cloud.apptainer.org     NO      YES     NO         NO
company-remote  enterprise.example.com  YES     YES     YES        NO
myremote        enterprise.example.com  NO      NO      NO         NO

Keyservers
==========

URI                       GLOBAL  INSECURE  ORDER
https://keys.example.com  YES     NO        1*

* Active cloud services keyserver
If you are using an endpoint that exposes its service discovery file
over an insecure HTTP connection only, it can be added by specifying
the --insecure flag:
$ sudo {command} remote add --global --insecure test http://test.example.com
INFO: Remote "test" added.
INFO: Global option detected. Will not automatically log into remote.
This flag controls HTTP vs HTTPS for service discovery only. The protocol used to access individual library, build and keyserver URLs is set by the service discovery file.
{Project}'s default remote endpoint configures only a public key
server; it does not support the library:// protocol.
Formerly the default was set to point to Sylabs servers, but read/write
support for the oras:// protocol in registries such as the GitHub
Container Registry makes that unnecessary.
The remote endpoint was also formerly used for builds using the
build --remote option, but {Project} does not support that. Instead, it
supports unprivileged local builds.
If you would still like to have the previous default for all users,
these are the commands to restore the library behavior from before
{Project}, where using the library:// URI would download from the
Sylabs Cloud anonymously:
$ sudo {command} remote add --global SylabsCloud cloud.sycloud.io
INFO: Remote "SylabsCloud" added.
INFO: Global option detected. Will not automatically log into remote.
$ sudo {command} remote use --global SylabsCloud
INFO: Remote "SylabsCloud" now in use.
$ {command} remote list
Cloud Services Endpoints
========================

NAME           URI                  ACTIVE  GLOBAL  EXCLUSIVE
DefaultRemote  cloud.apptainer.org  NO      YES     NO
SylabsCloud    cloud.sycloud.io     YES     YES     NO

Keyservers
==========

URI                                 GLOBAL  INSECURE  ORDER
https://keys.production.sycloud.io  YES     NO        1*

* Active cloud services keyserver
For more details on the remote command group and managing remote
endpoints, please check the Remote Userdocs.
By default, {Project} will use the keyserver associated with the active
cloud service endpoint. This behavior can be changed or supplemented
via the add-keyserver and remove-keyserver commands. These commands
allow an administrator to create a global list of key servers used to
verify container signatures by default.
For more details on the remote command group and managing keyservers,
please check the Remote Userdocs.
The dmtcp-conf.yaml file is a YAML configuration file that specifies
the libraries and executables to be injected into the container,
without path information. This configuration is used when running
{Project} instances with application checkpointing (e.g.
--dmtcp-launch, --dmtcp-restart).
Note
This feature is marked as experimental to allow flexibility as community feedback may warrant breaking changes to improve the overall usability for this feature set as it matures.
Binaries are specified as an array under the bins key and are found by
searching $PATH:
# List binaries to bind into the container here
# In shared environments you should ensure that permissions on these files
# exclude writing by non-privileged users.
bins:
  - "dmtcp_command"
  - "dmtcp_discover_rm"
  - "dmtcp_launch"
  [...]
Libraries are specified as an array under the libs key and should be
specified without version information, i.e. libname.so, and are
resolved using ldconfig.
# List libraries to bind into the container here. Library names must end in ".so"
libs:
  - "libdmtcp_alloc.so"
  - "libdmtcp_dl.so"
  - "libdmtcp_modify-env.so"
  [...]
If you receive warnings that binaries or libraries are not found,
ensure that they are in a system path (binaries), or available in paths
configured in /etc/ld.so.conf (libraries).
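A quick way to pre-check the entries before starting a checkpointed instance is sketched below. This is an illustrative helper, not something {Project} provides: check_bin and check_lib are hypothetical names, check_bin resolves a name on $PATH and flags files that are writable by group or others, and check_lib looks for the library name (optionally followed by a version suffix) in the output of ldconfig -p. How {Project} itself matches library names is not shown here, so treat the pattern as an assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for entries listed in dmtcp-conf.yaml.

# Resolve a binary on $PATH and warn if group or others may write to it.
# Returns 1 if the binary is missing, 2 if it is writable by non-owners.
check_bin() {
    bin_path="$(command -v "$1" 2>/dev/null)" || { echo "missing: $1"; return 1; }
    # -L follows symlinks so the permission test applies to the real file;
    # find prints the path only if a group- or other-write bit is set.
    if [ -n "$(find -L "$bin_path" -prune \( -perm -020 -o -perm -002 \) -print 2>/dev/null)" ]; then
        echo "writable by non-owner: $bin_path"
        return 2
    fi
    echo "ok: $bin_path"
}

# Look for a library name in the ldconfig cache (assumed matching rule:
# the name itself or the name followed by a numeric version suffix).
check_lib() {
    if ldconfig -p 2>/dev/null | grep -Eq "^[[:space:]]*$1(\.[0-9.]+)? "; then
        echo "ok: $1"
    else
        echo "not in ldconfig cache: $1"
        return 1
    fi
}

# Example run over the names shown in the configuration excerpts above.
for b in dmtcp_command dmtcp_discover_rm dmtcp_launch; do
    check_bin "$b" || true
done
for l in libdmtcp_alloc.so libdmtcp_dl.so libdmtcp_modify-env.so; do
    check_lib "$l" || true
done
```

On some systems ldconfig lives in /sbin or /usr/sbin and may not be on an unprivileged user's $PATH, in which case check_lib reports every library as missing; running the check as the administrator avoids that.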
For more details on the checkpointing features in {Project}, please check the Checkpoint Userdocs.