Remote Desktop Dev Notes
Docker images are built in layers. Each Dockerfile instruction, such as `USER`, `COPY`, and `RUN`, creates a new layer, and everything within a single instruction (e.g. `RUN foo && \ bar && \ baz`) is part of the same layer.
Layers are cached during the build process. After the first time a Docker image is built, only the layer that changed and every layer after it will be rebuilt, saving development time. These cached layers can even be used in other images, up to the point of the first differing layer. It is possible to circumvent this by either clearing the build cache or building with the `--no-cache` argument.
Layers can be thought of as a record of changes. This allows them to be cached and shared, but there is a downside: if a file is created in one layer and then changed in another layer, the total image size increases by the size of that file for each layer that changes it. This increase occurs even if only the file permissions are changed and, perhaps counterintuitively, even if the change is the deletion of the file.
However, only the final state of the layer matters. If a file is changed and reverted, or created and deleted in the same layer, the image size will not increase.
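As an illustration (a minimal sketch; the URL and file names are placeholders, not from this project), the first `RUN` below leaves nothing behind in its layer, while the second pair bakes the archive into the image even though a later layer deletes it:

```dockerfile
# Good: the archive is created and deleted in the same layer,
# so it never contributes to the final image size.
RUN wget https://example.com/tool.tar.gz && \
    tar -xzf tool.tar.gz -C /opt && \
    rm tool.tar.gz

# Bad: the archive is recorded in its own layer; the later deletion
# only masks it, and the image still grows by its full size.
RUN wget https://example.com/tool.tar.gz && \
    tar -xzf tool.tar.gz -C /opt
RUN rm tool.tar.gz
```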
The most common way to get around this limitation is therefore to make sure that as little as possible changes between layers. This includes, for example, clearing installation caches and other temporary files at the end of each layer where something is installed, so that these caches are never ultimately added to the image. That is often done in a reusable general-purpose layer-cleaning script, sketched below. Another common script is one that fixes permissions, to keep them as consistent as possible between layers.
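A minimal sketch of such a cleaning script (the repository's actual `clean-layer.sh` may differ; this only shows the idea):

```bash
#!/bin/bash
# Sketch: remove caches and temporary files at the end of a layer,
# so that they are never committed to the image.
set -e

# apt caches
apt-get clean
rm -rf /var/lib/apt/lists/*

# generic temporary files
rm -rf /tmp/* /var/tmp/*

# conda and pip caches, if present
if command -v conda > /dev/null; then
    conda clean --all --yes
fi
rm -rf "$HOME/.cache/pip"
```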
There are two other common methods, but neither seems appropriate for the scale and complexity of the Remote Desktop images. One is to use an experimental docker daemon to build with the `--squash` argument, which combines every layer into a single layer. But this removes the advantages of multiple layers: layers could no longer be shared, increasing the size that images take up in the Container Registry, and every build would start from scratch, even if an error occurs at the very end, which would make debugging and updating unfeasibly time consuming.
Another option, appropriate to simpler images, is to use multi-stage builds, where part or all of the final file system of one image is copied into another image. However, Remote Desktop is built for a typical Linux desktop experience, where some files have permissions only for root and others have permissions for non-root users, as well as users and groups designed for systems and software. To the extent of my experimentation, these necessary permissions were unfortunately lost during the copy process. Restoring them manually afterwards is both very complicated and reintroduces the same size-increasing changes that this process was meant to avoid.
How can we best take advantage of layers to spend the least time building and the most time testing our changes? By making these changes as late in the Dockerfile as possible, so that the most cached layers are used and the fewest are rebuilt. This process can involve temporarily replacing something built on an early layer.
For example, let’s say we want to change the contents of a `script.sh` in a 100-layer image where layer 2 is `COPY example/local/dir/script.sh example/container/dir/script.sh` and layer 87 is `RUN example/container/dir/script.sh`.
If we edit `example/local/dir/script.sh` directly, layers 3-100 will rebuild for each change.
What we can do instead is make our changes in a new file, `newscript.sh`, and use it to overwrite the original. If right before layer 87 we insert `COPY newscript.sh example/container/dir/script.sh`, only layers 87 and onward will rebuild.
We can do even better by moving both `COPY newscript.sh example/container/dir/script.sh` and `RUN example/container/dir/script.sh` as near to the end of the Dockerfile as possible (or right before the first layer that depends on them, and seeing if that layer can also be moved), as in the sketch below.
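A sketch of this temporary overwrite, using the paths from the example above:

```dockerfile
# ...layers up to 86 remain cached...

# Temporary development layers: shadow the script copied in layer 2
# and re-run it, without invalidating the layers in between.
COPY newscript.sh example/container/dir/script.sh
RUN example/container/dir/script.sh

# ...only the layers from here onward rebuild on each change...
```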
Similarly, instead of changing an installation (such as `apt-get install`, `conda install`, ...) early in the Dockerfile, we can go as close to the end of the Dockerfile as possible to `RUN` a corresponding uninstall, and proceed to test the absence of that package or its replacement by another.
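For instance (a sketch; the package names are placeholders), a late temporary layer can undo an early installation without rebuilding it:

```dockerfile
# Temporary development layer: test the image without a package that
# an early layer installed, or with a candidate replacement.
USER root
RUN apt-get remove -y [package to test removing] && \
    apt-get install -y [candidate replacement]
USER $NB_USER
```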
These are all temporary changes to streamline the development process: when a working solution is identified, these experimental overwrites can be removed and their changes applied to the original layers. Now we can do a final verification by rebuilding from the original layer, ideally only needing to do that for the ultimate solution instead of for each attempted change.
Building images and running containers can become repetitive and error-prone for many quick, small changes. Consider creating a script to combine the process for you. How best to do so will vary depending on your operating system. For rapid testing of Remote Desktop images in a Linux environment, I have the following in my `~/.bashrc`:

```bash
dr() {
    docker run --rm -it -p 8888:8888 $(docker build -q .)
}
```

With this, when I am in a terminal at the folder of any Dockerfile that runs on port 8888 (appropriate for the Remote Desktop images; the function can be changed for others), I only have to enter `dr` to build the image and run it in a container that will be removed after it is exited (`--rm`).
`docker run --rm -it [your image id] /bin/bash` opens an interactive terminal into the container, which is convenient for quick debugging of certain issues. You can also add `-u root` after `docker run` to debug with administrative privileges.
Building a lot of images and using a lot of containers can quickly use up a lot of hard drive space.
Use `--rm` when running containers to make sure that they are removed after they are stopped.
Use `docker tag [image id] [name:tag]` to give names to images you want to keep, and clear the rest with `docker rmi $(docker images | grep "^<none>" | awk '{print $3}')` or an equivalent for your operating system.
For an all-purpose deep clean, use `docker system prune`, but note that this will also remove your build caches.
Remote Desktop is based on https://github.com/ml-tooling/ml-workspace and adapted to our project’s needs.
The most significant difference is that while ML Workspace is built for one “root” user with full administrative privileges over the operating system in the container, for security purposes Remote Desktop runs as a regular user. At the time of writing this documentation, this user is always "jovyan".
Root is still used in the Dockerfiles to perform installations, and sudo access has been granted to run three specific Netdata, Rsyslogd, and Cron commands, as I did not find a non-privileged way of using them in the time I had to prioritise that task (though there might still be one).
Remote Desktop is also much smaller than ML Workspace, removing tools, packages, and libraries that are not required by typical users of this project, as long as anyone who still needs them can install them themselves (through e.g. `pip`, `conda`, `npm`).
We also have two official image extensions: `r` and `geomatics`. These add commonly, but not universally, requested software that requires administrative privileges to install, but is too complex (long build time) and large (image size increase) to warrant including in the base Dockerfile. `r` extends the base image to install RStudio and various R libraries, while `geomatics` extends `r` to add QGIS and various geomatics libraries. Note that while most other software installation processes use checksums, QGIS is validated with its GPG key.
The Github Actions workflow defines a CI process such that when a change is accepted into the master repository, these images are sequentially built, tagged with the SHA of their commit to master, scanned for vulnerabilities, and pushed to an Azure Container Registry that users of our Kubeflow instance can pull from.
This workflow does not automatically populate the “Create Notebook Server” dropdown. To do that, submit a pull request to update the SHAs in this ConfigMap. Note that the base image is not offered in the dropdown: its purpose is to streamline development.
Github Actions have an image size limit: around 14 GB. In the future, a GPU version of Remote Desktop may be created, and it might not be possible to make it smaller than 14 GB. In that case, it would have to be pushed manually (akin to the manual build instructions below), or a different CI process would need to be established.
There is presently also one unofficial experimental image. It does not have a CI process, both due to its nature as a temporary placeholder and because it is too large for the roughly 14 GB GitHub Actions image size limit mentioned above.
1. Install the Azure CLI if you have not done so before.
2. Log in to the ACR with the command `az acr login -n k8scc01covidacr` using your cloud account. Sometimes the connection is refused: try again until it goes through (it should only take 2-4 tries unless something abnormal is going on).
3. If you are testing an image that is based on parent images where you have made changes, build them first in the appropriate sequence and tag them locally with the master tag, but do not push them. For example, if you have changes in `base` that you want to test in `geomatics`:
   - Build the modified base image.
   - Tag the modified base image with `docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-base:master`
   - Build the `r` image.
   - Tag the `r` image with `docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-r:master`
4. Build the image you want to test.
5. Tag the image you want to test with `docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-test:[add a descriptive tag]`
6. Push the image with `docker push k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag]`
7. Create a notebook server on Kubeflow with `k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag]` as a custom image.
Note: If step 6 fails, ask your supervisor to confirm that your cloud account has been granted the appropriate permissions for the Azure Container Registry `k8scc01covidacr`. A consolidated sketch of the whole sequence follows.
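Putting steps 2 through 6 together (a sketch; the build directories and the descriptive tag are placeholders, assuming changes in `base` being tested in `geomatics`):

```bash
# Log in to the Azure Container Registry
az acr login -n k8scc01covidacr

# Build and tag the modified parent images locally; do not push them
docker build -t k8scc01covidacr.azurecr.io/remote-desktop-base:master [base directory]
docker build -t k8scc01covidacr.azurecr.io/remote-desktop-r:master [r directory]

# Build, tag, and push the image under test
docker build -t k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag] [geomatics directory]
docker push k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag]
```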
Jupyter Notebook is available as a vestigial interface. When possible, Jupyter users are directed to use one of the JupyterLab images, which are individually supported and far more fully featured. However, it is not yet possible to connect a Remote Desktop image and a different image (or any two different images) to the same Persistent Volume storage at the same time. In the future, when ReadWriteMany PVCs (or any alternative such as Minio buckets) are supported and confirmed working for this purpose, Jupyter will be removed from Remote Desktop.
Alternatively, this may be enacted earlier if the experimental image that combines JupyterLab and Remote Desktop is deemed to be a sufficient placeholder. However, that image would need to be kept up to date.
The eventual removal of Jupyter includes at least the following:
- resources/branding
- resources/home/.workspace
- resources/jupyter
- resources/tutorials
- jupyter-related configuration in nginx.conf
- jupyter-related code in the base Dockerfile
- possibly docker-entrypoint.py, which would require at least a minor refactor
- reviewing resources/scripts (for unused non-Jupyter scripts)
- resources/icons other than netdata (consider defining a netdata icon from elsewhere, or, if present, using one available somewhere in netdata’s install directory, and then removing all of resources/icons)
- resources/licenses (or update them)
- resources/reports (if resources/tests is completely removed; requires a minor refactor)
- resources/ssh - to confirm that it doesn’t interfere with VNC functionality
- resources/tests
The resources folder of the base image defines a lot of configurations, most of which are accessed by the base Dockerfile.
resources/branding
Unchanged from original.
This folder contains branding assets for ML Workspace. The last of these that are still in use are in the Jupyter Notebook Tree View: when that is obsolete, this directory can be deleted.
Unchanged from original.
Contains a couple of configurations for apt-get installations and xrdp (graphical remote login).
resources/home
This directory gets copied into the home directory of the container. In the present single-user state, the home directory of the container is /home/jovyan, and can also be referred to through the potentially more dynamic forms `$HOME` and `/home/$NB_USER`.
`resources/home` contains the following subdirectories:
.config
Miscellaneous configurations, including VS Code configurations and default application associations.
.workspace/tools
JSON corresponding to the “Open Tools” dropdown of the Jupyter Notebook Tree View: when that is obsolete, this subdirectory can be deleted.
Desktop
Ensures that an empty “Desktop” directory is copied into the user’s home directory, to be populated by application shortcuts added during installation processes. An empty .dockerignore (a file type that is not built into the container) is in place because a fully empty directory would otherwise disappear from Git tracking.
resources/icons
Unchanged from original.
Only netdata-icon.png is in use, as the icon for the desktop shortcut to netdata.
resources/jupyter
Unchanged from original beyond replacing an instance of the term “Workspace” with “Remote Desktop”.
Various configurations for the Jupyter Notebook Tree View: when that is obsolete, this directory can be deleted.
resources/landing_page
The assets for the custom landing page for Remote Desktop. They require corresponding server configuration entries in `nginx/nginx.conf` to be displayed.
resources/licenses
Unchanged from original.
Contains various license information, perhaps automatically generated with some sort of tool, for the original ML Workspace image: it is not up to date for Remote Desktop. Consider removing it or finding a way to update it.
resources/nginx
Various configurations for nginx, which handles making the container accessible through a web browser via reverse proxies. It provides an in-browser path, such as through Access Port and/or with an alias such as `tools/vscode`, to what is running inside the container on `localhost:[port number]`. Modifications have been made to nginx.conf, and further modifications will be warranted for Jupyter removal.
Note that `WORKSPACE_BASE_URL` is a dynamic variable that corresponds to the relative path for connecting to the container on Kubeflow. It is of the pattern `/notebook/[namespace]/[notebook server name]`.
The value for `WORKSPACE_BASE_URL` is passed by [init.sh](#init-sh). When there is no such value, which occurs when the Remote Desktop container is run locally, the placeholder value `/workspace` is used. The reverse-proxy idea is sketched below.
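For illustration only, a reverse-proxy entry of the kind nginx.conf contains might look roughly like this (the alias and port are placeholders, not the repository's actual values):

```nginx
# Sketch: expose a service listening inside the container on a
# localhost port under a browser-accessible alias.
location /tools/example/ {
    proxy_pass http://localhost:8300/;
}
```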
Configurations for the VNC client, such as the lefthand sidebar for the Desktop GUI and clipboard behavior.
Note that direct clipboard sharing (ctrl+c and ctrl+v, or the command equivalent on Mac, working without use of the sidebar) is presently only supported on Chrome. Firefox does not yet permit clipboard sharing in both directions, and other browsers have had limited testing.
resources/reports
A directory for holding reports generated by `resources/tests`, which may no longer be appropriate for Remote Desktop (unverified). This directory itself is empty, but was kept because something in the code expects it to exist. If `resources/tests` is removed, then `resources/reports` should be refactored out.
resources/scripts
Unchanged from original beyond replacing an instance of the term “Workspace” with “Remote Desktop”.
Various scripts. `clean-layer.sh` and `fix-permissions.sh` are used throughout the base Dockerfile, while `run_workspace.py`, `start-vnc-server.sh`, and `configure-nginx.py` are critical to the appropriate functioning of the container. The rest warrant review: while some may be important, others may not be in use, or may become irrelevant after Jupyter is removed.
resources/ssh
Should be removed if it does not interfere with the VNC / RDP setup (graphically connecting to the desktop environment).
resources/supervisor
Supervisor ensures that certain programs, as configured in the `conf.d` subdirectory, run at startup on specified ports, restart when they crash, and output status logs (stdout locally, pod logs on Kubeflow).
resources/tests
Unchanged from original.
Various tests that have not been verified and may no longer be compatible with Remote Desktop. To review. If they are all removed, `resources/reports` should also be removed (that requires a minor refactor: its existence is expected somewhere).
resources/tools
Various installation scripts referenced by the base Dockerfile. To make the base Dockerfile easier to read, consider converting sections of it into additional .sh files here, keeping in mind the difference in syntax between Dockerfile layers and plain bash.
resources/tutorials
Unchanged from original.
Jupyter notebook tutorials, to be deleted when Jupyter is removed.
Unchanged from original.
A simple loading page that refreshes itself until a resource is ready. It might make sense to move it to `resources/landing_page`. A future design task could be to rebrand it to Remote Desktop. Note that it does not make use of `resources/branding`.
docker-entrypoint.py
Unchanged from original.
Runs directly after [init.sh](#init-sh) and sets various configurations, but mostly for components which are not in Remote Desktop. After Jupyter is removed, see if it is possible to remove this file and go directly to `scripts/run_workspace.py` instead.
The current image is based on Ubuntu 18.04.
At the beginning of the base Dockerfile, as well as the extension Dockerfiles, the user is set to root, and at the end the user is set to the environment variable `$NB_USER` (presently always ‘jovyan’).
Environment variables are defined at the beginning, with the exception of ones that are part of an installation process. This pattern is sketched below.
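In outline (a sketch of the pattern, not the actual file):

```dockerfile
FROM ubuntu:18.04

# Root performs the installations...
USER root

# ...and environment variables are defined up front.
ENV NB_USER=jovyan \
    HOME=/home/jovyan

# ...installation and configuration layers go here...

# The image ends as the regular, non-root user.
USER $NB_USER
```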
The resources described above are brought in with COPY instructions. One present inconsistency is that some software is installed from bash scripts in `resources/tools`, while other software is installed directly in the Dockerfile. A potential refactor would be to separate at least the more complex of these installs in the Dockerfile out into bash scripts, which could make the Dockerfile easier to read, navigate, and track changes in.
Checksums are used throughout the Dockerfile, preventing the build from proceeding when the SHA-256 sum of an installation file no longer matches its expected value. This could occur due to malicious interception or, more commonly, because the file has been updated, removed, or replaced. When such software is updated, a new SHA-256 sum must be generated for it, which can be done on a development machine (and, when possible, verified against an official source). How to do so varies by operating system: a Linux approach is `sha256sum`.
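The general pattern looks roughly like this (a sketch; the URL and `[expected sha256]` are placeholders):

```dockerfile
# Abort the build if the download no longer matches the recorded sum.
RUN wget https://example.com/tool.deb && \
    echo "[expected sha256]  tool.deb" | sha256sum --check - && \
    apt-get install -y ./tool.deb && \
    rm tool.deb
```

A new sum for an updated file can be generated on a development machine with `sha256sum tool.deb`.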
In order to allow the three privileged supervisor commands to run with sudo access, they are added to a sudoers.d file for the user with NOPASSWD, so that there is no password prompt (there is no root password set to begin with). These are logging commands managed automatically by supervisor, without interaction from the user.
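A sketch of what such an entry might look like (the command paths are illustrative, not the repository's exact lines):

```
# /etc/sudoers.d/jovyan (sketch): let the user run three specific
# commands as root without a password prompt.
jovyan ALL=(root) NOPASSWD: /usr/sbin/netdata, /usr/sbin/rsyslogd, /usr/sbin/cron
```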
An unusual instruction, `mv $HOME/.config $HOME/.config2 && mv $HOME/.config2 $HOME/.config`, is a workaround for a strange bug. Without this workaround, the permissions on `$HOME/.config` sometimes break: they are present when logging in to the shell as root and checking with `ls -a`, but return question marks when doing so as the standard user `jovyan`. These broken permissions render the files in that directory, and in turn the GUI, inaccessible. At the time of writing, I had not been able to trace why this bug would (sometimes, but not always) occur, nor why moving `$HOME/.config` back and forth prevents it.
Another important workaround is the workspace override hotfix. When a Remote Desktop image is used to create a Notebook Server on Kubeflow, and a Persistent Volume is selected or created, that Persistent Volume mounts to the home directory (presently always `/home/jovyan`) and replaces that home directory completely. Everything in the home directory that has been set up throughout the Dockerfile, including critical configurations, is thus lost. Therefore, at the end of the Dockerfile, the contents of the home directory are copied to another folder, `home_nbuser_default`, to preserve them. `init.sh`, described below, will copy the contents of the original home directory back into the newly overwritten home directory (now a Persistent Volume) at container runtime.
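A sketch of the two halves of this workaround (the exact location of `home_nbuser_default` and the precise commands are assumptions):

```bash
# At the end of the Dockerfile (build time), conceptually:
#     RUN cp -r $HOME /home_nbuser_default

# In init.sh (runtime), after the Persistent Volume has mounted over
# the home directory, restore it without clobbering existing data:
cp -r --no-clobber /home_nbuser_default/. "$HOME"/
```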
Tini is used as the entrypoint, which runs the CMD, executing `init.sh`.
init.sh
Because this script runs after the point that a persistent volume is attached (if one is specified), it completes the workspace overwrite workaround by copying `home_nbuser_default` back into the home directory that would have now been overwritten by the persistent volume. `--no-clobber` is specified in order to not overwrite any existing data on the persistent volume.
The script then exports the variables `WORKSPACE_BASE_URL` (using the path received as `NB_PREFIX` from Kubeflow or, if none exists, such as during a local `docker run`, the placeholder `/workspace`) and `NB_USER` (presently always `jovyan`) so that they are accessible elsewhere, and concludes by running docker-entrypoint.py.
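That logic might be sketched as follows (variable names from the description above; the entrypoint path is an assumption):

```bash
# Use the Kubeflow-provided path, or the local placeholder.
export WORKSPACE_BASE_URL="${NB_PREFIX:-/workspace}"

# The single-user account (presently always jovyan).
export NB_USER="${NB_USER:-jovyan}"

# Hand off to the next stage of startup.
python /resources/docker-entrypoint.py
```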