Skip to content

Commit

Permalink
Add node selection documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
athornton committed Dec 23, 2024
1 parent 56f241a commit 8abd69d
Show file tree
Hide file tree
Showing 2 changed files with 32 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/admin/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ Guides to common problems and instructions on how to integrate properly with Pha
:maxdepth: 2
:caption: Guides

node-selection
home-directories
init-containers
setup-gar
Expand Down
31 changes: 31 additions & 0 deletions docs/admin/node-selection.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
######################
Forcing Node Selection
######################

You may want to force user labs and fileservers to run on particular nodes.
A particular use case is when you are running on a Kubernetes cluster that supports autoscaling, and you would prefer that:

* User labs be the only workload on a particular set of nodes, so they are easily separated from the rest of the Phalanx machinery.
* These nodes be optimized for resource usage rather than distribution, so scaling is less painful because the pods to be disrupted are gathered rather than widely spread.

Taints and Tolerations
======================

In order to ensure that lab workload is the only workload on given nodes, you would assign those nodes particular taint keys and values, and then arrange for the controller or fileserver configuration to tolerate those taints.
We conventionally use the key ``nublado.lsst.io/permitted`` with a value (which is ignored) of ``true`` and an effect of ``NoExecute``.
The corresponding toleration in the configuration restates the key and effect, and uses the operator ``Exists`` (which is why the value does not matter).
Note that you should repeat this in the configuration for both the controller and the fileserver.

Node Selection
==============

However, this is only half the job.
Having told kubernetes that only user labs and fileservers can execute on tained nodes, we now want to ensure that they go there.

There are two ways to do this.
One is with the ``nodeSelector`` configuration item.
This is the preferred method.
It is much simpler than using ``Affinity`` and, more important, the prepuller understands ``nodeSelector`` and if you use that method, the prepuller will only pull to nodes that can be selected for Lab workloads.
If you use ``Affinity`` the workload will still end up being scheduled onto the right nodes, and those nodes will still have prepulled images, but the prepuller will pull to other nodes as well.
Assuming that you can use a simple ``nodeSelector`` label, which you probably can, do that.
If, for instance, you have a node pool all of whose members are labelled with ``node_pool`` equal to ``user-lab-pool``, then you should have ``nodeSelector: { node_pool: "user-lab-pool" }`` in your fileserver and controller configuration.

0 comments on commit 8abd69d

Please sign in to comment.