From c588614de81cc1076a6d020849f23b7ca295c27a Mon Sep 17 00:00:00 2001
From: Vitaliy Emporopulo
Date: Thu, 7 Mar 2024 15:05:17 +0200
Subject: [PATCH] Document deploying DRA to OpenShift

* Document the differences on OpenShift
* Include useful setup scripts

Signed-off-by: Vitaliy Emporopulo
---
 README.md                                     |   2 +-
 demo/clusters/openshift/README.md             | 142 ++++++++++++++++++
 .../openshift/add-certified-catalog-source.sh |  21 +++
 demo/clusters/openshift/enable-dra-profile.sh |   6 +
 .../openshift/extend-kube-scheduler-rbac.sh   |  30 ++++
 5 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 demo/clusters/openshift/README.md
 create mode 100755 demo/clusters/openshift/add-certified-catalog-source.sh
 create mode 100755 demo/clusters/openshift/enable-dra-profile.sh
 create mode 100755 demo/clusters/openshift/extend-kube-scheduler-rbac.sh

diff --git a/README.md b/README.md
index 9a7b497b..75861ca4 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ A document and demo of the DRA support for GPUs provided by this repo can be fou
 
 ## Demo
 
-This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver.
+This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver. For Red Hat OpenShift, refer to [running the NVIDIA DRA driver on OpenShift](demo/clusters/openshift/README.md).
 
 First since we'll launch kind with GPU support, ensure that the following prerequisites are met:
 1. `kind` is installed. See the official documentation [here](https://kind.sigs.k8s.io/docs/user/quick-start/#installation).

diff --git a/demo/clusters/openshift/README.md b/demo/clusters/openshift/README.md
new file mode 100644
index 00000000..6910a203
--- /dev/null
+++ b/demo/clusters/openshift/README.md
@@ -0,0 +1,142 @@
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains the differences between deploying the NVIDIA DRA driver on Red Hat OpenShift and on upstream Kubernetes or its derivatives.

## Prerequisites

Install a recent build of OpenShift 4.16 (e.g. 4.16.0-ec.3). You can obtain an IPI installer binary (`openshift-install`) from the [Release Status](https://amd64.ocp.releases.ci.openshift.org/) page, or use the Assisted Installer to install on bare metal. Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/installing/index.html) for other installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html), either during installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```
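You can confirm that both settings took effect before proceeding (a quick sanity check; it assumes the default cluster-scoped `featuregate` and `scheduler` objects):

```console
$ oc get featuregate cluster -o jsonpath='{.spec.featureSet}{"\n"}'
TechPreviewNoUpgrade
$ oc get scheduler cluster -o jsonpath='{.spec.profileCustomizations.dynamicResourceAllocation}{"\n"}'
Enabled
```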
## NVIDIA GPU Drivers

The easiest way to install the NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator.

**Be careful to disable the device plugin so that it does not conflict with the DRA plugin.** It is recommended to enable only the NVIDIA GPU driver and the driver toolkit, and to disable everything else in the `ClusterPolicy`:

```yaml
  <...>
  devicePlugin:
    enabled: false
  <...>
  driver:
    enabled: true
  <...>
  toolkit:
    enabled: true
  <...>
```

The NVIDIA GPU Operator might not be available through the OperatorHub in a pre-production version of OpenShift. In this case, deploy the operator from a bundle, or add a certified catalog index from an earlier version of OpenShift, e.g.:

```yaml
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
```

Then follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html).

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```

## OpenShift Security

OpenShift generally requires more stringent security settings than Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```
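Putting these values together, a Helm invocation could look like the following sketch. The release name, namespace, chart path, and the `openshift-values.yaml` file (holding the `kubeletPlugin` security snippet above) are illustrative placeholders, not the canonical install command:

```console
$ helm upgrade --install nvidia-dra-driver deployments/helm/k8s-dra-driver \
    --namespace nvidia-dra-driver --create-namespace \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/var/usrlocal/nvidia/toolkit/nvidia-ctk \
    --values openshift-values.yaml
```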
If you see security context constraints errors or warnings when deploying a sample workload, make sure to update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/operators/operator_sdk/osdk-complying-with-psa.html). Applying the following `securityContext` definition at the pod or container level usually works for non-privileged workloads:

```yaml
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
```

If you see the following error when trying to deploy a workload:

```console
Warning  FailedScheduling  21m  default-scheduler  running Reserve plugin "DynamicResources": podschedulingcontexts.resource.k8s.io "gpu-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: ,
```

apply the following RBAC configuration (this should be fixed in newer OpenShift builds):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
```

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be [enabled](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode) on worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. A100.

You can enable MIG mode via the driver daemon set pod running on a GPU node as follows (here, the GPU ID is 0, i.e. `-i 0`):

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:0A:00.0
All done.
```

Make sure to stop everything that may hold the GPU before enabling MIG. Otherwise, you will see a warning, and the MIG status will have an asterisk (i.e. `Enabled*`), meaning that the setting could not be applied.
\ No newline at end of file
diff --git a/demo/clusters/openshift/add-certified-catalog-source.sh b/demo/clusters/openshift/add-certified-catalog-source.sh
new file mode 100755
index 00000000..12fe1495
--- /dev/null
+++ b/demo/clusters/openshift/add-certified-catalog-source.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc create -f - <<EOF
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
EOF
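A possible end-to-end flow with the helper scripts added by this patch (an illustrative sketch: it assumes you run from the repository root as a cluster administrator, and the `oc get` output shown is approximate):

```console
$ ./demo/clusters/openshift/add-certified-catalog-source.sh
$ ./demo/clusters/openshift/enable-dra-profile.sh
$ ./demo/clusters/openshift/extend-kube-scheduler-rbac.sh
$ oc get catalogsource certified-operators-v415 -n openshift-marketplace
NAME                       DISPLAY                     TYPE   PUBLISHER   AGE
certified-operators-v415   Certified Operators v4.15   grpc   Red Hat     30s
```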