Document deploying DRA to OpenShift
* Document the differences on OpenShift
* Include useful setup scripts

Signed-off-by: Vitaliy Emporopulo <[email protected]>
empovit committed Mar 7, 2024
1 parent 1646035 commit c588614
Showing 5 changed files with 200 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ A document and demo of the DRA support for GPUs provided by this repo can be found

## Demo

This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver. For Red Hat OpenShift, refer to [running the NVIDIA DRA driver on OpenShift](demo/clusters/openshift/README.md).

First, since we'll launch `kind` with GPU support, ensure that the following prerequisites are met:
1. `kind` is installed. See the official documentation [here](https://kind.sigs.k8s.io/docs/user/quick-start/#installation).
142 changes: 142 additions & 0 deletions demo/clusters/openshift/README.md
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains the differences between deploying the NVIDIA DRA driver on Red Hat OpenShift and deploying it on upstream Kubernetes or one of its derivatives.

## Prerequisites

Install a recent build of OpenShift 4.16 (e.g. 4.16.0-ec.3). You can obtain an IPI installer binary (`openshift-install`) from the [Release Status](https://amd64.ocp.releases.ci.openshift.org/) page, or use the Assisted Installer to install on bare metal. Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/installing/index.html) for different installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html), either during the installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.
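
When enabling the feature set post-install, you can also apply it declaratively. A minimal sketch of the corresponding manifest (the cluster-wide `FeatureGate` object is the singleton named `cluster`; note that `TechPreviewNoUpgrade` cannot be disabled again and blocks cluster upgrades):

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
```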

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```

## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator.

**Be careful to disable the device plugin so it does not conflict with the DRA plugin**. It is recommended to leave only the NVIDIA GPU driver and driver toolkit configs, and disable everything else:

```yaml
<...>
devicePlugin:
  enabled: false
<...>
driver:
  enabled: true
<...>
toolkit:
  enabled: true
<...>
```
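
For reference, a trimmed `ClusterPolicy` sketch with only the driver and toolkit enabled. The field names follow the GPU Operator's `nvidia.com/v1` `ClusterPolicy` CRD, but the exact set of components varies by operator version, so verify them against the CRD installed in your cluster:

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  driver:
    enabled: true
  toolkit:
    enabled: true
  devicePlugin:
    enabled: false
  dcgmExporter:
    enabled: false
  gfd:
    enabled: false
  migManager:
    enabled: false
```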


The NVIDIA GPU Operator might not be available through the OperatorHub in a pre-production version of OpenShift. In this case, deploy the operator from a bundle or add a certified catalog index from an earlier version of OpenShift, e.g.:

```yaml
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
```

Then follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html).

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:
```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```
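
The same values can be passed on the command line. A hedged example, assuming the chart is installed from the `deployments/helm/nvidia-dra-driver` directory of the driver repository (adjust the release name and chart path to your setup):

```console
$ helm upgrade -i nvidia-dra-driver ./deployments/helm/nvidia-dra-driver \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/var/usrlocal/nvidia/toolkit/nvidia-ctk
```
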
## OpenShift Security

OpenShift generally requires more stringent security settings than upstream Kubernetes. If you see a warning about security context constraints (SCC) when deploying the DRA plugin, pass the following values to the Helm chart, either via an in-line variable or a values file:
```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```
If you see security context constraints errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/operators/operator_sdk/osdk-complying-with-psa.html). Applying the following `securityContext` definition at the pod or container level usually works for non-privileged workloads.

```yaml
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
```

If you see the following error when trying to deploy a workload:

```console
Warning FailedScheduling 21m default-scheduler running Reserve plugin "DynamicResources": podschedulingcontexts.resource.k8s.io "gpu-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
```

apply the following RBAC configuration (this should be fixed in newer OpenShift builds):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
```

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-instance GPU (MIG) feature require MIG to be [enabled](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode) on the worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. A100.

You can enable it from the NVIDIA driver daemon set pod running on the GPU node, as follows (here the GPU ID is 0, i.e. `-i 0`):

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:0A:00.0
All done.
```

Make sure to stop everything that may hold the GPU before enabling MIG. Otherwise you will see a warning, and the MIG status will have an asterisk (i.e. `Enabled*`), meaning that the setting could not be applied.
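
To check whether the new MIG mode has been applied, you can query it with `nvidia-smi` (the `mig.mode.current` and `mig.mode.pending` fields are standard `nvidia-smi --query-gpu` properties; replace the pod name with your driver daemon set pod):

```console
$ oc exec -ti <driver-daemonset-pod> -n nvidia-gpu-operator -- \
    nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv
```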
21 changes: 21 additions & 0 deletions demo/clusters/openshift/add-certified-catalog-source.sh
#!/usr/bin/env bash

set -ex
set -o pipefail

oc create -f - <<EOF
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
EOF
6 changes: 6 additions & 0 deletions demo/clusters/openshift/enable-dra-profile.sh
#!/usr/bin/env bash

set -ex
set -o pipefail

oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
30 changes: 30 additions & 0 deletions demo/clusters/openshift/extend-kube-scheduler-rbac.sh
#!/usr/bin/env bash

set -ex
set -o pipefail

oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
EOF
