diff --git a/deployment/network-operator/README.md b/deployment/network-operator/README.md index 30c0a4340..55043514f 100644 --- a/deployment/network-operator/README.md +++ b/deployment/network-operator/README.md @@ -1,7 +1,6 @@ # Nvidia Network Operator Helm Chart -Nvidia Network Operator Helm Chart provides an easy way to install, configure and manage the lifecycle of Nvidia -Mellanox network operator. +Nvidia Network Operator Helm Chart provides an easy way to install and manage the lifecycle of Nvidia network operator. ## Nvidia Network Operator @@ -136,7 +135,7 @@ To install development version of Network Operator you need to clone repository local directory: ``` -# Clone Network Operatro Repository +# Clone Network Operator Repository $ git clone https://github.com/Mellanox/network-operator.git # Update chart dependencies @@ -183,28 +182,6 @@ Replace `SVCNAME` with the SVC name follows this convention -webho This command will generate a new RSA key pair with 2048 bits and create a self-signed certificate (`server.crt`) and private key (`server.key`) that are valid for 365 days. -## Helm Tests - -Network Operator has Helm tests to verify deployment. To run tests it is required to set the following chart parameters -on helm install/upgrade: `deployCR`, `rdmaSharedDevicePlugin`, `secondaryNetwork` as the test depends -on `NicClusterPolicy` instance being deployed by Helm. 
Supported Tests: - -- Device Plugin Resource: This test creates a pod that requests the first resource in `rdmaSharedDevicePlugin.resources` -- RDMA Traffic: This test creates a pod that test loopback RDMA traffic with `rping` - -Run the helm test with following command after deploying network operator with helm - -``` -$ helm test -n network-operator network-operator --timeout=5m -``` - -Notes: - -- Test will keeping running endlessly if pod creating failed so it is recommended to use `--timeout` which fails test - after exceeding given timeout -- Default PF to run test is `ens2f0` to override it add `--set test.pf=` to `helm install/upgrade` -- Tests should be executed after `NicClusterPolicy` custom resource state is `Ready` -- In case of a test failed it is possible to collect the logs with `kubectl logs -n ` ## Upgrade @@ -224,8 +201,7 @@ helm search repo nvidia/network-operator -l ### Upgrade CRDs to compatible version -The network-operator helm chart contains a hook(pre-install, pre-upgrade) -that will automatically upgrade required CRDs in the cluster. +The network-operator helm chart contains a hook(pre-install, pre-upgrade) that will automatically upgrade required CRDs in the cluster. The hook is enabled by default. If you don't want to upgrade CRDs with helm automatically, you can disable auto upgrade by setting `upgradeCRDs: false` in the helm chart values. Then you can follow the guide below to download and apply CRDs for the concrete version of the network-operator. @@ -252,21 +228,7 @@ Download Helm values for the specific release helm show values nvidia/network-operator --version= > values-.yaml ``` -Edit `values-.yaml` file as required for your cluster. The network operator has some limitations about which -updates in NicClusterPolicy it can handle automatically. If the configuration for the new release is different from the -current configuration in the deployed release, then some additional manual actions may be required. 
- -Known limitations: - -- If component configuration was removed from the NicClusterPolicy, then manual clean up of the component's resources - (DaemonSets, ConfigMaps, etc.) may be required -- If configuration for devicePlugin changed without image upgrade, then manual restart of the devicePlugin may be - required - -These limitations will be addressed in future releases. - -> __NOTE__: changes which were made directly in NicClusterPolicy CR (e.g. with `kubectl edit`) -> will be overwritten by Helm upgrade due to the `force` flag. +Edit `values-.yaml` file as required for your cluster. ### Apply Helm chart update @@ -276,85 +238,6 @@ helm upgrade -n network-operator network-operator nvidia/network-operator --ver > __NOTE__: `--devel` option required if you want to use the beta release -### Enable automatic upgrade for containerized OFED driver (recommended) - -> __NOTE__: this operation is required only if **containerized OFED** is in use - -Check [Automatic OFED upgrade](../../docs/automatic-ofed-upgrade.md) document for more details. - -### OR manually restart PODs with containerized OFED driver - -> __NOTE__: this operation is required only if **containerized OFED** is in use - -When containerized OFED driver reloaded on the node, all PODs which use secondary network based on NVIDIA Mellanox NICs -will lose network interface in their containers. To prevent outage you need to remove all PODs which use secondary -network from the node before you reload the driver POD on it. - -Helm upgrade command will just upgrade DaemonSet spec of the OFED driver to point to the new driver version. The OFED -driver's DaemonSet will not automatically restart PODs with the driver on the nodes because it uses "OnDelete" -updateStrategy. The old OFED version will still run on the node until you explicitly remove the driver POD or reboot the -node. - -It is possible to remove all PODs with secondary networks from all cluster nodes and then restart OFED PODs on all nodes -at once. 
- -The alternative option is to do upgrade in a rolling manner to reduce the impact of the driver upgrade on the cluster. -The driver POD restart can be done on each node individually. In this case, PODs with secondary networks should be -removed from the single node only, no need to stop PODs on all nodes. - -Recommended sequence to reload the driver on the node: - -_For each node follow these steps_ - -- [Remove PODs with secondary network from the node](#remove-pods-with-secondary-network-from-the-node) -- [Restart OFED driver POD](#restart-ofed-driver-pod) -- [Return PODs with secondary network to the node](#return-pods-with-secondary-network-to-the-node) - -_When the OFED driver becomes ready, proceed with the same steps for other nodes_ - -#### Remove PODs with secondary network from the node - -This can be done with node drain command: - -``` -kubectl drain --pod-selector= -``` - -> __NOTE__: replace with `-l "network.nvidia.com/operator.mofed.wait=false"` if you -> want to drain all nodes at once - -#### Restart OFED driver POD - -Find OFED driver POD name for the node - -``` -kubectl get pod -l app=mofed- -o wide -A -``` - -_example for Ubuntu 20.04: `kubectl get pod -l app=mofed-ubuntu20.04 -o wide -A`_ - -Delete OFED driver POD from the node - -``` -kubectl delete pod -n -``` - -> __NOTE__: replace with `-l app=mofed-ubuntu20.04` if you -> want to remove OFED PODs on all nodes at once - -New version of the OFED POD will automatically start. - -#### Return PODs with secondary network to the node - -After OFED POD is ready on the node you can make node schedulable again. - -The command below will uncordon (remove `node.kubernetes.io/unschedulable:NoSchedule` taint) -the node and return PODs to it. - -``` -kubectl uncordon -l "network.nvidia.com/operator.mofed.wait=false" -``` - ## Chart parameters In order to tailor the deployment of the network operator to your cluster needs, Chart parameters are available. 
See official [documentation](https://docs.nvidia.com/networking/software/cloud-orchestration/index.html). diff --git a/deployment/network-operator/templates/_helpers.tpl b/deployment/network-operator/templates/_helpers.tpl index 813bd8f78..198e47ad3 100644 --- a/deployment/network-operator/templates/_helpers.tpl +++ b/deployment/network-operator/templates/_helpers.tpl @@ -83,165 +83,4 @@ imagePullSecrets helpers {{- end }} {{- end }} {{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.ofed.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.ofedDriver.imagePullSecrets }} -{{- range .Values.ofedDriver.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.rdmaSharedDevicePlugin.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.rdmaSharedDevicePlugin.imagePullSecrets }} -{{- range .Values.rdmaSharedDevicePlugin.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.sriovDevicePlugin.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.sriovDevicePlugin.imagePullSecrets }} -{{- range .Values.sriovDevicePlugin.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . 
}} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.ibKubernetes.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.ibKubernetes.imagePullSecrets }} -{{- range .Values.ibKubernetes.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.secondaryNetwork.cniPlugins.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.secondaryNetwork.cniPlugins.imagePullSecrets }} -{{- range .Values.secondaryNetwork.cniPlugins.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.secondaryNetwork.multus.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.secondaryNetwork.multus.imagePullSecrets }} -{{- range .Values.secondaryNetwork.multus.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.secondaryNetwork.ipamPlugin.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.secondaryNetwork.ipamPlugin.imagePullSecrets }} -{{- range .Values.secondaryNetwork.ipamPlugin.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . 
}} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.nvIpam.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.nvIpam.imagePullSecrets }} -{{- range .Values.nvIpam.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.nicFeatureDiscovery.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.nicFeatureDiscovery.imagePullSecrets }} -{{- range .Values.nicFeatureDiscovery.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - -{{- define "network-operator.docaTelemetryService.imagePullSecrets" }} -{{- $imagePullSecrets := list }} -{{- if .Values.docaTelemetryService.imagePullSecrets }} -{{- range .Values.docaTelemetryService.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . }} -{{- end }} -{{- else }} -{{- if .Values.imagePullSecrets }} -{{- range .Values.imagePullSecrets }} -{{- $imagePullSecrets = append $imagePullSecrets . 
}} -{{- end }} -{{- end }} -{{- end }} -{{- $imagePullSecrets | toJson }} -{{- end }} - +{{- end }} \ No newline at end of file diff --git a/deployment/network-operator/templates/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml b/deployment/network-operator/templates/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml deleted file mode 100644 index fa3f25df7..000000000 --- a/deployment/network-operator/templates/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml +++ /dev/null @@ -1,241 +0,0 @@ -{{/* - Copyright 2020 NVIDIA - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
-*/}} -{{- if .Values.deployCR }} -apiVersion: mellanox.com/v1alpha1 -kind: NicClusterPolicy -metadata: - name: nic-cluster-policy -spec: - {{- if .Values.nodeAffinity }} - nodeAffinity: -{{ toYaml .Values.nodeAffinity | indent 4 }} - {{- end }} - {{- if .Values.tolerations }} - tolerations: -{{ toYaml .Values.tolerations | indent 4 }} - {{- end }} - {{- if .Values.ofedDriver.deploy }} - ofedDriver: - image: {{ .Values.ofedDriver.image }} - repository: {{ .Values.ofedDriver.repository }} - version: {{ .Values.ofedDriver.version }} - forcePrecompiled: {{ .Values.ofedDriver.forcePrecompiled }} - {{- if .Values.ofedDriver.env }} - env: - {{ toYaml .Values.ofedDriver.env | nindent 6 }} - {{- end }} - {{- if .Values.ofedDriver.certConfig.name }} - certConfig: - name: {{ .Values.ofedDriver.certConfig.name }} - {{- end }} - {{- if .Values.ofedDriver.repoConfig.name }} - repoConfig: - name: {{ .Values.ofedDriver.repoConfig.name }} - {{- end }} - imagePullSecrets: {{ include "network-operator.ofed.imagePullSecrets" . 
}} - {{- if .Values.ofedDriver.containerResources }} - containerResources: {{ toYaml .Values.ofedDriver.containerResources | nindent 6 }} - {{- end }} - terminationGracePeriodSeconds: {{ .Values.ofedDriver.terminationGracePeriodSeconds }} - startupProbe: - initialDelaySeconds: {{ .Values.ofedDriver.startupProbe.initialDelaySeconds }} - periodSeconds: {{ .Values.ofedDriver.startupProbe.periodSeconds }} - livenessProbe: - initialDelaySeconds: {{ .Values.ofedDriver.livenessProbe.initialDelaySeconds }} - periodSeconds: {{ .Values.ofedDriver.livenessProbe.periodSeconds }} - readinessProbe: - initialDelaySeconds: {{ .Values.ofedDriver.readinessProbe.initialDelaySeconds }} - periodSeconds: {{ .Values.ofedDriver.readinessProbe.periodSeconds }} - {{- if .Values.ofedDriver.upgradePolicy }} - upgradePolicy: - autoUpgrade: {{ .Values.ofedDriver.upgradePolicy.autoUpgrade | default false }} - maxParallelUpgrades: {{ .Values.ofedDriver.upgradePolicy.maxParallelUpgrades | default 0 }} - safeLoad: {{ .Values.ofedDriver.upgradePolicy.safeLoad | default false }} - {{- if .Values.ofedDriver.upgradePolicy.drain }} - drain: - enable: {{ .Values.ofedDriver.upgradePolicy.drain.enable | default true }} - force: {{ .Values.ofedDriver.upgradePolicy.drain.force | default false }} - podSelector: {{ .Values.ofedDriver.upgradePolicy.drain.podSelector | quote }} - timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.drain.timeoutSeconds }} - deleteEmptyDir: {{ .Values.ofedDriver.upgradePolicy.drain.deleteEmptyDir | default false}} - {{- end }} - {{- if .Values.ofedDriver.upgradePolicy.waitForCompletion }} - waitForCompletion: - podSelector: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.podSelector | default ""}} - timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds | default 0 }} - {{- end }} - {{- end }} - {{- end }} - {{- if .Values.rdmaSharedDevicePlugin.deploy }} - rdmaSharedDevicePlugin: - # {{ required "A valid value for 
.Values.rdmaSharedDevicePlugin.resources is required" .Values.rdmaSharedDevicePlugin.resources }} - image: {{ .Values.rdmaSharedDevicePlugin.image }} - repository: {{ .Values.rdmaSharedDevicePlugin.repository }} - version: {{ .Values.rdmaSharedDevicePlugin.version }} - imagePullSecrets: {{ include "network-operator.rdmaSharedDevicePlugin.imagePullSecrets" . }} - {{- if .Values.rdmaSharedDevicePlugin.useCdi }} - useCdi: {{ .Values.rdmaSharedDevicePlugin.useCdi }} - {{- end }} - # The config below directly propagates to k8s-rdma-shared-device-plugin configuration. - # Replace 'devices' with your (RDMA capable) netdevice name. - config: | - { - "configList": [ - {{- $length := len .Values.rdmaSharedDevicePlugin.resources }} - {{- range $index, $element := .Values.rdmaSharedDevicePlugin.resources }} - { - "resourceName": {{ $element.name | quote }}, - "rdmaHcaMax": {{ $element.rdmaHcaMax | default 63 }}, - "selectors": { - "vendors": {{ $element.vendors | default list | toJson }}, - "deviceIDs": {{ $element.deviceIDs | default list | toJson }}, - "drivers": {{ $element.drivers | default list | toJson }}, - "ifNames": {{ $element.ifNames | default list | toJson }}, - "linkTypes": {{ $element.linkTypes | default list | toJson }} - } - } {{- if ne $length (add1 $index) }},{{ end }} - {{- end }} - ] - } - {{- if .Values.rdmaSharedDevicePlugin.containerResources }} - containerResources: {{ toYaml .Values.rdmaSharedDevicePlugin.containerResources | nindent 6 }} - {{- end }} - {{- end }} - {{- if .Values.sriovDevicePlugin.deploy }} - sriovDevicePlugin: - image: {{ .Values.sriovDevicePlugin.image }} - repository: {{ .Values.sriovDevicePlugin.repository }} - version: {{ .Values.sriovDevicePlugin.version }} - imagePullSecrets: {{ include "network-operator.sriovDevicePlugin.imagePullSecrets" . 
}} - {{- if .Values.sriovDevicePlugin.useCdi }} - useCdi: {{ .Values.sriovDevicePlugin.useCdi }} - {{- end }} - config: | - { - "resourceList": [ - {{- $length := len .Values.sriovDevicePlugin.resources }} - {{- range $index, $element := .Values.sriovDevicePlugin.resources }} - { - "resourcePrefix": "nvidia.com", - "resourceName": {{ $element.name | quote }}, - "selectors": { - "vendors": {{ $element.vendors | default list | toJson }}, - "devices": {{ $element.devices | default list | toJson }}, - "drivers": {{ $element.drivers | default list | toJson }}, - "pfNames": {{ $element.pfNames | default list | toJson }}, - "pciAddresses": {{ $element.pciAddresses | default list | toJson }}, - "rootDevices": {{ $element.rootDevices | default list | toJson }}, - "linkTypes": {{ $element.linkTypes | default list | toJson }}, - "isRdma": true - } - } {{- if ne $length (add1 $index) }},{{ end }} - {{- end }} - ] - } - {{- if .Values.sriovDevicePlugin.containerResources }} - containerResources: {{ toYaml .Values.sriovDevicePlugin.containerResources | nindent 6 }} - {{- end }} - {{- end }} - {{- if .Values.ibKubernetes.deploy }} - ibKubernetes: - image: {{ .Values.ibKubernetes.image }} - repository: {{ .Values.ibKubernetes.repository }} - version: {{ .Values.ibKubernetes.version }} - imagePullSecrets: {{ include "network-operator.ibKubernetes.imagePullSecrets" . 
}} - {{- if .Values.ibKubernetes.containerResources }} - containerResources: {{ toYaml .Values.ibKubernetes.containerResources | nindent 6 }} - {{- end }} - pKeyGUIDPoolRangeStart: {{ .Values.ibKubernetes.pKeyGUIDPoolRangeStart }} - pKeyGUIDPoolRangeEnd: {{ .Values.ibKubernetes.pKeyGUIDPoolRangeEnd }} - ufmSecret: {{ .Values.ibKubernetes.ufmSecret | quote }} - {{- end }} - {{- if .Values.secondaryNetwork.deploy }} - secondaryNetwork: - {{- if .Values.secondaryNetwork.cniPlugins.deploy }} - cniPlugins: - image: {{ .Values.secondaryNetwork.cniPlugins.image }} - repository: {{ .Values.secondaryNetwork.cniPlugins.repository }} - version: {{ .Values.secondaryNetwork.cniPlugins.version }} - imagePullSecrets: {{ include "network-operator.secondaryNetwork.cniPlugins.imagePullSecrets" . }} - {{- if .Values.secondaryNetwork.cniPlugins.containerResources }} - containerResources: {{ toYaml .Values.secondaryNetwork.cniPlugins.containerResources | nindent 8 }} - {{- end }} - {{- end }} - {{- if .Values.secondaryNetwork.multus.deploy }} - multus: - image: {{ .Values.secondaryNetwork.multus.image }} - repository: {{ .Values.secondaryNetwork.multus.repository }} - version: {{ .Values.secondaryNetwork.multus.version }} - imagePullSecrets: {{ include "network-operator.secondaryNetwork.multus.imagePullSecrets" . 
}} - {{- if .Values.secondaryNetwork.multus.containerResources }} - containerResources: {{ toYaml .Values.secondaryNetwork.multus.containerResources | nindent 8 }} - {{- end }} - {{- if .Values.secondaryNetwork.multus.config | empty | not }} - config: {{ .Values.secondaryNetwork.multus.config | quote }} - {{- end }} - {{- end }} - {{- if .Values.secondaryNetwork.ipoib.deploy }} - ipoib: - image: {{ .Values.secondaryNetwork.ipoib.image }} - repository: {{ .Values.secondaryNetwork.ipoib.repository }} - version: {{ .Values.secondaryNetwork.ipoib.version }} - {{- if .Values.secondaryNetwork.ipoib.containerResources }} - containerResources: {{ toYaml .Values.secondaryNetwork.ipoib.containerResources | nindent 8 }} - {{- end }} - {{- end }} - {{- if .Values.secondaryNetwork.ipamPlugin.deploy }} - ipamPlugin: - image: {{ .Values.secondaryNetwork.ipamPlugin.image }} - repository: {{ .Values.secondaryNetwork.ipamPlugin.repository }} - version: {{ .Values.secondaryNetwork.ipamPlugin.version }} - imagePullSecrets: {{ include "network-operator.secondaryNetwork.ipamPlugin.imagePullSecrets" . }} - {{- if .Values.secondaryNetwork.ipamPlugin.containerResources }} - containerResources: {{ toYaml .Values.secondaryNetwork.ipamPlugin.containerResources | nindent 8 }} - {{- end }} - {{- end }} - {{- end }} - {{- if .Values.nvIpam.deploy }} - nvIpam: - image: {{ .Values.nvIpam.image }} - repository: {{ .Values.nvIpam.repository }} - version: {{ .Values.nvIpam.version }} - imagePullSecrets: {{ include "network-operator.nvIpam.imagePullSecrets" . 
}} - {{- if .Values.nvIpam.containerResources }} - containerResources: {{ toYaml .Values.nvIpam.containerResources | nindent 6 }} - {{- end }} - enableWebhook: {{ .Values.nvIpam.enableWebhook }} - {{- end }} - {{- if .Values.nicFeatureDiscovery.deploy }} - nicFeatureDiscovery: - image: {{ .Values.nicFeatureDiscovery.image }} - repository: {{ .Values.nicFeatureDiscovery.repository }} - version: {{ .Values.nicFeatureDiscovery.version }} - imagePullSecrets: {{ include "network-operator.nicFeatureDiscovery.imagePullSecrets" . }} - {{- if .Values.nicFeatureDiscovery.containerResources }} - containerResources: {{ toYaml .Values.nicFeatureDiscovery.containerResources | nindent 6 }} - {{- end }} - {{- end }} - {{- if .Values.docaTelemetryService.deploy }} - docaTelemetryService: - image: {{ .Values.docaTelemetryService.image }} - repository: {{ .Values.docaTelemetryService.repository }} - version: {{ .Values.docaTelemetryService.version }} - imagePullSecrets: {{ include "network-operator.docaTelemetryService.imagePullSecrets" . 
}} - {{- if .Values.docaTelemetryService.containerResources }} - containerResources: {{ toYaml .Values.docaTelemetryService.containerResources | nindent 6 }} - {{- end }} - {{- end }} -{{ end }} diff --git a/deployment/network-operator/templates/operator.yaml b/deployment/network-operator/templates/operator.yaml index 698c9619f..23997b64e 100644 --- a/deployment/network-operator/templates/operator.yaml +++ b/deployment/network-operator/templates/operator.yaml @@ -84,9 +84,9 @@ spec: - name: CNI_BIN_DIR value: "{{ .Values.operator.cniBinDirectory }}" {{- end }} - {{- if and .Values.ofedDriver.initContainer .Values.ofedDriver.initContainer.enable }} + {{- if and .Values.operator.ofedDriver.initContainer .Values.operator.ofedDriver.initContainer.enable }} - name: OFED_INIT_CONTAINER_IMAGE - {{- with .Values.ofedDriver.initContainer }} + {{- with .Values.operator.ofedDriver.initContainer }} value: "{{ .repository }}/{{ .image }}:{{ .version }}" {{- end }} {{- end }} diff --git a/deployment/network-operator/templates/tests/test-rping.yaml b/deployment/network-operator/templates/tests/test-rping.yaml deleted file mode 100644 index 90de7ad54..000000000 --- a/deployment/network-operator/templates/tests/test-rping.yaml +++ /dev/null @@ -1,49 +0,0 @@ -{{- if and .Values.deployCR .Values.rdmaSharedDevicePlugin .deploy .Values.secondaryNetwork.deploy .Values.secondaryNetwork.multus.deploy .Values.secondaryNetwork.cniPlugins.deploy }} -apiVersion: mellanox.com/v1alpha1 -kind: MacvlanNetwork -metadata: - name: "{{ .Release.Name }}-rping-macvlan-network-test" - annotations: - helm.sh/hook: test - helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded,hook-failed -spec: - networkNamespace: "{{ .Release.Namespace }}" - mode: "bridge" - master: "{{ .Values.test.pf }}" - mtu: 1500 - ipam: | - { - "type": "static", - "addresses": [{ "address": "10.10.0.1/24" }] - } ---- -apiVersion: v1 -kind: Pod -metadata: - name: "{{ .Release.Name }}-rping-test" - annotations: - 
helm.sh/hook: test - helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded - k8s.v1.cni.cncf.io/networks: "{{ .Release.Name }}-rping-macvlan-network-test" -spec: - restartPolicy: Never - containers: - - image: mellanox/rping-test - name: "{{ .Release.Name }}-rping-test" - securityContext: - capabilities: - add: [ "IPC_LOCK" ] - resources: - requests: - rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1' - limits: - rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1' - command: - - sh - - -c - - | - ls -l /dev/infiniband /sys/class/net /sys/class/infiniband - ip addr show net1 - rping -svd & - rping -cvd -a 10.10.0.1 -C 1 -{{- end }} diff --git a/deployment/network-operator/values.yaml b/deployment/network-operator/values.yaml index 3c9c84d68..3e4e2bc3a 100644 --- a/deployment/network-operator/values.yaml +++ b/deployment/network-operator/values.yaml @@ -258,345 +258,21 @@ operator: # MHcl4wOuDwKQa+upc8GftXE2C//4mKANBC6It01gUaTIpo= # ... # -----END EC PRIVATE KEY----- + ofedDriver: + initContainer: + # -- Deploy init container. + enable: true + # -- Init container image repository. + repository: ghcr.io/mellanox + # -- Init container image name. + image: network-operator-init-container + # -- Init container image version. + version: v0.0.2 # -- An optional list of references to secrets to use for pulling any of the # Network Operator images. imagePullSecrets: [] -# NicClusterPolicy CR values: -# -- Deploy ``NicClusterPolicy`` custom resource according to the provided parameters. -deployCR: false -ofedDriver: - # -- Deploy the NVIDIA DOCA Driver driver container. - deploy: false - # -- NVIDIA DOCA Driver image name. - image: doca-driver - # -- NVIDIA DOCA Driver image repository. - repository: nvcr.io/nvstaging/mellanox - # -- NVIDIA DOCA Driver version. - version: 24.10-0.4.6.0-0 - initContainer: - # -- Deploy init container. - enable: true - # -- Init container image repository. 
- repository: ghcr.io/mellanox - # -- Init container image name. - image: network-operator-init-container - # -- Init container image version. - version: v0.0.2 - # imagePullSecrets: [] - # env, if defined will pass environment variables to the OFED container - # env: - # - name: EXAMPLE_ENV_VAR - # value: example_env_var_value - # containerResources: - # - name: "mofed-container" - # requests: - # cpu: "200m" - # memory: "150Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - # -- The grace period before the driver containeris forcibly removed. - terminationGracePeriodSeconds: 300 - # -- Private mirror repository configuration. - # @notationType -- yaml - repoConfig: - name: "" - # Custom ssl key/certificate configuration. - certConfig: - # -- Custom TLS key/certificate configuration configMap name. - name: "" - startupProbe: - # -- NVIDIA DOCA Driver startup probe initial delay. - initialDelaySeconds: 10 - # -- NVIDIA DOCA Driver startup probe interval. - periodSeconds: 20 - livenessProbe: - # -- NVIDIA DOCA Driver liveness probe initial delay. - initialDelaySeconds: 30 - # -- NVIDIA DOCA Driver liveness probe interval. - periodSeconds: 30 - readinessProbe: - # -- NVIDIA DOCA Driver readiness probe initial delay. - initialDelaySeconds: 10 - # -- NVIDIA DOCA Driver readiness probe interval. - periodSeconds: 30 - upgradePolicy: - # -- Global switch for automatic upgrade feature, - # if set to false all other options are ignored. - autoUpgrade: true - # -- Number of nodes that can be upgraded in parallel (default: 1). - # 0 means no limit, all nodes will be upgraded in parallel. - maxParallelUpgrades: 1 - # -- Cordon and drain (if enabled) a node before loading the driver on it. - safeLoad: false - # -- Options for node drain (`kubectl drain`) before the driver reload. - # If auto upgrade is enabled but drain.enable is false, then driver POD will be - # reloaded immediately without removing PODs from the node. 
- # @notationType -- yaml - drain: - # -- Options for node drain (``kubectl drain``) before driver reload, if - # auto upgrade is enabled. - enable: true - # -- Use force drain of pods. - force: true - # -- Pod selector to specify which pods will be drained from the node. - # An empty selector means all pods. - podSelector: "" - # -- It's recommended to set a timeout to avoid infinite drain in case - # non-fatal error keeps happening on retries. - timeoutSeconds: 300 - # -- Delete pods local storage. - deleteEmptyDir: true - waitForCompletion: - # specifies a label selector for the pods to wait for completion - # podSelector: "app=myapp" - # specify the length of time in seconds to wait before giving up for - # workload to finish, zero means infinite - # timeoutSeconds: 300 - # -- Fail Mellanox OFED deployment if precompiled OFED driver container image - # does not exists. - forcePrecompiled: false - -rdmaSharedDevicePlugin: - # -- Deploy RDMA shared device plugin. - deploy: true - # -- RDMA shared device plugin image name. - image: k8s-rdma-shared-dev-plugin - # -- RDMA shared device plugin image repository. - repository: ghcr.io/mellanox - # -- RDMA shared device plugin version. - version: sha-4f3eb2224b8b5f97be3f17441ddee8d41753b7d5 - # -- Enable Container Device Interface (CDI) mode. - # **NOTE**: NVIDIA Network Operator does not configure container runtime to - # enable CDI. - useCdi: false - # imagePullSecrets: [] - # containerResources: - # - name: "rdma-shared-dp" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "150m" - # memory: "100Mi" - # -- The following defines the RDMA resources in the cluster. - # It must be provided by the user when deploying the chart. - # Each entry in the resources element will create a resource with the provided - # and list of devices. - # @notationType -- yaml - resources: - - name: rdma_shared_device_a - vendors: [15b3] - rdmaHcaMax: 63 - -sriovDevicePlugin: - # -- Deploy SR-IOV Network device plugin. 
- deploy: false - # -- SR-IOV Network device plugin image name. - image: sriov-network-device-plugin - # -- SR-IOV Network device plugin image repository. - repository: ghcr.io/k8snetworkplumbingwg - # -- SR-IOV Network device plugin version - version: v3.7.0 - # -- Enable Container Device Interface (CDI) mode. - # **NOTE**: NVIDIA Network Operator does not configure container runtime to - # enable CD. - useCdi: false - # imagePullSecrets: [] - # containerResources: - # - name: "kube-sriovdp" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "150m" - # memory: "100Mi" - # Each entry in the resources elements will be an entry in a ``resourceList`` - # created as part of the ``kube-sriovdp`` container configuration. - # @notationType -- yaml - resources: - - name: hostdev - vendors: [15b3] - -ibKubernetes: - # -- Deploy IB Kubernetes. - deploy: false - # -- IB Kubernetes image name. - image: ib-kubernetes - # -- IB Kubernetes image repository. - repository: ghcr.io/mellanox - # -- IB Kubernetes version. - version: v1.1.0 - # imagePullSecrets: [] - # containerResources: - # - name: "ib-kubernetes" - # requests: - # cpu: "100m" - # memory: "300Mi" - # limits: - # cpu: "100m" - # memory: "300Mi" - # -- Interval of periodic update in seconds. - periodicUpdateSeconds: 5 - # -- Minimal available GUID value to be allocated for the pod. - pKeyGUIDPoolRangeStart: "02:00:00:00:00:00:00:00" - # -- Maximal available GUID value to be allocated for the pod. - pKeyGUIDPoolRangeEnd: "02:FF:FF:FF:FF:FF:FF:FF" - # -- Name of the Secret with the NVIDIA UFM access credentials, deployed in advance. - ufmSecret: '' # specify the secret name here - -nvIpam: - # -- Deploy NVIDIA IPAM Plugin. - deploy: true - # -- NVIDIA IPAM Plugin image name. - image: nvidia-k8s-ipam - # -- NVIDIA IPAM Plugin image repository. - repository: ghcr.io/mellanox - # -- NVIDIA IPAM Plugin image version. - version: v0.2.0 - # -- Enable deployment of the validataion webhook for IPPool CRD. 
- enableWebhook: false - # imagePullSecrets: [] - # containerResources: - # - name: "nv-ipam-node" - # requests: - # cpu: "150m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - # - name: "nv-ipam-controller" - # requests: - # cpu: "150m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - -secondaryNetwork: - # -- Deploy Secondary Network. - deploy: true - cniPlugins: - # -- Deploy CNI Plugins Secondary Network. - deploy: true - # -- CNI Plugins image name. - image: plugins - # -- CNI Plugins image repository. - repository: ghcr.io/k8snetworkplumbingwg - # -- CNI Plugins image version. - version: v1.5.0 - # imagePullSecrets: [] - # containerResources: - # - name: "cni-plugins" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - multus: - # -- Deploy Multus Secondary Network. - deploy: true - # -- Multus image name. - image: multus-cni - # -- Multus image repository. - repository: ghcr.io/k8snetworkplumbingwg - # -- Multus image version. - version: v4.1.0 - # imagePullSecrets: [] - # config: '' - # containerResources: - # - name: "kube-multus" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - ipoib: - # -- Deploy IPoIB CNI. - deploy: false - # -- IPoIB CNI image name. - image: ipoib-cni - # -- IPoIB CNI image repository. - repository: ghcr.io/mellanox - # -- IPoIB CNI image version. - version: v1.2.0 - # imagePullSecrets: [] - # containerResources: - # - name: "ipoib-cni" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - ipamPlugin: - # -- Deploy IPAM CNI Plugin Secondary Network. - deploy: false - # -- IPAM CNI Plugin image name. - image: whereabouts - # -- IPAM CNI Plugin image repository. - repository: ghcr.io/k8snetworkplumbingwg - # -- IPAM CNI Plugin image version. 
- version: v0.7.0 - # imagePullSecrets: [] - # containerResources: - # - name: "whereabouts" - # requests: - # cpu: "100m" - # memory: "100Mi" - # limits: - # cpu: "100m" - # memory: "200Mi" - -nicFeatureDiscovery: - # -- Deploy NVIDIA NIC Feature Discovery. - deploy: false - # -- NVIDIA NIC Feature Discovery image name. - image: nic-feature-discovery - # -- NVIDIA NIC Feature Discovery repository. - repository: ghcr.io/mellanox - # -- NVIDIA NIC Feature Discovery image version. - version: v0.0.1 - # imagePullSecrets: [] - # containerResources: - # - name: "nic-feature-discovery" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "150Mi" - -docaTelemetryService: - # -- Deploy DOCA Telemetry Service. - deploy: false - # -- DOCA Telemetry Service image name. - image: doca_telemetry - # -- DOCA Telemetry Service image repository. - repository: nvcr.io/nvidia/doca - # -- DOCA Telemetry Service image version. - version: 1.16.5-doca2.6.0-host - # imagePullSecrets: [] - # containerResources: - # - name: "doca-telemetry-service" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "150Mi" - -# Can be set to nicclusterpolicy and override other ds node affinity, -# e.g. https://github.com/Mellanox/network-operator/blob/master/manifests/state-multus-cni/0050-multus-ds.yml#L26-L36 -#nodeAffinity: - -# Can be set to nicclusterpolicy to add extra tolerations to ds -#tolerations: - # @ignore test: pf: ens2f0 diff --git a/hack/templates/values/values.template b/hack/templates/values/values.template index 7de0e6cd9..46397d143 100644 --- a/hack/templates/values/values.template +++ b/hack/templates/values/values.template @@ -258,345 +258,21 @@ operator: # MHcl4wOuDwKQa+upc8GftXE2C//4mKANBC6It01gUaTIpo= # ... # -----END EC PRIVATE KEY----- + ofedDriver: + initContainer: + # -- Deploy init container. + enable: true + # -- Init container image repository. 
+ repository: ghcr.io/mellanox + # -- Init container image name. + image: network-operator-init-container + # -- Init container image version. + version: v0.0.2 # -- An optional list of references to secrets to use for pulling any of the # Network Operator images. imagePullSecrets: [] -# NicClusterPolicy CR values: -# -- Deploy ``NicClusterPolicy`` custom resource according to the provided parameters. -deployCR: false -ofedDriver: - # -- Deploy the NVIDIA DOCA Driver driver container. - deploy: false - # -- NVIDIA DOCA Driver image name. - image: {{ .Mofed.Image }} - # -- NVIDIA DOCA Driver image repository. - repository: {{ .Mofed.Repository }} - # -- NVIDIA DOCA Driver version. - version: {{ .Mofed.Version }} - initContainer: - # -- Deploy init container. - enable: true - # -- Init container image repository. - repository: {{ .NetworkOperatorInitContainer.Repository }} - # -- Init container image name. - image: {{ .NetworkOperatorInitContainer.Image }} - # -- Init container image version. - version: {{ .NetworkOperatorInitContainer.Version }} - # imagePullSecrets: [] - # env, if defined will pass environment variables to the OFED container - # env: - # - name: EXAMPLE_ENV_VAR - # value: example_env_var_value - # containerResources: - # - name: "mofed-container" - # requests: - # cpu: "200m" - # memory: "150Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - # -- The grace period before the driver containeris forcibly removed. - terminationGracePeriodSeconds: 300 - # -- Private mirror repository configuration. - # @notationType -- yaml - repoConfig: - name: "" - # Custom ssl key/certificate configuration. - certConfig: - # -- Custom TLS key/certificate configuration configMap name. - name: "" - startupProbe: - # -- NVIDIA DOCA Driver startup probe initial delay. - initialDelaySeconds: 10 - # -- NVIDIA DOCA Driver startup probe interval. - periodSeconds: 20 - livenessProbe: - # -- NVIDIA DOCA Driver liveness probe initial delay. 
- initialDelaySeconds: 30 - # -- NVIDIA DOCA Driver liveness probe interval. - periodSeconds: 30 - readinessProbe: - # -- NVIDIA DOCA Driver readiness probe initial delay. - initialDelaySeconds: 10 - # -- NVIDIA DOCA Driver readiness probe interval. - periodSeconds: 30 - upgradePolicy: - # -- Global switch for automatic upgrade feature, - # if set to false all other options are ignored. - autoUpgrade: true - # -- Number of nodes that can be upgraded in parallel (default: 1). - # 0 means no limit, all nodes will be upgraded in parallel. - maxParallelUpgrades: 1 - # -- Cordon and drain (if enabled) a node before loading the driver on it. - safeLoad: false - # -- Options for node drain (`kubectl drain`) before the driver reload. - # If auto upgrade is enabled but drain.enable is false, then driver POD will be - # reloaded immediately without removing PODs from the node. - # @notationType -- yaml - drain: - # -- Options for node drain (``kubectl drain``) before driver reload, if - # auto upgrade is enabled. - enable: true - # -- Use force drain of pods. - force: true - # -- Pod selector to specify which pods will be drained from the node. - # An empty selector means all pods. - podSelector: "" - # -- It's recommended to set a timeout to avoid infinite drain in case - # non-fatal error keeps happening on retries. - timeoutSeconds: 300 - # -- Delete pods local storage. - deleteEmptyDir: true - waitForCompletion: - # specifies a label selector for the pods to wait for completion - # podSelector: "app=myapp" - # specify the length of time in seconds to wait before giving up for - # workload to finish, zero means infinite - # timeoutSeconds: 300 - # -- Fail Mellanox OFED deployment if precompiled OFED driver container image - # does not exists. - forcePrecompiled: false - -rdmaSharedDevicePlugin: - # -- Deploy RDMA shared device plugin. - deploy: true - # -- RDMA shared device plugin image name. 
- image: {{ .RdmaSharedDevicePlugin.Image }} - # -- RDMA shared device plugin image repository. - repository: {{ .RdmaSharedDevicePlugin.Repository }} - # -- RDMA shared device plugin version. - version: {{ .RdmaSharedDevicePlugin.Version }} - # -- Enable Container Device Interface (CDI) mode. - # **NOTE**: NVIDIA Network Operator does not configure container runtime to - # enable CDI. - useCdi: false - # imagePullSecrets: [] - # containerResources: - # - name: "rdma-shared-dp" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "150m" - # memory: "100Mi" - # -- The following defines the RDMA resources in the cluster. - # It must be provided by the user when deploying the chart. - # Each entry in the resources element will create a resource with the provided - # and list of devices. - # @notationType -- yaml - resources: - - name: rdma_shared_device_a - vendors: [15b3] - rdmaHcaMax: 63 - -sriovDevicePlugin: - # -- Deploy SR-IOV Network device plugin. - deploy: false - # -- SR-IOV Network device plugin image name. - image: {{ .SriovDevicePlugin.Image }} - # -- SR-IOV Network device plugin image repository. - repository: {{ .SriovDevicePlugin.Repository }} - # -- SR-IOV Network device plugin version - version: {{ .SriovDevicePlugin.Version }} - # -- Enable Container Device Interface (CDI) mode. - # **NOTE**: NVIDIA Network Operator does not configure container runtime to - # enable CD. - useCdi: false - # imagePullSecrets: [] - # containerResources: - # - name: "kube-sriovdp" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "150m" - # memory: "100Mi" - # Each entry in the resources elements will be an entry in a ``resourceList`` - # created as part of the ``kube-sriovdp`` container configuration. - # @notationType -- yaml - resources: - - name: hostdev - vendors: [15b3] - -ibKubernetes: - # -- Deploy IB Kubernetes. - deploy: false - # -- IB Kubernetes image name. 
- image: {{ .IbKubernetes.Image }} - # -- IB Kubernetes image repository. - repository: {{ .IbKubernetes.Repository }} - # -- IB Kubernetes version. - version: {{ .IbKubernetes.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "ib-kubernetes" - # requests: - # cpu: "100m" - # memory: "300Mi" - # limits: - # cpu: "100m" - # memory: "300Mi" - # -- Interval of periodic update in seconds. - periodicUpdateSeconds: 5 - # -- Minimal available GUID value to be allocated for the pod. - pKeyGUIDPoolRangeStart: "02:00:00:00:00:00:00:00" - # -- Maximal available GUID value to be allocated for the pod. - pKeyGUIDPoolRangeEnd: "02:FF:FF:FF:FF:FF:FF:FF" - # -- Name of the Secret with the NVIDIA UFM access credentials, deployed in advance. - ufmSecret: '' # specify the secret name here - -nvIpam: - # -- Deploy NVIDIA IPAM Plugin. - deploy: true - # -- NVIDIA IPAM Plugin image name. - image: {{ .NvIPAM.Image }} - # -- NVIDIA IPAM Plugin image repository. - repository: {{ .NvIPAM.Repository }} - # -- NVIDIA IPAM Plugin image version. - version: {{ .NvIPAM.Version }} - # -- Enable deployment of the validataion webhook for IPPool CRD. - enableWebhook: false - # imagePullSecrets: [] - # containerResources: - # - name: "nv-ipam-node" - # requests: - # cpu: "150m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - # - name: "nv-ipam-controller" - # requests: - # cpu: "150m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "300Mi" - -secondaryNetwork: - # -- Deploy Secondary Network. - deploy: true - cniPlugins: - # -- Deploy CNI Plugins Secondary Network. - deploy: true - # -- CNI Plugins image name. - image: {{ .CniPlugins.Image }} - # -- CNI Plugins image repository. - repository: {{ .CniPlugins.Repository }} - # -- CNI Plugins image version. 
- version: {{ .CniPlugins.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "cni-plugins" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - multus: - # -- Deploy Multus Secondary Network. - deploy: true - # -- Multus image name. - image: {{ .Multus.Image }} - # -- Multus image repository. - repository: {{ .Multus.Repository }} - # -- Multus image version. - version: {{ .Multus.Version }} - # imagePullSecrets: [] - # config: '' - # containerResources: - # - name: "kube-multus" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - ipoib: - # -- Deploy IPoIB CNI. - deploy: false - # -- IPoIB CNI image name. - image: {{ .Ipoib.Image }} - # -- IPoIB CNI image repository. - repository: {{ .Ipoib.Repository }} - # -- IPoIB CNI image version. - version: {{ .Ipoib.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "ipoib-cni" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "100m" - # memory: "50Mi" - ipamPlugin: - # -- Deploy IPAM CNI Plugin Secondary Network. - deploy: false - # -- IPAM CNI Plugin image name. - image: {{ .IpamPlugin.Image }} - # -- IPAM CNI Plugin image repository. - repository: {{ .IpamPlugin.Repository }} - # -- IPAM CNI Plugin image version. - version: {{ .IpamPlugin.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "whereabouts" - # requests: - # cpu: "100m" - # memory: "100Mi" - # limits: - # cpu: "100m" - # memory: "200Mi" - -nicFeatureDiscovery: - # -- Deploy NVIDIA NIC Feature Discovery. - deploy: false - # -- NVIDIA NIC Feature Discovery image name. - image: {{ .NicFeatureDiscovery.Image }} - # -- NVIDIA NIC Feature Discovery repository. - repository: {{ .NicFeatureDiscovery.Repository }} - # -- NVIDIA NIC Feature Discovery image version. 
- version: {{ .NicFeatureDiscovery.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "nic-feature-discovery" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "150Mi" - -docaTelemetryService: - # -- Deploy DOCA Telemetry Service. - deploy: false - # -- DOCA Telemetry Service image name. - image: {{ .DOCATelemetryService.Image }} - # -- DOCA Telemetry Service image repository. - repository: {{ .DOCATelemetryService.Repository }} - # -- DOCA Telemetry Service image version. - version: {{ .DOCATelemetryService.Version }} - # imagePullSecrets: [] - # containerResources: - # - name: "doca-telemetry-service" - # requests: - # cpu: "100m" - # memory: "50Mi" - # limits: - # cpu: "300m" - # memory: "150Mi" - -# Can be set to nicclusterpolicy and override other ds node affinity, -# e.g. https://github.com/Mellanox/network-operator/blob/master/manifests/state-multus-cni/0050-multus-ds.yml#L26-L36 -#nodeAffinity: - -# Can be set to nicclusterpolicy to add extra tolerations to ds -#tolerations: - # @ignore test: pf: ens2f0 diff --git a/pkg/state/continuity_check_test.go b/pkg/state/continuity_check_test.go deleted file mode 100644 index 395c32342..000000000 --- a/pkg/state/continuity_check_test.go +++ /dev/null @@ -1,157 +0,0 @@ -/* -2023 NVIDIA CORPORATION & AFFILIATES - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package state - -import ( - "bufio" - "bytes" - "os" - "path/filepath" - "sort" - "strings" - - . 
"github.com/onsi/ginkgo/v2" - . "github.com/onsi/gomega" - "gopkg.in/yaml.v3" - - mellanoxv1alpha1 "github.com/Mellanox/network-operator/api/v1alpha1" - "github.com/Mellanox/network-operator/pkg/config" -) - -func extractContainerNamesFromHelmChart(path string) ([]string, error) { - //nolint:gosec - data, err := os.ReadFile(path) - if err != nil { - return nil, err - } - - var parsedData map[string]interface{} - err = yaml.Unmarshal(uncommentContainerResources(data), &parsedData) - if err != nil { - return nil, err - } - - containerNames := extractContainerNamesFromSubsection(parsedData) - - return containerNames, nil -} - -// uncommentContainerResources iterates through the document and removes '#' comments from containerResources sections -func uncommentContainerResources(fileData []byte) []byte { - var result bytes.Buffer - scanner := bufio.NewScanner(bytes.NewReader(fileData)) - var processNextLines bool - - for scanner.Scan() { - line := scanner.Text() - - if strings.TrimSpace(line) == "# containerResources:" { - // Remove "# " and set flag to process next lines - line = strings.Replace(line, "# ", "", 1) - processNextLines = true - } else if processNextLines && strings.HasPrefix(strings.TrimSpace(line), "# ") { - // For subsequent lines starting with "# ", remove "# " - line = strings.Replace(line, "# ", "", 1) - } else { - processNextLines = false - } - - result.WriteString(line + "\n") - } - - return result.Bytes() -} - -//nolint:gocognit -func extractContainerNamesFromSubsection(data interface{}) []string { - var names []string - - switch v := data.(type) { - case []interface{}: - for _, item := range v { - names = append(names, extractContainerNamesFromSubsection(item)...) 
- } - case map[string]interface{}: - for key, value := range v { - if key == "containerResources" { - if resources, ok := value.([]interface{}); ok { - for _, resource := range resources { - if resMap, ok := resource.(map[string]interface{}); ok { - if name, ok := resMap["name"].(string); ok { - names = append(names, name) - } - } - } - } - } else { - names = append(names, extractContainerNamesFromSubsection(value)...) - } - } - } - - return names -} - -var _ = Describe("Continuity check", func() { - - Context("Resource requirements", func() { - It("Resource requirements from helm chart should cover all deployable containers", func() { - wd, err := os.Getwd() - Expect(err).NotTo(HaveOccurred()) - - chartPath := filepath.Join(wd, "..", "..", "deployment", "network-operator", "values.yaml") - - namesFromChart, err := extractContainerNamesFromHelmChart(chartPath) - Expect(err).NotTo(HaveOccurred()) - - var namesFromManifests []string - - cr := &mellanoxv1alpha1.NicClusterPolicy{} - cr.Name = "nic-cluster-policy" - imageSpec := mellanoxv1alpha1.ImageSpec{Image: "image", Repository: "", Version: "version"} - imageSpecWithConfig := mellanoxv1alpha1.ImageSpecWithConfig{ImageSpec: imageSpec} - cr.Spec.IBKubernetes = &mellanoxv1alpha1.IBKubernetesSpec{ImageSpec: imageSpec} - cr.Spec.OFEDDriver = &mellanoxv1alpha1.OFEDDriverSpec{ImageSpec: imageSpec} - cr.Spec.RdmaSharedDevicePlugin = &mellanoxv1alpha1.DevicePluginSpec{ImageSpecWithConfig: imageSpecWithConfig} - cr.Spec.SriovDevicePlugin = &mellanoxv1alpha1.DevicePluginSpec{ImageSpecWithConfig: imageSpecWithConfig} - cr.Spec.NvIpam = &mellanoxv1alpha1.NVIPAMSpec{ImageSpec: imageSpec} - cr.Spec.NicFeatureDiscovery = &mellanoxv1alpha1.NICFeatureDiscoverySpec{ImageSpec: imageSpec} - cr.Spec.SecondaryNetwork = &mellanoxv1alpha1.SecondaryNetworkSpec{} - cr.Spec.SecondaryNetwork.CniPlugins = &imageSpec - cr.Spec.SecondaryNetwork.IpamPlugin = &imageSpec - cr.Spec.SecondaryNetwork.IPoIB = &imageSpec - cr.Spec.SecondaryNetwork.Multus 
= &mellanoxv1alpha1.MultusSpec{ImageSpecWithConfig: imageSpecWithConfig} - cr.Spec.DOCATelemetryService = &mellanoxv1alpha1.DOCATelemetryServiceSpec{ImageSpec: imageSpec} - - manifestsBaseDir := filepath.Join("..", "..", "manifests") - envConfig = &config.OperatorConfig{State: config.StateConfig{ManifestBaseDir: manifestsBaseDir}} - states, err := newNicClusterPolicyStates(nil) - Expect(err).NotTo(HaveOccurred()) - - for _, state := range states { - names, err := ParseContainerNames(state.(ManifestRenderer), cr, testLogger) - Expect(err).NotTo(HaveOccurred()) - namesFromManifests = append(namesFromManifests, names...) - } - - sort.Strings(namesFromChart) - sort.Strings(namesFromManifests) - Expect(namesFromChart).To(Equal(namesFromManifests)) - - }) - }) -})
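Note on the removed continuity check: the deleted test depended on un-commenting every `# containerResources:` block in `values.yaml` before parsing it, so that the container names listed there could be compared with the names rendered from the manifests. Below is a minimal, standalone sketch of that un-commenting step. It mirrors the logic of the removed helper, while the sample YAML in `main` is made up for illustration and is not taken from the chart.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// uncommentContainerResources strips the leading "# " from a
// "# containerResources:" line and from the commented lines that
// directly follow it. The first line that does not start with "# "
// (after trimming whitespace) ends the block.
func uncommentContainerResources(fileData []byte) []byte {
	var result bytes.Buffer
	scanner := bufio.NewScanner(bytes.NewReader(fileData))
	inBlock := false

	for scanner.Scan() {
		line := scanner.Text()
		trimmed := strings.TrimSpace(line)

		switch {
		case trimmed == "# containerResources:":
			// Start of a commented block: drop the first "# " only.
			line = strings.Replace(line, "# ", "", 1)
			inBlock = true
		case inBlock && strings.HasPrefix(trimmed, "# "):
			// Continuation of the block: drop the first "# " only.
			line = strings.Replace(line, "# ", "", 1)
		default:
			inBlock = false
		}

		result.WriteString(line + "\n")
	}

	return result.Bytes()
}

func main() {
	// Hypothetical values.yaml fragment, for illustration only.
	sample := "multus:\n" +
		"  deploy: true\n" +
		"  # containerResources:\n" +
		"  #   - name: \"kube-multus\"\n" +
		"  #     requests:\n" +
		"  #       cpu: \"100m\"\n" +
		"ipoib:\n" +
		"  deploy: false\n"

	fmt.Print(string(uncommentContainerResources([]byte(sample))))
}
```

Because the block flag stays set for any following line that begins with `# `, an ordinary comment placed immediately after a `containerResources` example would also be un-commented. Anyone reviving a check along these lines may want to keep a blank line or an uncommented key after each example block.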