What happened:
We require all network-operator pods to run on nodes that have taints. We added the toleration in the operator spec as well as in the Tolerations section for the DaemonSets, but these tolerations are not propagated to the nv-ipam controller, and its pod remains in the Pending state.
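To confirm the failure mode, the pending pod's scheduling events can be inspected. A minimal sketch; the pod name is a placeholder and the grep filter is an assumption about the pod's naming:

```sh
# Find the pending controller pod (exact names/labels vary by deployment):
kubectl -n nvidia-network-operator get pods | grep -i ipam

# Inspect its scheduling events; an untolerated taint typically surfaces as
# "0/N nodes are available: N node(s) had untolerated taint ...".
kubectl -n nvidia-network-operator describe pod <nv-ipam-controller-pod>
```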
What you expected to happen:
Tolerations should propagate to the nv-ipam controller as well.
How to reproduce it (as minimally and precisely as possible):
Deploy the network operator on a cluster where all nodes are tainted, and add a toleration in the Helm chart (see the sketch below). If the nv-ipam controller deployment is enabled, its pods will remain in the Pending state.
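For illustration, a minimal sketch of the kind of toleration we set; the key path and the taint key are assumptions, not confirmed against the chart's values schema:

```yaml
# Hypothetical values.yaml excerpt -- the key path and taint key are
# illustrative only, not confirmed against the network-operator chart.
operator:
  tolerations:
    - key: "example.com/dedicated"   # placeholder taint key
      operator: "Exists"
      effect: "NoSchedule"
```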
Anything else we need to know?:
Logs:
NicClusterPolicy CR spec and state:
Output of: kubectl -n nvidia-network-operator get -A:
Network Operator version:
Logs of Network Operator controller:
Logs of the various Pods in nvidia-network-operator namespace:
Helm Configuration (if applicable):
Kubernetes nodes' information (labels, annotations, and status): kubectl get node -o yaml:
Environment:
Kubernetes version (use kubectl version): N/A
Hardware configuration: N/A
Network adapter model and firmware version:
OS (e.g: cat /etc/os-release): N/A
Kernel (e.g. uname -a): N/A
Others: N/A
This setting does not change tolerations for controllers deployed by the network-operator, such as nv-ipam-controller and ib-kubernetes.
These controllers have a fixed, built-in set of tolerations:
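A minimal sketch of what such built-in tolerations typically look like; the exact keys below are an assumption, not the controllers' confirmed list:

```yaml
# Assumed example -- common control-plane defaults, not confirmed from
# the operator's rendered manifests.
tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
```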
I agree that adding support for custom tolerations and nodeAffinity for controllers could be beneficial. This feature will likely be implemented as a separate setting.
The issue has been converted to an enhancement, as the controller currently functions as expected.
@ykulazhenkov Thanks! Due to security constraints, there are situations where all nodes in a cluster are required to have taints.
Having one deployment be the odd one out among all the pods run by the network operator, with no way to add tolerations, felt like a bug to me: an omission rather than a deliberate design choice.
Hopefully this can be fixed soon so that we can use the network operator without any workarounds.
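Until then, one possible interim workaround is to patch the deployment directly. A sketch only: the taint key is a placeholder, it assumes the tolerations list already exists on the deployment, and the operator may reconcile the change away:

```sh
# Append a toleration to the controller deployment (JSON patch "add" to the
# end of the list; fails if spec.template.spec.tolerations does not exist).
# The operator may revert this on its next reconciliation.
kubectl -n nvidia-network-operator patch deployment nv-ipam-controller \
  --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"example.com/dedicated","operator":"Exists","effect":"NoSchedule"}}]'
```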