Tolerations are not propagated to nvipam controller #1172

Open
arpitsardhana opened this issue Nov 19, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@arpitsardhana

What happened:
We need all network-operator pods to be deployed on nodes with taints. We added tolerations in the operator spec as well as in the tolerations section for DaemonSets, but these tolerations are not propagated to the nvipam controller, so its pods remain in the Pending state.

What you expected to happen:
Tolerations should also propagate to the nvipam controller.

How to reproduce it (as minimally and precisely as possible):
Deploy the network operator on a cluster where all nodes are tainted and add the tolerations in the Helm chart. If deployment of the nvipam controller is enabled, its pods will remain in the Pending state.
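
To illustrate why the pods stay Pending, here is a minimal Go sketch of the Kubernetes taint/toleration matching rules. It is a simplified model, not the scheduler's code: the `Taint` and `Toleration` types are local stand-ins for the real API types, and the taint key `example.com/restricted` is a made-up placeholder for whatever taint the cluster actually uses.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes API types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether one toleration matches one taint.
func tolerates(tol Toleration, t Taint) bool {
	// An empty effect on the toleration matches every effect.
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	// An empty key matches every key (meaningful with Exists).
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal": // Equal is the default operator.
		return tol.Value == t.Value
	}
	return false
}

// schedulable reports whether every blocking taint on a node is tolerated.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue // soft preference, does not block scheduling
		}
		matched := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical taint applied to every node in the cluster.
	nodeTaints := []Taint{{Key: "example.com/restricted", Value: "true", Effect: "NoSchedule"}}
	custom := []Toleration{{Key: "example.com/restricted", Operator: "Exists", Effect: "NoSchedule"}}

	fmt.Println("pod with the custom toleration:", schedulable(custom, nodeTaints)) // true
	fmt.Println("pod without it:", schedulable(nil, nodeTaints))                    // false
}
```

When every node carries such a taint, a pod that never receives the matching toleration has no node it can be scheduled on, which matches the Pending state reported here.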

Anything else we need to know?:

Logs:

  • NicClusterPolicy CR spec and state:
  • Output of: kubectl -n nvidia-network-operator get -A:
  • Network Operator version:
  • Logs of Network Operator controller:
  • Logs of the various Pods in nvidia-network-operator namespace:
  • Helm Configuration (if applicable):
  • Kubernetes' nodes information (labels, annotations and status): kubectl get node -o yaml:

Environment:

  • Kubernetes version (use kubectl version): N/A
  • Hardware configuration: N/A
    • Network adapter model and firmware version:
  • OS (e.g: cat /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Others: N/A
@arpitsardhana added the bug (Something isn't working) label on Nov 19, 2024
@ykulazhenkov
Collaborator

Hi @arpitsardhana,

The tolerations field in the NicClusterPolicy affects only the DaemonSets. This behavior is documented in the description of the field:

Tolerations []v1.Toleration `json:"tolerations,omitempty"`

This setting does not change tolerations for controllers deployed by the network-operator, such as nv-ipam-controller and ib-kubernetes.
These controllers have the following tolerations:


Currently, it is not possible to modify them.
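
The behavior described above can be pictured with a small Go sketch. This is only an illustration under stated assumptions, not the operator's actual code: `Workload`, `applyPolicyTolerations`, the DaemonSet name, and the taint key are all hypothetical, and the controllers' built-in tolerations are left empty because they are not listed here.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for v1.Toleration.
type Toleration struct {
	Key, Operator, Effect string
}

// Workload is a hypothetical, minimal view of an object the operator deploys.
type Workload struct {
	Name        string
	Kind        string // "DaemonSet" or "Deployment"
	Tolerations []Toleration
}

// applyPolicyTolerations appends the policy-level tolerations to DaemonSets
// only; Deployment-based controllers keep whatever they already had.
func applyPolicyTolerations(policy []Toleration, workloads []Workload) []Workload {
	out := make([]Workload, 0, len(workloads))
	for _, w := range workloads {
		if w.Kind == "DaemonSet" {
			// Copy first so the caller's slice is never modified.
			merged := append([]Toleration{}, w.Tolerations...)
			w.Tolerations = append(merged, policy...)
		}
		out = append(out, w)
	}
	return out
}

func main() {
	// Hypothetical toleration taken from the policy's tolerations field.
	policy := []Toleration{{Key: "example.com/restricted", Operator: "Exists", Effect: "NoSchedule"}}
	workloads := []Workload{
		{Name: "example-daemonset", Kind: "DaemonSet"},
		{Name: "nv-ipam-controller", Kind: "Deployment"},
	}
	for _, w := range applyPolicyTolerations(policy, workloads) {
		fmt.Printf("%s (%s): %d toleration(s)\n", w.Name, w.Kind, len(w.Tolerations))
	}
}
```

In this model the DaemonSet ends up with the policy toleration while the controller Deployment does not, which is the gap this issue asks to close.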

I agree that adding support for custom tolerations and nodeAffinity for controllers could be beneficial. This feature will likely be implemented as a separate setting.

The issue has been converted to an enhancement, as the controller currently functions as expected.

@ykulazhenkov added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Nov 20, 2024
@arpitsardhana
Author

@ykulazhenkov Thanks! Due to security constraints, there are situations in which all nodes in a cluster are required to have taints.
Having one Deployment that does not support adding tolerations, as the odd one out among all the pods run by the network operator, felt like a bug to me: a miss rather than a design decision.

Hopefully we can fix this soon so that we can use the network operator without any workarounds.
