Ability to set node affinity and/or tolerations on RKE-deployed addon manifests #1529

Open
Oats87 opened this issue Aug 2, 2019 · 14 comments

@Oats87
Contributor

Oats87 commented Aug 2, 2019

RKE does not currently allow you to set tolerations or affinity rules on the deployment/daemonset manifests that it deploys, for components like CoreDNS, kube-dns, nginx-ingress, etc.

This feature is critical for architecting robust, highly available clusters that minimize noisy-neighbor problems.

Related issues to this are:
#1066
#1365

@ekristen

Big +1 to get this added ASAP. Thanks!

@olegTarassov

+1 for us as well. We have master nodes that act as etcd + controlplane nodes, but we run our critical components like CoreDNS and ingress on those 2 nodes. This makes load balancing and troubleshooting easier.

Given this limitation, we spin up the cluster using RKE, then modify the DaemonSet config to add tolerations for the etcd and controlplane taints; with the node selector set, the pods run on the master nodes. The downside is that we never run RKE again after the deployment, so that it does not overwrite the DaemonSet configs.

Hope this makes it in soon.

@deniseschannon

With the ability to add taints to nodes added in RKE v0.3.0 (#157), all system add-ons will now be set with a wildcard toleration for the worker nodes. The taints will be used to request that these add-ons are not scheduled onto these nodes.

With node selectors being added (rancher/rancher#22447) for the remaining RKE add-ons, you can use them to determine which nodes these system add-ons should be scheduled to.

With the combination of taints, tolerations, and node selectors, you will be able to plan out which nodes your system add-ons should be deployed to.
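The combination described above can be sketched as an RKE `cluster.yml` fragment. This is a minimal sketch that assumes the per-node `labels` and `taints` fields and the `node_selector` option on the `dns` and `ingress` add-ons as documented for RKE v0.3.0 and later; the addresses, the `app: system-addons` label, and the `dedicated=system-addons` taint are hypothetical placeholders, so check the field names against the RKE version you run.

```yaml
# Hypothetical RKE cluster.yml fragment; addresses, label, and taint are placeholders.
nodes:
  - address: 10.0.0.10
    user: ubuntu
    role: [controlplane, etcd]
  - address: 10.0.0.20
    user: ubuntu
    role: [worker]
    # Label matched by the add-on node selectors below.
    labels:
      app: system-addons
    # Taint that keeps ordinary workloads off this node; the system add-ons
    # are described above as carrying a wildcard toleration.
    taints:
      - key: dedicated
        value: system-addons
        effect: NoSchedule
  - address: 10.0.0.30
    user: ubuntu
    role: [worker]

dns:
  provider: coredns
  # Schedule CoreDNS only onto nodes carrying the label above.
  node_selector:
    app: system-addons

ingress:
  provider: nginx
  # Same placement for the ingress controller.
  node_selector:
    app: system-addons
```

Because a pod's `nodeSelector` and its required node affinity must both be satisfied, this only narrows placement among the nodes the manifests already allow; with the worker-only affinity discussed in this thread, the labeled node must still have the `worker` role.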

I am closing this issue in favor of the other two issues listed above.

@Oats87
Contributor Author

Oats87 commented Aug 27, 2019

@deniseschannon what issue did you keep open for this?

@Oats87 Oats87 reopened this Aug 28, 2019
@Oats87
Contributor Author

Oats87 commented Aug 28, 2019

I don't believe that rancher/rancher#22447 or #157 solve this issue, so I am reopening it.

@alena1108 alena1108 added this to the v0.4.x - Backlog milestone Sep 4, 2019
@alena1108 alena1108 modified the milestones: v1.1.x, v0.3.x - Backlog Oct 10, 2019
@deniseschannon deniseschannon removed this from the v1.1 - Rancher v2.4 milestone Feb 10, 2020
@olegTarassov

An update on my latest findings.
I investigated the CoreDNS and Ingress-Nginx deployments and found that the tolerations were fine:

e.g. the CoreDNS Deployment:

```yaml
tolerations:
- key: CriticalAddonsOnly
  operator: Exists
- effect: NoExecute
  operator: Exists
- effect: NoSchedule
  operator: Exists
```

...

```yaml
nodeSelector:
  beta.kubernetes.io/os: linux
  dns_nodes: coredns
```

The problem is in the affinity declaration:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
```

Changing the key to `node-role.kubernetes.io/controlplane` resolved the issue.
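As a sketch of a variant of the same edit, the affinity below keeps the worker requirement but adds a second term for controlplane nodes. Kubernetes ORs multiple `nodeSelectorTerms` (expressions inside one term are ANDed), so the pod may run on either role. This assumes RKE's `node-role.kubernetes.io/*` node labels and that the existing wildcard tolerations shown above cover the controlplane taint.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # Terms are ORed: a node matching either term is eligible.
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/controlplane
            operator: Exists
```

As noted earlier in this thread, RKE rewrites its add-on manifests when it runs again (e.g. on the next `rke up`), so a manual edit like this one does not persist.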

@dankeder

I wonder why the tolerations include NoExecute and NoSchedule entries:

```yaml
tolerations:
- effect: NoExecute
  operator: Exists
- effect: NoSchedule
  operator: Exists
```

If I understand it correctly, this means that CoreDNS may be scheduled on cordoned nodes, which may become unavailable, e.g. due to an OS upgrade or other system maintenance. Or am I missing something?
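One way to address this, sketched under assumptions, is to replace the empty-key `Exists` tolerations with entries that match only the taints you intend to tolerate. A toleration with no `key` and `operator: Exists` matches every taint with that effect, including the `node.kubernetes.io/unschedulable:NoSchedule` taint Kubernetes places on cordoned nodes. The `node-role.kubernetes.io/controlplane=true:NoSchedule` and `node-role.kubernetes.io/etcd=true:NoExecute` taints below are assumed to be RKE's defaults for dedicated controlplane and etcd nodes; confirm them with `kubectl describe node`.

```yaml
tolerations:
# Kept from the original manifest.
- key: CriticalAddonsOnly
  operator: Exists
# Assumed RKE taint on dedicated controlplane nodes.
- key: node-role.kubernetes.io/controlplane
  operator: Equal
  value: "true"
  effect: NoSchedule
# Assumed RKE taint on dedicated etcd nodes.
- key: node-role.kubernetes.io/etcd
  operator: Equal
  value: "true"
  effect: NoExecute
# No empty-key entries, so the cordon taint
# (node.kubernetes.io/unschedulable) is not tolerated.
```

The trade-off is that without the wildcard `NoExecute` entry, the pods are also evicted from nodes that become not-ready or unreachable once the usual grace period expires, which is generally the desired behavior for a Deployment with several replicas.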

@superseb
Contributor

@dankeder

That's actually great, thank you for pointing me in the right direction.

@tboussar

With the Terraform provider rancher/rancher2 1.17, it is now possible to set tolerations and a node selector.
We would like to schedule these add-ons on our controlplane nodes, but the nodeAffinity value is not configurable.
We'd love to see this land in a future version of this provider.

@aslafy-z
Contributor

Any updates on this topic please?

@H3rman8

H3rman8 commented Nov 8, 2023

We would really like to be able to schedule some add-ons, like Nginx-Ingress, on controlplane nodes.
Nowadays, 4 GB of memory for a "ControlPlane & etcd" node is insufficient for production k8s clusters, and the next option cloud providers offer is 8 GB of memory, which leaves enough resources to also run components like Nginx.

The nodeAffinity, which only allows the worker role, is blocking this. And as stated before, the nodeSelector feature only works on workers and is therefore no solution.

So why keep wasting resources? Please put this topic on the roadmap ASAP.

@serverbaboon

serverbaboon commented Dec 21, 2023

The ability to override the affinity would be really useful for us: we scale down worker nodes at night, and allowing DNS to run on a control plane node would speed up cluster startup in the morning, as CoreDNS is critical to startup.

It seems silly to deploy a cluster and then manually edit the CoreDNS Deployment afterwards to remove this affinity.

Of course, then make it available in the Terraform provider as well.

@rustyshackleford96

Any updates on this please?
