Cluster trying to schedule all custom pods on master nodes before required cluster components are deployed #953
-
For some unknown reason, sometimes during or after an OKD upgrade (it happened again with version 4.8.0-0.okd-2021-10-24-061736), the cluster tries to schedule all custom applications/pods on the first master node first. When this happens, the cluster can no longer deploy cluster components to this master node, and everything is stuck until I manually restart some of the custom application pods. The custom pods are then redeployed to a worker node, and OKD is able to deploy the required cluster components to the master node successfully. Why does OKD prefer to deploy all custom applications/pods on the master nodes, so that the master node no longer has enough CPU/memory resources for critical components like apiserver, etcd, kube-apiserver, ...? Is there a way to tell OKD to deploy all cluster components on a node first and only then the custom applications/pods, and not vice versa? If this is not possible, maybe we should remove the worker role from the master nodes as explained at https://access.redhat.com/solutions/4564851
Replies: 2 comments 4 replies
-
Please attach a must-gather archive (or upload it to a public file-sharing service).
-
Just for documentation: we decided to add the following node affinity to our custom Deployments/StatefulSets:
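A minimal sketch of such a rule, assuming the standard Kubernetes `nodeAffinity` API with a `DoesNotExist` match on the master role label named in this thread (the exact manifest used may differ):

```yaml
# Fragment of a Deployment or StatefulSet manifest (pod template section).
# Sketch only: the surrounding fields of the real manifests are not shown here.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          # "required" makes this a hard rule: the scheduler will not place
          # the pod on a node that fails the match.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Match only nodes that do NOT carry the master role label.
                  - key: node-role.kubernetes.io/master
                    operator: DoesNotExist
```

Because the rule is only evaluated at scheduling time (`IgnoredDuringExecution`), pods already running on a master node stay there until they are restarted or rescheduled.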
Now all custom Deployments/StatefulSets will never be scheduled on nodes with the label key "node-role.kubernetes.io/master". OKD/OpenShift components can still be deployed on master nodes. This is not as restrictive as "oc patch schedulers.config.openshift.io/cluster --type merge -p '{"spec":{"mastersSchedulable":false}}'".