scheduler: set priorityClass #974

ffromani · 2024-08-05T08:05:28Z

We currently don't set the priority for the secondary scheduler pod:

 oc get pod -n numaresources secondary-scheduler-8955769bc-c5zxh -o json | jq .spec.priority
0

but we should. We need to evaluate the options; the correct priority is probably system-node-critical.
This change should be backported to all the active branches.


shajmakh · 2024-08-07T06:25:13Z

/assign @shajmakh

ffromani · 2024-08-07T06:43:29Z

We currently don't set the priority for the secondary scheduler pod:
 oc get pod -n numaresources secondary-scheduler-8955769bc-c5zxh -o json | jq .spec.priority
0
but we should. We need to evaluate the options; the correct priority is probably system-node-critical. This change should be backported to all the active branches.

random thoughts:

An e2e test is not necessary; a controller test and/or even a unit test which ensures the priorityClass{Name} field is set should be sufficient.
We need some research to figure out if we want to make it system-node-critical or system-cluster-critical. We should probably set the same as (OCP?) scheduler pods.
We may want to bump the priority if replica==1 (the current default), but this is debatable.
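The unit-level check suggested above could look roughly like the following. This is only a sketch: `PodSpec` and `Deployment` are local stand-ins for the real Kubernetes API types, and `SetSchedulerPriority` and `CheckSchedulerPriority` are hypothetical helper names, not functions from the operator codebase. The class name used is the candidate mentioned in this thread, not a settled choice.

```go
package main

import (
	"errors"
	"fmt"
)

// PodSpec and Deployment are minimal stand-ins (assumptions) for the
// Kubernetes API types; only the field relevant to this issue is modeled.
type PodSpec struct {
	PriorityClassName string
}

type Deployment struct {
	Name     string
	Template PodSpec
}

// SchedulerPriorityClassName is the candidate discussed in this issue.
const SchedulerPriorityClassName = "system-node-critical"

// SetSchedulerPriority is a hypothetical helper that sets the priority
// class on the scheduler pod template.
func SetSchedulerPriority(dp *Deployment) {
	dp.Template.PriorityClassName = SchedulerPriorityClassName
}

// CheckSchedulerPriority is the kind of assertion a unit or controller
// test would make: the field must be set, and set to the expected class.
func CheckSchedulerPriority(dp Deployment) error {
	if dp.Template.PriorityClassName == "" {
		return errors.New("priorityClassName is not set")
	}
	if dp.Template.PriorityClassName != SchedulerPriorityClassName {
		return fmt.Errorf("unexpected priorityClassName %q", dp.Template.PriorityClassName)
	}
	return nil
}

func main() {
	dp := Deployment{Name: "secondary-scheduler"}
	fmt.Println("before:", CheckSchedulerPriority(dp))
	SetSchedulerPriority(&dp)
	fmt.Println("after:", CheckSchedulerPriority(dp))
}
```

In the real controller the same assertion would run against the rendered scheduler deployment object rather than these stand-in structs.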

So far the scheduler priority has been left at the default, which is 0. This is risky, especially when preemption of pods is needed to fit more important pods. The NRS is important enough to deserve the most critical priority class, system-node-critical, which is the same priority class as the kube-scheduler.

addresses openshift-kni#974
Signed-off-by: Shereen Haj <[email protected]>

So far the scheduler priority has been left at the default, which is 0. This is risky, especially when preemption of pods is needed to fit more important pods. The NRS is important enough to deserve the most critical priority class, system-node-critical, which is the same priority class as the kube-scheduler. We need this priority set always, regardless of how many replicas are set for the scheduler, and especially if we look to optimize the HA of the scheduler.

We choose system-node-critical over system-cluster-critical because we don't want to allow SS preemption by higher-priority pods. If it were set to system-cluster-critical and an event were triggered that requires pod eviction in order to schedule system-node-critical workloads, the SS would be at risk of being evicted. Although this would be very rare and the evicted pod would be rescheduled, there is no convincing reason not to make it node-critical.

addresses openshift-kni#974
Signed-off-by: Shereen Haj <[email protected]>
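The preemption argument in the commit message can be illustrated with a small sketch. The numeric values are those of the two built-in Kubernetes priority classes. `CanPreempt` is a hypothetical, deliberately simplified model (a pending pod can only evict strictly lower-priority pods); it is not the scheduler's actual preemption logic, which also takes other factors into account.

```go
package main

import "fmt"

// Values of the two built-in Kubernetes priority classes, plus the
// priority of 0 reported for the secondary scheduler pod in this issue.
const (
	DefaultPriority               int32 = 0
	SystemClusterCriticalPriority int32 = 2000000000
	SystemNodeCriticalPriority    int32 = 2000001000
)

// CanPreempt is a simplified model (an assumption for illustration):
// a pending pod may only evict pods whose priority is strictly lower.
func CanPreempt(pending, victim int32) bool {
	return pending > victim
}

func main() {
	// Candidate priorities for the secondary scheduler (SS) pod.
	candidates := []struct {
		name string
		prio int32
	}{
		{"default (0)", DefaultPriority},
		{"system-cluster-critical", SystemClusterCriticalPriority},
		{"system-node-critical", SystemNodeCriticalPriority},
	}
	for _, c := range candidates {
		// Could a pending system-node-critical pod evict the SS?
		evictable := CanPreempt(SystemNodeCriticalPriority, c.prio)
		fmt.Printf("SS at %-24s evictable by a node-critical pod: %v\n", c.name, evictable)
	}
}
```

Under this model only the system-node-critical setting leaves the SS outside the set of possible victims of a pending node-critical pod, which matches the reasoning given above.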

shajmakh · 2024-08-13T07:42:59Z

addressed in 4.17 (#979) and 4.16 (#980) so far

openshift-ci bot assigned shajmakh Aug 7, 2024

shajmakh mentioned this issue Aug 12, 2024

sched: controller: set scheduler priority #979

Merged
