scheduler: set priorityClass #974

Open
ffromani opened this issue Aug 5, 2024 · 3 comments
ffromani commented Aug 5, 2024

We currently don't set the priority for the secondary scheduler pod:

 oc get pod -n numaresources secondary-scheduler-8955769bc-c5zxh -o json | jq .spec.priority
0

but we should. We still need to evaluate which one; the correct priority class is probably system-node-critical.
This change should be backported to all the active branches.

shajmakh commented Aug 7, 2024

/assign @shajmakh

ffromani commented Aug 7, 2024

We currently don't set the priority for the secondary scheduler pod:

 oc get pod -n numaresources secondary-scheduler-8955769bc-c5zxh -o json | jq .spec.priority
0

but we should. We still need to evaluate which one; the correct priority class is probably system-node-critical. This change should be backported to all the active branches.

random thoughts:

  • an e2e test is not necessary; a controller test and/or even a unit test which ensures the priorityClass{Name} field is set should be sufficient
  • we need some research to figure out whether we want to make it system-node-critical or system-cluster-critical. We should probably set the same class as the (OCP?) scheduler pods
  • we may want to bump the priority if replica==1 (the current default), but this is debatable.
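
The check suggested in the first point can be sketched as follows. This is a minimal illustration in Python, not the operator's own Go test code: the manifests are hand-written stand-ins for the rendered secondary scheduler Deployment, and `priority_class_name` and `EXPECTED_PRIORITY_CLASS` are hypothetical names. Only the field path `spec.template.spec.priorityClassName` comes from the Kubernetes Deployment schema.

```python
from typing import Any, Dict, Optional

# Assumption: the class this thread leans towards; the choice was still open here.
EXPECTED_PRIORITY_CLASS = "system-node-critical"


def priority_class_name(deployment: Dict[str, Any]) -> Optional[str]:
    """Return spec.template.spec.priorityClassName, or None when it is unset."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    # An empty string is treated the same as a missing field.
    return pod_spec.get("priorityClassName") or None


# Hypothetical, heavily trimmed manifests standing in for the rendered Deployment.
unset_manifest = {
    "kind": "Deployment",
    "spec": {"replicas": 1, "template": {"spec": {"containers": []}}},
}
fixed_manifest = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {
            "spec": {"priorityClassName": EXPECTED_PRIORITY_CLASS, "containers": []}
        },
    },
}

print(priority_class_name(unset_manifest))
print(priority_class_name(fixed_manifest))
```

A unit or controller test would make the same comparison against the object the operator actually renders, which is why a full e2e run is probably not needed for this.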

shajmakh added a commit to shajmakh/numaresources-operator that referenced this issue Aug 12, 2024
So far the scheduler priority has been left at the default, which is 0.
This is risky, especially when pods must be preempted to fit more
important pods.

The NRS is important enough to deserve the most critical priority class,
system-node-critical, which is the same priority class used by the
kube-scheduler.

addresses openshift-kni#974

Signed-off-by: Shereen Haj <[email protected]>
shajmakh added a commit to shajmakh/numaresources-operator that referenced this issue Aug 12, 2024
So far the scheduler priority has been left at the default, which is 0.
This is risky, especially when pods must be preempted to fit more
important pods.

The NRS is important enough to deserve the most critical priority class,
system-node-critical, which is the same priority class used by the
kube-scheduler. We need this priority set always, regardless of how many
replicas are set for the scheduler, and especially if we look to
optimize the HA of the scheduler.

We choose system-node-critical over system-cluster-critical because we don't want to allow preemption of the secondary scheduler (SS) by higher-priority pods. If it were set to system-cluster-critical and an event required pod eviction in order to schedule system-node-critical workloads, the SS would be at risk of being evicted. Although this would be very rare and the evicted pod would be rescheduled, there is no convincing reason not to make it node-critical.

addresses openshift-kni#974

Signed-off-by: Shereen Haj <[email protected]>
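
The ordering argument in the commit message can be illustrated with the values of the two built-in Kubernetes priority classes. This is a simplified sketch: `may_be_preempted_by` is a hypothetical helper that models only the "strictly lower priority" rule, while the real scheduler also weighs factors such as PodDisruptionBudgets and preemption policy. The default of 0 assumes no global default priority class is defined in the cluster.

```python
# Values of the two built-in Kubernetes priority classes.
BUILTIN_PRIORITY = {
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}
# Priority of a pod with no priority class (assuming no global default class).
DEFAULT_PRIORITY = 0


def may_be_preempted_by(victim: int, pending: int) -> bool:
    """Simplified rule: only strictly lower-priority pods are preemption candidates."""
    return victim < pending


node_critical = BUILTIN_PRIORITY["system-node-critical"]
cluster_critical = BUILTIN_PRIORITY["system-cluster-critical"]

# Before the change: priority 0 loses against either system class.
print(may_be_preempted_by(DEFAULT_PRIORITY, cluster_critical))
# With system-cluster-critical: a pending node-critical pod still ranks higher.
print(may_be_preempted_by(cluster_critical, node_critical))
# With system-node-critical: no built-in class ranks higher.
print(may_be_preempted_by(node_critical, node_critical))
```

The gap between the two classes is only 1000, but under this rule any strictly higher value is enough to make the lower-priority pod a candidate.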
@shajmakh

addressed in 4.17 (#979) and 4.16 (#980) so far
