Configurable tolerations #194
@jennydaman So you want pman to set tolerations on pods instead of adding them in pod-definition.yml? I hope I am getting this correct.
To rephrase my question in the p.s.: pman is supposed to be a common interface over Kubernetes, Swarm, SLURM, ..., so scheduler-specific configuration is antithetical to its intention.
@Prakashh21 What/where is pod-definition.yml?
I used pod-definition.yml just as a reference name. Yes, what I meant was scheduler-specific configuration for setting tolerations; here we want pman to set tolerations on pods, right?
I still don't understand what you mean by pod-definition.yml.
Closely related issue: being able to configure pman with a set of affinity labels. Using tolerations and affinities, we can deploy multiple pman instances which correspond to different configurations: e.g., one pman instance will prefer low-CPU, high-memory nodes; another will prefer high-CPU, high-memory nodes; another GPU* nodes; and so on.

*GPU-intensive does not necessarily mean graphically intensive, e.g. machine learning.
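A minimal sketch of what one instance's scheduling preference could look like as a pod-spec fragment; the label key `workload-class` and its values are invented here for illustration and are not existing conventions:

```yaml
# Hypothetical scheduling preference for one pman instance: prefer
# high-memory nodes. The "workload-class" label key and its values are
# made up for illustration.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: workload-class
          operator: In
          values: ["high-memory"]
```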
@jennydaman pod-definition.yml is the configuration/manifest/specification of the pods which are to be scheduled on the cluster nodes. Tolerations are set on pods by defining them in their manifests; that is what I was saying, and pod-definition.yml was an example name for such a manifest. I hope I was clear.
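For concreteness, a minimal sketch of that approach, with placeholder names throughout: the toleration lives in the pod's own manifest rather than being injected by pman.

```yaml
# A pod-definition.yml in the sense used above: the toleration is written
# directly into the pod's manifest. All names and the taint key are
# placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-job
spec:
  containers:
  - name: worker
    image: example/worker:latest
  tolerations:
  - key: "gpu"
    operator: "Exists"
    effect: "PreferNoSchedule"
```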
So what you're saying is, we'll have multiple instances of pman, each of which would prefer to schedule pods on a different set of nodes (catering to different types of workloads) through set tolerations and affinities. This sounds cool, but tell me this: if we have multiple instances of pman, how would pfcon know which pman instance it should send the job description to? Will this be defined in the job description itself, or...?
GPU nodes are tainted with `PreferNoSchedule`. They can still be scheduled to if a pod specs its containers with `resources.limits['nvidia.com/gpu'] = 1`, but it would be better if `pman` can be configured to conditionally set tolerations on jobs.

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
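For reference, a sketch of this situation as a manifest; the taint key `gpu` is an assumption, while the `PreferNoSchedule` effect and the `nvidia.com/gpu` limit come from the issue text:

```yaml
# GPU node tainted with PreferNoSchedule, applied e.g. with:
#   kubectl taint nodes <gpu-node> gpu=true:PreferNoSchedule
# (the "gpu=true" key/value is an assumption for illustration)
#
# A job pod that requests a GPU, carrying the toleration that pman
# would conditionally inject:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-plugin-job
spec:
  containers:
  - name: plugin
    image: example/plugin:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "gpu"
    operator: "Exists"
    effect: "PreferNoSchedule"
```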
p.s. Per-compute-env config is starting to get unwieldy, such as with Swarm and Kubernetes; how can we manage this more concisely?
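One conceivable direction, sketched below with an entirely hypothetical config format (pman does not currently read such a file, and every key name here is invented): gather the per-compute-env scheduling settings into one declarative place instead of scattering scheduler-specific options.

```yaml
# Hypothetical pman configuration file; all keys are invented for
# illustration and are not existing pman options.
compute_environments:
  k8s-gpu:
    scheduler: kubernetes
    tolerations:
    - key: "gpu"
      operator: "Exists"
      effect: "PreferNoSchedule"
  swarm-default:
    scheduler: swarm
```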