Test queues #1172
Comments
@matt-chan instead of excluding nodes from the partition, I'm thinking of adding a parameter that defines how many cores/VMs should always be on for each queue/partition.
Hi Xavier, yes, I think that behavior would be best if we could achieve it, but I'm not certain it's possible. I originally tried to make a PR before making this feature request, but I couldn't figure out how to do it. I'm not sure CycleCloud and Slurm have that functionality. Your team is definitely better at this stuff than I am. If you can figure it out, it would be a great feature! Just to make sure we're on the same page: it is the number of idle VMs we want to keep in each queue, right? So if there are 5 jobs and the idle setting is 2 VMs, there should be 7 VMs running in total?
@matt-chan the way it works is that it will always keep x nodes running. If they are filled by jobs, then new nodes will be added up to the quota defined for that queue/partition.
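To make the difference between the two readings concrete, here is a minimal sketch. It assumes one job per node, and the function and parameter names are made up for illustration; neither function reflects actual CycleCloud or Slurm code.

```python
def nodes_with_floor(busy_jobs: int, min_nodes: int, quota: int) -> int:
    """Floor semantics: at least min_nodes are always on; demand beyond
    that adds nodes, capped by the queue quota."""
    return min(quota, max(min_nodes, busy_jobs))


def nodes_with_idle_buffer(busy_jobs: int, idle_nodes: int, quota: int) -> int:
    """Idle-buffer semantics: keep idle_nodes free on top of the busy
    nodes, capped by the queue quota."""
    return min(quota, busy_jobs + idle_nodes)


if __name__ == "__main__":
    # 5 jobs, setting of 2, quota of 10 (hypothetical numbers)
    print(nodes_with_floor(5, 2, 10))        # 5
    print(nodes_with_idle_buffer(5, 2, 10))  # 7
```

Under the floor reading, 5 jobs with a setting of 2 give 5 running nodes, not 7; the 7 only appears under the idle-buffer reading.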
@xpillons I have now implemented a simple solution for this.
The admin can set a
Let me know if you are interested in a PR for this.
P.S. This creates one extra job every 5 minutes per queue. There may be more "official" ways of doing this via the Slurm config (https://slurm.schedmd.com/power_save.html#config), but it already does the job.
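As a rough illustration of the "one extra job per queue" idea, the sketch below builds a short `sbatch` submission for each partition. The actual warmup script is not shown in this thread, so the job name, the `sleep` payload, and the helper names are all assumptions; only standard `sbatch` options are used.

```python
import shlex
from typing import List


def warmup_command(partition: str, seconds: int = 1) -> List[str]:
    """Build an sbatch call that submits a trivial job to one partition.

    Hypothetical helper: the job name and the sleep payload are
    placeholders, not taken from the real script.
    """
    return [
        "sbatch",
        f"--partition={partition}",
        "--job-name=queue-warmup",
        "--output=/dev/null",
        f"--wrap=sleep {seconds}",
    ]


def warmup_commands(partitions: List[str]) -> List[List[str]]:
    """One trivial job per partition, as described in the comment above."""
    return [warmup_command(p) for p in partitions]


if __name__ == "__main__":
    # Dry run: print the commands instead of submitting them.
    for cmd in warmup_commands(["test", "debug"]):
        print(shlex.join(cmd))
```

Submitting such a job more often than the node idle timeout should keep at least one node from being suspended, which is the effect described here.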
@ltalirz sounds like a great start. It needs to run on the scheduler. Ideally, it should also read the config file and pick up the partition names and the number of nodes to allocate.
This is already how it works; the cronjob is:

- name: set up cronjob for queue warmup
  cron:
    name: "queue-warmup"
    job: "/usr/local/sbin/queue-warmup.sh {{ warmup_queues | map(attribute='name') | join(' ') }}"
    minute: "*/5"
    weekday: 1-5
    user: "root"
    state: "present"
  vars:
    warmup_queues: "{{ queues | selectattr('warmup', 'defined') | selectattr('warmup', 'equalto', true) }}"

Keeping more than one node warm will require some modifications (more touching of nodes is needed), but it should be doable, I guess. In practice, one node idling at all times is already a great improvement in user experience and often all you need.
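For readers less familiar with Jinja filters, the following plain-Python sketch mirrors what the `selectattr` / `map` / `join` chain in the task computes. The `queues` list is hypothetical sample data; only the `name` and `warmup` keys come from the task above.

```python
from typing import Any, Dict, List

# Hypothetical queue definitions; only "name" and "warmup" are used.
queues: List[Dict[str, Any]] = [
    {"name": "test", "warmup": True},
    {"name": "hpc", "warmup": False},
    {"name": "viz"},  # no "warmup" key: dropped by the 'defined' test
]


def warmup_queue_names(queues: List[Dict[str, Any]]) -> List[str]:
    """Names of queues whose 'warmup' key exists and equals true."""
    return [
        q["name"]
        for q in queues
        if "warmup" in q and q["warmup"] == True  # noqa: E712, mirrors 'equalto'
    ]


def cron_job_line(queues: List[Dict[str, Any]]) -> str:
    """The command string the cron entry would run for these queues."""
    return "/usr/local/sbin/queue-warmup.sh " + " ".join(warmup_queue_names(queues))


if __name__ == "__main__":
    print(cron_job_line(queues))
```

With the sample data, only the `test` queue is passed to the script; queues without a `warmup` key are ignored rather than causing an error.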
In what area(s)?
Describe the feature
Hi Xavier,
It would be great if we could set a few test queues in azhop. This would let our users run quick jobs without having to wait for node spinup time.
Currently, I'm approximating the behavior by setting a large idle time on some queues, but it would be nice to have a setting that actually keeps the nodes alive forever using the Slurm setting described here: https://learn.microsoft.com/en-us/azure/cyclecloud/slurm?view=cyclecloud-8#excluding-a-partition. Another common feature of these test queues is a short job time limit. I don't see a way to set this from CycleCloud right now, even though it is in /etc/slurm/cyclecloud.conf.
Thanks!
Matt