updating fairshare #691

Draft
wants to merge 6 commits into
base: gh-pages
updating fairshare
David Gaines committed Oct 4, 2024

commit ce5d473372ea16782ffa67925668b4ebfe2708bb
@@ -1,22 +1,41 @@
# Job Priorities on Kestrel
*Job priority on Kestrel is determined by a number of factors including queue wait time (AGE), job size, the need for limited resources (PARTITION), request for priority boost (QOS), and Fair-Share.*
*Job priority on Kestrel is determined by a number of factors, including the time the job has been eligible to run in the queue (age),
the size of the job (jobsize), the resources requested and their partition (partition), the quality of service and its
associated priority (qos), and the relative fair-share of the individual allocation.*

Learn about [job partitions and scheduling policies](./index.md).

## How to View Your Job's Priority
The ```sprio``` command may be used to look at your job's priority. Priority for a job in the queue is calculated as the sum of these components:
## Job Priority & Scheduling

| Component | Contribution |
| ----------| ------------ |
| AGE | Jobs accumulate priority points per minute the job spends eligible in the queue.|
| JOBSIZE | Larger jobs have some priority advantage to allow them to accumulate needed nodes faster.|
| PARTITION | Jobs routed to partitions with special features (memory, disk, GPUs) have priority to use nodes equipped with those features.|
| QOS | Jobs associated with projects that have exceeded their annual allocation are assigned low priority.<br>Jobs associated with projects that have an allocation remaining are assigned normal priority. These jobs start before jobs with a low priority.<br>A job may request high priority using --qos=high. Jobs with high priority start before jobs with low or normal priority. Jobs with qos=high use allocated hours at 2x the normal rate.|
| FAIR-SHARE| Each project's Fair-Share value is (Project Allocation) / (Total Kestrel Allocation). Projects that used less than their fair share in the last 2 weeks have increased priority. Projects that used more than their fair share in the last 2 weeks have decreased priority. |
The Slurm scheduler has two scheduling loops: (1) the main scheduling loop, which schedules jobs in strict priority order, and (2) the backfill scheduling loop, which allows lower-priority jobs to be scheduled as long as the expected start time of higher-priority jobs is not affected. In both loops, Slurm considers higher-priority jobs first; however, depending on the resources requested and other configuration options, there may be
room for backfill to start lower-priority jobs earlier (with the same caveat as before: lower-priority jobs cannot
delay the expected start time of higher-priority jobs).

The ```squeue --start <JOBID>``` command can be helpful in estimating when a job will run.
An individual job's priority is a combination of multiple factors: (1) age, (2) nodes requested or jobsize, (3) partition
factor, (4) quality of service (qos), and (5) the relative fair-share of the individual allocation. There is a weighting
factor associated with each of these components (shown below) that determines the relative contribution of each factor to
a job's priority.

| Component | Weighting Factor | Nominal Weight| Note |
| :---| :---: | :---: | :--- |
| AGE | 30,589,200 |4% | Jobs accumulate AGE priority while in the queue and eligible to run (up to a maximum of 14 days) |
| JOBSIZE | 221,771,700 | 29% | TO BE CHANGED |
| PARTITION | 38,236,500 | 5% | Not currently implemented in Kestrel; all jobs receive max partition priority.|
| QOS | 76,473,000 | 10%| A job may request high-priority using --qos=high. Jobs with this flag selected receive maximum
| FAIR-SHARE| 397,659,600 | 52% | A project is under-served (and receives a higher fair-share priority) if the project's usage is low relative to the size of its allocation. There is additional complexity discussed below.|
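
As a rough illustration of how the weighting factors combine, the sketch below recomputes each component's nominal weight as its share of the summed weighting factors, and evaluates a job priority as a weighted sum. It assumes, as in Slurm's multifactor priority model, that each component contributes a factor between 0 and 1 that is multiplied by its weight. The function names and the example factor values are hypothetical and are not taken from Kestrel's configuration.

```python
# Weighting factors copied from the table above.
WEIGHTS = {
    "AGE": 30_589_200,
    "JOBSIZE": 221_771_700,
    "PARTITION": 38_236_500,
    "QOS": 76_473_000,
    "FAIR-SHARE": 397_659_600,
}


def nominal_weights(weights):
    """Return each weight as a rounded percentage of the summed weights."""
    total = sum(weights.values())
    return {name: round(100 * w / total) for name, w in weights.items()}


def job_priority(factors, weights=WEIGHTS):
    """Weighted sum of per-component factors (assumed to lie in [0, 1]).

    Components missing from `factors` contribute nothing.
    """
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} factor {value} is outside [0, 1]")
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)


# Hypothetical factor values for a single job (illustration only).
example_factors = {
    "AGE": 0.5,         # halfway to the maximum age
    "JOBSIZE": 0.1,
    "PARTITION": 1.0,   # all jobs currently receive max partition priority
    "QOS": 0.0,
    "FAIR-SHARE": 0.75,
}

print(nominal_weights(WEIGHTS))
print(job_priority(example_factors))
```

Because FAIR-SHARE carries the largest weight, a change in the fair-share factor moves a job's priority far more than the same change in the age factor.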

## Fairshare

A project's fairshare is a function of: (1) the project's allocation, (2) the sum of its siblings' allocations, and (3) the recent usage of both the project and its siblings. The top level of the fairshare tree is the allocation pool (EERE/NREL), with 85% of the machine assigned to EERE and 15% to NREL. The level fairshare for EERE and for NREL is calculated using the following equation, where EERE and NREL are siblings:

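$$\text{Level Fairshare} = \frac{S}{U}$$

where $S$ is the normalized share and $U$ is the normalized usage:

$$S = \frac{\mathit{Sraw}_{\text{self}}}{\mathit{Sraw}_{\text{self+siblings}}}, \quad U = \frac{\mathit{Uraw}_{\text{self}}}{\mathit{Uraw}_{\text{self+siblings}}}$$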
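
The level fairshare ratio can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's implementation: the 85/15 shares come from the text above, while the usage numbers are invented for the example, and treating zero usage as an infinite level fairshare is an assumption of this sketch.

```python
import math


def level_fairshare(shares, usage, name):
    """Compute S / U for `name` among itself and its siblings.

    `shares` and `usage` map each sibling (including `name`) to its
    raw shares and raw usage, respectively.
    """
    total_shares = sum(shares.values())
    total_usage = sum(usage.values())
    s = shares[name] / total_shares
    u = usage[name] / total_usage if total_usage > 0 else 0.0
    # Assumption: an account with no recorded usage ranks ahead of all others.
    return math.inf if u == 0 else s / u


shares = {"EERE": 85, "NREL": 15}      # from the allocation pool split
usage = {"EERE": 9000, "NREL": 1000}   # hypothetical recent usage

for account in shares:
    print(account, level_fairshare(shares, usage, account))
```

In this made-up case EERE has consumed 90% of recent usage against an 85% share, so its level fairshare is below 1, while NREL, at 10% of usage against a 15% share, is above 1 and would be considered under-served.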

This is repeated at each level of the fairshare tree, where the siblings at each level are used in the calculation (e.g., at the EERE Office level, the siblings for EERE_WETO are the other EERE offices; within EERE_WETO, the siblings are the other projects contained in that office). Once the level fairshare calculations are complete, a ranked list is built using a depth-first traversal of the fairshare tree, and a project's fairshare priority is proportional to its position on this list.
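
The ranking step can be sketched as follows. This is a simplified illustration under several assumptions: the tree has only two levels, the project names, shares, and usage values are hypothetical, an interior node's usage is taken to be the sum of its children's usage, and the final factor is assumed to fall linearly from 1.0 for the first-ranked project. None of these details are confirmed by the text above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shares: float
    usage: float = 0.0                      # only meaningful for leaves
    children: list[Node] = field(default_factory=list)


def total_usage(node):
    """Usage of a leaf, or the summed usage of everything below a node."""
    if not node.children:
        return node.usage
    return sum(total_usage(child) for child in node.children)


def level_fairshare(node, siblings):
    """S / U for `node`, where `siblings` includes `node` itself."""
    share_sum = sum(s.shares for s in siblings)
    usage_sum = sum(total_usage(s) for s in siblings)
    s = node.shares / share_sum
    u = total_usage(node) / usage_sum if usage_sum > 0 else 0.0
    return math.inf if u == 0 else s / u


def rank_projects(root):
    """Depth-first traversal, visiting siblings in descending level fairshare."""
    ranked = []

    def visit(node):
        if not node.children:
            ranked.append(node.name)
            return
        ordered = sorted(
            node.children,
            key=lambda child: level_fairshare(child, node.children),
            reverse=True,
        )
        for child in ordered:
            visit(child)

    visit(root)
    return ranked


def fairshare_factors(ranked):
    """Assumed mapping: the first-ranked project gets 1.0, then linear steps down."""
    n = len(ranked)
    return {name: (n - i) / n for i, name in enumerate(ranked)}


# Hypothetical two-level tree: allocation pools, then projects.
tree = Node("root", 1, children=[
    Node("EERE", 85, children=[
        Node("project_a", 60, usage=100),
        Node("project_b", 40, usage=400),
    ]),
    Node("NREL", 15, children=[
        Node("project_c", 100, usage=500),
    ]),
])

ranked = rank_projects(tree)
factors = fairshare_factors(ranked)
print(ranked)
print(factors)
```

Because the traversal is depth-first, every project under the higher-ranked pool is placed before any project under the lower-ranked pool, regardless of how the individual projects compare across pools.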

The ```scontrol show job <JOBID>``` command can be useful for troubleshooting why a job is not starting.

## How to Get High Priority for a Job
You can submit your job to run at high priority or you can request a node reservation.
@@ -31,4 +50,5 @@ If you are doing work that requires real-time Kestrel access in conjunction with

Your project allocation will be charged for the entire time you have the nodes reserved, whether you use them or not.

To request a reservation, contact [HPC Help](mailto://[email protected]).
To request a reservation, contact [HPC Help](mailto://[email protected]).