upadted readme for task -> job changes
Showing 1 changed file with 17 additions and 17 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,56 +1,56 @@ | ||
# Dead Simple Queue (dSQ) | ||
|
||
Dead simple queue is a [slurm](https://slurm.schedmd.com/)-only successor to SimpleQueue. It wraps around slurm's [`sbatch`](https://slurm.schedmd.com/sbatch.html) to help you submit independent tasks as job arrays. In many ways, the array indices look like normal jobs, but are far easier to manage. Its primary advantage over SimpleQueue is that your job allocation will only ever use the resources needed to complete the remaining tasks. `dSQ` is **not** recommended for situations where the initialization of the task takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler. | ||
Dead simple queue is a [slurm](https://slurm.schedmd.com/)-only successor to SimpleQueue. It wraps around slurm's [`sbatch`](https://slurm.schedmd.com/sbatch.html) to help you submit independent jobs as job arrays. In many ways, the array indices look like normal jobs, but are far easier to manage. Its primary advantage over SimpleQueue is that your job allocation will only ever use the resources needed to complete the remaining jobs. `dSQ` is **not** recommended for situations where the initialization of the job takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler. | ||
|
||
## Task File: | ||
## Job File: | ||
|
||
First, you'll need to generate a task file. Each line of this task file needs to specify exactly what you want run for each task, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with `#` will be ignored when submitting your job array. **Note:** slurm jobs begin in the directory from which your job was submitted, so be wary of relative paths. This also means that you don't need to `cd` to the working directory if you submit your job there. | ||
First, you'll need to generate a job file. Each line of this job file needs to specify exactly what you want run for each job, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with `#` will be ignored when submitting your job array. **Note:** slurm jobs begin in the directory from which your job was submitted, so be wary of relative paths. This also means that you don't need to `cd` to the working directory if you submit your job there. | ||
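As a rough illustration of the job file format described above, the following sketch writes one self-contained command line per input. The script path, the `module load Python` setup step, and the function names are all hypothetical; substitute whatever your own jobs need. Absolute paths are used because each job starts in the submission directory.

```python
def build_job_lines(inputs, script="/path/to/process.py", setup="module load Python"):
    """Return one shell command line per input file.

    Each line carries its own environment setup, since every line of the
    job file must fully describe what to run for that job.
    `script` and `setup` are placeholders, not part of dSQ.
    """
    lines = []
    for path in inputs:
        lines.append(f"{setup}; python {script} {path} > {path}.out")
    return lines


def write_job_file(path, lines):
    """Write the job file, one job per line."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
```

For example, `write_job_file("jobfile.txt", build_job_lines(["/data/a.txt", "/data/b.txt"]))` would produce a two-line job file.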
|
||
## Usage: | ||
|
||
`dSQ.py` takes a few arguments, then passes the rest directly to sbatch, either by writing a script to stdout or by directly submitting the job for you. Without specifying any additional sbatch arguments, some defaults will be set. Run `sbatch --help` or see [schedMD's sbatch documentation](https://slurm.schedmd.com/sbatch.html) for more info on sbatch options. | ||
|
||
|
||
``` | ||
dSQ.py --taskfile taskfile [dSQ args] [slurm args] | ||
dSQ.py --jobfile jobfile [dSQ args] [slurm args] | ||
Required dSQ arguments: | ||
--taskfile TASKFILE Task file, one task per line | ||
--jobfile JOBFILE Job file, one job per line | ||
Optional dSQ arguments: | ||
-h, --help show this help message and exit | ||
--version show program's version number and exit | ||
--submit Submit the job array on the fly instead of printing to stdout. | ||
--max-tasks MAX_TASKS | ||
Maximum number of simultaneously running tasks from the job array | ||
--max-jobs MAX_JOBS | ||
Maximum number of simultaneously running jobs from the job array | ||
``` | ||
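If you drive dSQ from a Python script, the argument layout above can be assembled as a list before handing it to something like `subprocess.run`. This is only a sketch: the helper name `dsq_command` is made up here, it uses only the flags shown in the usage text, and it assumes any extra arguments are valid sbatch options.

```python
def dsq_command(jobfile, submit=False, max_jobs=None, sbatch_args=()):
    """Build an argument list for dSQ.py (hypothetical helper).

    dSQ's own arguments come first; everything in `sbatch_args` is
    appended unchanged so that dSQ can pass it through to sbatch.
    """
    cmd = ["dSQ.py", "--jobfile", jobfile]
    if submit:
        cmd.append("--submit")
    if max_jobs is not None:
        cmd += ["--max-jobs", str(max_jobs)]
    cmd += list(sbatch_args)
    return cmd
```

Calling `dsq_command("jobfile.txt", submit=True, max_jobs=10, sbatch_args=["--time=1:00:00"])` yields a list that can be passed to `subprocess.run` on a machine where `dSQ.py` is on the `PATH`.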
|
||
## Output | ||
|
||
dSQ creates a file named `job_jobid_status.tsv` which will report the success or failure of each task as it finishes. Note this file will not contain information for any tasks that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed tasks in the following tab-separated columns: | ||
dSQ creates a file named `job_jobid_status.tsv` which will report the success or failure of each job as it finishes. Note this file will not contain information for any jobs that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed jobs in the following tab-separated columns: | ||
|
||
* Task_ID: the zero-based line number from your task file | ||
* Exit_Code: exit code returned from your task | ||
* Job_ID: the zero-based line number from your job file | ||
* Exit_Code: exit code returned from your job | ||
* Time_Started: time started, formatted as YYYY-MM-DD HH:MM:SS | ||
* Time_Ended: time ended, formatted as YYYY-MM-DD HH:MM:SS | ||
* Time_Elapsed: in seconds | ||
* Task: the line from your task file | ||
* Job: the line from your job file | ||
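The column layout above can be read with a few lines of Python. This sketch assumes the status file has no header row, that the six columns appear in the order listed, and that `Time_Elapsed` parses as a number; the function names are illustrative only.

```python
COLUMNS = ["Job_ID", "Exit_Code", "Time_Started", "Time_Ended", "Time_Elapsed", "Job"]


def read_status(lines):
    """Parse status lines (e.g. an open file) into a list of dicts."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at most 5 times so tabs inside the job command are preserved.
        fields = line.split("\t", 5)
        row = dict(zip(COLUMNS, fields))
        row["Job_ID"] = int(row["Job_ID"])
        row["Exit_Code"] = int(row["Exit_Code"])
        row["Time_Elapsed"] = float(row["Time_Elapsed"])
        rows.append(row)
    return rows


def failed_jobs(rows):
    """Return the rows whose exit code is non-zero."""
    return [row for row in rows if row["Exit_Code"] != 0]
```

With a status file on disk, `failed_jobs(read_status(open("job_jobid_status.tsv")))` would list the jobs that finished with a non-zero exit code.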
|
||
## Autopsy Report | ||
|
||
If you would like to generate a list of tasks that did not run, either due to failure or because they were cancelled before running, use `dSQAutopsy`. Just specify your original task file and the status file generated by `dSQ`: | ||
If you would like to generate a list of jobs that did not run, either due to failure or because they were cancelled before running, use `dSQAutopsy`. Just specify your original job file and the status file generated by `dSQ`: | ||
|
||
``` | ||
usage: dSQAutopsy taskfile status.tsv | ||
usage: dSQAutopsy jobfile status.tsv | ||
A helper script for analyzing the success state of your tasks after a dSQ | ||
run has completed. Specify the taskfile and the status.tsv file generated | ||
by the dSQ job and dSQAutopsy will print the tasks that didn't run or | ||
A helper script for analyzing the success state of your jobs after a dSQ | ||
run has completed. Specify the jobfile and the status.tsv file generated | ||
by the dSQ job and dSQAutopsy will print the jobs that didn't run or | ||
completed with non-zero exit codes. It will also report count of each to | ||
stderr. | ||
positional arguments: | ||
taskfile Task file, one task per line | ||
jobfile Job file, one job per line | ||
statusfile The status.tsv file generated from your dSQ run | ||
``` | ||
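The selection logic described for `dSQAutopsy` can be approximated as follows. This is not the tool's actual implementation, only a sketch under two assumptions: the job ID in the status file is the zero-based index of the raw line in the job file, and empty or `#`-prefixed lines are never run.

```python
def jobs_to_rerun(job_lines, status_lines):
    """Return job commands that failed or never produced a status entry."""
    succeeded = set()
    for line in status_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        job_id, exit_code = int(fields[0]), int(fields[1])
        if exit_code == 0:
            succeeded.add(job_id)

    rerun = []
    for index, line in enumerate(job_lines):
        command = line.strip()
        # Blank lines and comments are ignored at submission time.
        if not command or command.startswith("#"):
            continue
        if index not in succeeded:
            rerun.append(command)
    return rerun
```

Writing the returned commands to a new file, one per line, gives a job file containing only the work that still needs to be done.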
|