Run many jobs on multiple GPUs, so I can enjoy my weekends while GPUs work :P
Currently, we use a Producer-Consumer pattern: the producer reads a text file of commands, one command per line, and feeds them into a queue. There are at least n consumer processes, one per GPU, which read from the queue and run commands on the corresponding GPU. For each job, the `STDOUT` and `STDERR` outputs are stored in a folder named with the `job_id`.
Currently, we support only a static set of GPUs; the set cannot change while hog is running.
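A minimal sketch of this pattern, for illustration only; this is not hog's actual source, and the `CUDA_VISIBLE_DEVICES` pinning is an assumption about how jobs get mapped to GPUs:

```python
# Illustrative producer-consumer sketch, not hog's implementation.
# Jobs are shell-command strings, one per line of the job file.
import os
import subprocess
from multiprocessing import Process, Queue

def producer(job_file, queue, n_consumers):
    # Read one command per line, skipping comments and blank lines.
    with open(job_file) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                queue.put(line)
    for _ in range(n_consumers):
        queue.put(None)  # one sentinel per consumer to signal shutdown

def consumer(gpu_id, queue):
    while True:
        cmd = queue.get()
        if cmd is None:
            break
        # Pin the job to one GPU; output capture is omitted for brevity.
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
        subprocess.run(cmd, shell=True, env=env)

if __name__ == "__main__":
    gpus = ["1", "2", "3"]
    q = Queue()
    workers = [Process(target=consumer, args=(g, q)) for g in gpus]
    for w in workers:
        w.start()
    producer("foo.txt", q, len(workers))
    for w in workers:
        w.join()
```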
Currently, hog is not hosted on PyPI (hopefully this will be done soon). The alternative is to clone this repo, add it to your `PATH` variable, and call hog from there.
We currently support running hog as a CLI command. A basic use case would be:

```
hog --job_file foo.txt --gpus 1,2,3
```

This would read jobs from `foo.txt` and run them on GPUs `1`, `2` and `3` concurrently. See the section on Format of Job File for more information on how to write the job file.
- `job_file`: file to read jobs from
- `job_yielder`: file with a `yielder` method that generates jobs programmatically
- `gpus`: comma-separated IDs of GPUs to use. More than one concurrent job can run per GPU
- `output_dir`: directory to store outputs from runs. Defaults to `hog_run`
- `prefix`: prefix to attach to each per-job folder name. Defaults to `job_`, so you will have folders named `job_0`, `job_1`, ... under `output_dir`
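Putting a few of these together, an invocation might look like the following (illustrative; `--output_dir` and `--prefix` are assumed to follow the same `--flag` spelling as the flags shown in the usage example above):

```
hog --job_file foo.txt --gpus 0,1 --output_dir results --prefix exp_
```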
We have the `--job_yielder` flag that allows users to define their own method to generate jobs instead of using a `job_file`. To use this, define a method named `yielder` in another file, say `test.py`, and call hog as below:

```
hog --job_yielder test.py ...other flags
```

`hog` will now run the `yielder` method from `test.py` to generate the jobs to put into the queue.
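For illustration, `test.py` might look like the sketch below; it assumes `yielder` yields shell-command strings, one per job, just like lines in a job file (the exact contract is not documented here):

```python
# test.py -- hypothetical example. Assumes hog treats each yielded
# string as one job (a shell command).
def yielder():
    for lr in (0.1, 0.01, 0.001):
        yield f"python train.py --lr {lr}"
```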
We do not place restrictions on how many concurrent jobs can run on the same GPU. It is important to note that in some cases it might be better to run only one job at any given point on a GPU. In other cases, for example running multiple TensorFlow instances, it might be possible to run several concurrent sessions on the same GPU. It is up to the user to decide which option is better suited for their use case.
To run multiple concurrent programs on the same GPU, repeat the GPU's ID when setting the `--gpus` flag. For example, `--gpus 0,0,1,2,2,2` will run two concurrent jobs on GPU `0`, one on GPU `1`, and three on GPU `2`.
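As a complete command (reusing the hypothetical `foo.txt` from above):

```
hog --job_file foo.txt --gpus 0,0,1,2,2,2
```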
Inside `output_dir`, there is one folder per job, named according to the flags passed. Say we have `job_0`; inside it we have the following files:

- `INFO`: basic information about the job, such as the job name, the command, and the GPU the command was run on
- `job_0.ERR`: captured `STDERR` output of the job
- `job_0.OUT`: captured `STDOUT` output of the job
- `SUCCESS`/`FAILURE`: empty file showing whether the job succeeded or not
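With the default flags, the resulting layout would look roughly like this (illustrative; assumes two jobs, the first succeeding and the second failing, and that the `.OUT`/`.ERR` file names match the job folder):

```
hog_run/
├── job_0/
│   ├── INFO
│   ├── job_0.OUT
│   ├── job_0.ERR
│   └── SUCCESS
└── job_1/
    ├── INFO
    ├── job_1.OUT
    ├── job_1.ERR
    └── FAILURE
```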
- A job is a bash command or a `&&`-separated sequence of commands to be executed
- Specify one job per line
- Lines starting with `#` and empty lines are ignored
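For illustration, a small job file following these rules (the commands themselves are hypothetical) might look like this:

```
# evaluate first, then a two-step job (this comment and the blank line are ignored)

python evaluate.py --ckpt best.pt
python preprocess.py && python train.py --lr 0.01
```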
- Incorporate using `hog` as a decorator to make it more flexible
- Allow users to override the default task to be done for each job
- Hooks, both pre-run and post-run (for things like email alerts, logging to a DB, etc.)
- Use `multiprocessing.logging` instead of `print` statements
- Allow changing the GPUs available at runtime through a `gpu_file` argument
- Have a test suite