Description
New feature
I develop wr, a workflow runner like Nextflow that can also be used as a standalone backend scheduler. It can currently schedule to LSF and OpenStack.
The benefits to Nextflow users of going via wr, instead of using Nextflow’s existing LSF or Kubernetes support, are:
- wr makes more efficient use of LSF: it can pick an appropriate queue, use job arrays, and “reuse” job slots. In a simple test I did, Nextflow using wr in LSF mode was twice as fast as Nextflow using its own LSF scheduler.
- wr’s OpenStack support is incredibly easy to use and set up (basically a single command to run), and provides auto scaling up and down. Kubernetes, by comparison, is really quite complex to get working on OpenStack, doesn’t auto scale, and wastes resources with multiple nodes needed even while no workflows are being operated on. I was able to get Nextflow to work with wr in OpenStack mode (but the shared disk requirement for Nextflow’s state remains a concern).
Usage scenario
Users with access to LSF or OpenStack clusters who want to run their Nextflow workflows efficiently and easily.
Suggest implementation
Since I don’t know Java well enough to understand how to implement this “correctly”, I wrote a simple bsub emulator in wr, which is what my tests so far have been based on. I submit the Nextflow command as a job to wr, turning on the bsub emulation, and configure Nextflow to use its existing LSF scheduler. While running under the emulation, Nextflow’s bsub calls actually call wr.
Of course the proper way to do this would be to have Nextflow call wr directly (either via the wr command line or its REST API). The possibly tricky part of making this work in OpenStack mode is having Nextflow tell wr about OpenStack-specific things: which image to use, which hardware flavour to use, details on how to mount S3, etc. (the bsub emulation handles all of this).
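For illustration, a native executor might submit each Nextflow task wrapper to wr roughly like this. This is only a sketch: the cloud-related flags (`--cloud_os`, `--cloud_flavor`), the flavour name, and the paths here are assumptions and should be checked against `wr add --help` for your wr version:

```
# Sketch: what a native Nextflow executor could run per task instead of bsub.
# Assumed flags (--cloud_os, --cloud_flavor) and their values are illustrative
# only; verify them against `wr add --help` before relying on this.
echo "bash .command.run" | wr add \
  -i nextflow \
  --cwd /path/to/task/workdir --cwd_matters \
  --memory 100M \
  --cloud_os 'Ubuntu Xenial' \
  --cloud_flavor 'm1.small' \
  --mounts 'ur:sb10/nextflow'
```

The key point is that the per-task resource and cloud requirements Nextflow already knows about would be passed as wr options, rather than being inferred by the bsub emulation.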
Here's what I did for my LSF test...
echo_1000_sleep.nf:
```
#!/usr/bin/env nextflow

num = Channel.from(1..1000)

process echo_sleep {
    input:
    val x from num

    output:
    stdout result

    "echo $x && sleep 1"
}

result.subscribe { println it }

workflow.onComplete {
    println "Pipeline completed at: $workflow.complete"
    println "Execution status: ${ workflow.success ? 'OK' : 'failed' }"
}
```
nextflow.config:
```
process {
    executor = 'lsf'
    queue = 'normal'
    memory = '100MB'
}
```
install wr:
```
wget https://github.com/VertebrateResequencing/wr/releases/download/v0.17.0/wr-linux-x86-64.zip
unzip wr-linux-x86-64.zip
mv wr /to/somewhere/in/my/PATH/wr
```
run:
```
wr manager start -s lsf
echo "nextflow run ./echo_1000_sleep.nf" | wr add --bsub -r 0 -i nextflow --cwd_matters --memory 1GB
```
Here's what I did to get it to work in OpenStack...
nextflow_install.sh:
```
sudo apt-get update
sudo apt-get install openjdk-8-jre-headless -y
wget -qO- https://get.nextflow.io | bash
sudo mv nextflow /usr/bin/nextflow
```
put input files in S3:
```
s3cmd put nextflow.config s3://sb10/nextflow/nextflow.config
s3cmd put echo_1000_sleep.nf s3://sb10/nextflow/echo_1000_sleep.nf
```
~/.openstack_rc:
[your rc file containing OpenStack environment variables downloaded from Horizon]
run:
```
source ~/.openstack_rc
wr cloud deploy --os 'Ubuntu Xenial' --username ubuntu
echo "cp echo_1000_sleep.nf /shared/echo_1000_sleep.nf && cp nextflow.config /shared/nextflow.config && cd /shared && nextflow run echo_1000_sleep.nf" | wr add --bsub -r 0 -o 2 -i nextflow --memory 1GB --mounts 'ur:sb10/nextflow' --cloud_script nextflow_install.sh --cloud_shared
```
The NFS share at `/shared` created by the `--cloud_shared` option is slow and limited in size; a better solution would be to set up your own high-performance shared filesystem in OpenStack (e.g. GlusterFS) and extend `nextflow_install.sh` to mount it. Better still, is there a way to have Nextflow not store state on disk at all? If it could just query wr for job completion status, the shared filesystem would not be needed.
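As a rough idea of what such polling could build on, wr can already report the state of previously added jobs by identifier from the command line. The command shape below is an assumption based on wr v0.17 and should be verified against `wr status --help`:

```
# Sketch (assumed command shape, not verified): ask the wr manager for the
# status of jobs tagged with the identifier used at `wr add` time.
wr status -i nextflow
```

A native integration would presumably use the equivalent REST API query rather than shelling out, removing the need for Nextflow to track completion via files on a shared disk.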