Dockerize supersmart #52

rvosa · 2015-04-13T14:39:41Z

Should be self-explanatory: with the rise of Docker, it makes a lot of sense to release a Dockerized SUPERSMART. Deferred for now.

rvosa · 2015-05-14T08:58:04Z

This move will also rid us of this trouble: hashicorp/vagrant#3341

rvosa · 2015-05-21T14:10:29Z

So, what would be the advantages of Docker:

- more efficient resource use, less overhead from a VM
- smaller image, so a quicker download
- no "logging in", execution is more or less instantaneous
- commands can be invoked directly from the host, so pipeline tools (e.g. galaxy) don't have to run inside the VM

Disadvantages:

- Docker is not yet available on Windows (I think)
- Porting to Docker is going to be a bit of work
- there seem to be issues with how Docker syncs with STDOUT (garbled output in terminal)
- the monolithic VM model is a bit unusual in Docker, we should probably compose a SUPERSMART stack from different layers. More work.
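
A minimal sketch of the "invoked directly from the host" point above, in Python. It only builds the `docker run` command line and does not execute it. The image name `example/supersmart`, the command `smrt --help`, and the `/data` mount point are placeholders assumed for illustration; no such image is confirmed to exist.

```python
import shlex


def docker_run_argv(image, command, host_dir, container_dir="/data"):
    """Build (but do not execute) a `docker run` command line.

    The host directory is bind-mounted into the container so that
    input and output files stay on the host.
    """
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # share a host directory
        "-w", container_dir,                  # run the command inside that directory
        image,
        *command,
    ]


# Hypothetical image and command, for illustration only.
argv = docker_run_argv(
    "example/supersmart", ["smrt", "--help"], "/home/user/project"
)
print(shlex.join(argv))
```

A workflow tool on the host could pass `argv` to `subprocess.run`, assuming Docker is installed there, instead of logging in to a VM first.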

rvosa · 2015-05-21T18:48:48Z

We might be able to do something like this, to re-use our puppet manifest: https://puppetlabs.com/blog/docker-and-puppet-for-application-management

rvosa · 2015-07-16T13:36:53Z

It would make sense, once we make this move, to do so in compliance with the recommendations of bioboxes (http://bioboxes.org/guide/developer/)

michaelbarton · 2015-07-16T16:23:07Z

@bioboxes/core-team would be happy to help with creating a biobox for
supersmart if that is something you are interested in. We have no specification
for phylogenetic software, and this along with multiple sequence aligners would
be great additions since these are common tasks in bioinformatics. The first
step would be thinking about the minimum inputs and outputs for a biobox. From
reading the supersmart documentation, would this be taking a multiple sequence
alignment and returning a Newick tree?

rvosa · 2015-07-17T10:17:11Z

The inputs and outputs are a bit debatable. In the current version, supersmart is an entire pipeline (in a VM) that starts out with a list of species names and fossil calibration points and results in an annotated tree. In the process, quite a few different tools are used:

- a SQLite database
- standalone BLAST
- multiple sequence aligners (this is configurable, we usually use muscle or mafft)
- tree inference tools (configurable, usually exabayes and beast)
- treePL
- BioPerl, Bio::Phylo and an API we're writing for the pipeline

It would probably be most useful if a bioboxed version of the pipeline would be composed of multiple containers so that some of these tools can be re-used independently by the community (for example: treePL is quite a hassle to compile, so would be nice if people had bioboxed access to it). Consequently, there might be a little ecosystem of boxes with their own inputs and outputs. At least, that's kind of how we're thinking about this right now - but I would be keen to hear your advice.

michaelbarton · 2015-07-17T13:00:14Z

> It would probably be most useful if a bioboxed version of the pipeline would
> be composed of multiple containers so that some of these tools can be re-used
> independently by the community (for example: treePL is quite a hassle to
> compile, so would be nice if people had bioboxed access to it).

If you mean a Docker container comprising a pipeline of smaller Docker
containers, this is possible using Docker-in-Docker, but it requires the
Docker --privileged flag, which most people are uncomfortable with.

Alternatively if you are suggesting decomposing your pipeline into Docker
containers for the tools, and a separate tool that runs these containers, this
is something I think bioboxes can help with. For instance we could, with your
help, create biobox standards for the MSA and tree building steps, and then it
would be possible for you or your users to swap one tool for another in the
supersmart pipeline with minimal effort.
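
A rough sketch of what "swap one tool for another" could look like if each pipeline task were bound to a container image. This is an illustration in Python, not the bioboxes specification: the task names (`msa`, `tree`, `dating`) and every image name are hypothetical.

```python
from collections import OrderedDict

# Hypothetical default pipeline: task name -> container image.
# None of these image names are confirmed to exist.
DEFAULT_PIPELINE = OrderedDict([
    ("msa", "example/muscle"),
    ("tree", "example/exabayes"),
    ("dating", "example/treepl"),
])


def with_tool(pipeline, task, image):
    """Return a copy of the pipeline with one task bound to another image.

    The original pipeline is left untouched, and the task order is kept.
    """
    if task not in pipeline:
        raise KeyError(f"unknown task: {task}")
    swapped = OrderedDict(pipeline)
    swapped[task] = image
    return swapped


# Swap the aligner and leave the rest of the pipeline as it was.
custom = with_tool(DEFAULT_PIPELINE, "msa", "example/mafft")
for task, image in custom.items():
    print(f"{task}: {image}")
```

The swap is only this cheap if every image for a given task accepts the same inputs and produces the same outputs, which is what a shared standard per task would have to guarantee.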

> Consequently, there might be a little ecosystem of boxes with their own
> inputs and outputs. At least, that's kind of how we're thinking about this
> right now - but I would be keen to hear your advice.

I believe an effort like nucleotid.es for MSAs or tree builders would also be
extremely useful to the community since these are both tools used a great deal.

rvosa · 2015-07-17T14:07:43Z

> If you mean a Docker container comprising a pipeline of smaller Docker
> containers, this is possible using Docker-in-Docker, but it requires the
> Docker --privileged flag, which most people are uncomfortable with.

I am now actually discussing technology that I don't have enough hands-on experience with (yet) to really know what I'm talking about, but what I had in mind is docker compose. Are we talking about the same thing?

> For instance we could, with your help, create biobox standards for the MSA and tree building steps, and then it would be possible for you or your users to swap one tool for another in the supersmart pipeline with minimal effort.

Yup, something like that. For the pipeline we have done the work to define how to compile all the required tools (and have recorded that in our puppet manifest). I think it would be most useful if that groundwork was available to as many people as possible. Which probably means it should be broken up into different boxes that people can mix and match as needed.

michaelbarton · 2015-07-30T22:13:06Z

> I am now actually discussing technology that I don't have enough hands-on experience with (yet) to really know what I'm talking about, but what I had in mind is [docker compose](https://docs.docker.com/compose/). Are we talking about the same thing?

As I understand it, Docker Compose is used to create linked applications of multiple long-running containers, such as web servers. I do not think it can construct workflows where the output of one container is passed as the input to another. I believe, however, that the Common Workflow Language, nextflow.io and Galaxy are all working on this.
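
To make the distinction concrete, here is a hedged Python sketch of the workflow case, where each container's output directory becomes the next container's input. It only builds the `docker run` command lines. The `/in` and `/out` mount points, the `stepN` directory layout, and the image names are assumptions made for this example, not a convention taken from Docker, bioboxes, or any workflow tool.

```python
def chain(images, workdir):
    """Build one `docker run` command per image, in order.

    Step i reads from <workdir>/step<i> (mounted read-only at /in)
    and writes to <workdir>/step<i+1> (mounted at /out), so each
    step's output directory is the next step's input directory.
    """
    commands = []
    for i, image in enumerate(images):
        in_dir = f"{workdir}/step{i}"
        out_dir = f"{workdir}/step{i + 1}"
        commands.append([
            "docker", "run", "--rm",
            "-v", f"{in_dir}:/in:ro",  # previous step's output, read-only
            "-v", f"{out_dir}:/out",   # this step's output
            image,
        ])
    return commands


# Hypothetical images: an aligner followed by a tree builder.
cmds = chain(["example/aligner", "example/treebuilder"], "/tmp/run")
for cmd in cmds:
    print(" ".join(cmd))
```

A runner would execute these commands one after another and stop at the first failure. That sequential hand-off is the part a set of long-running linked services does not provide.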

> Yup, something like that. For the pipeline we have done the work to define how to compile all the required tools (and have recorded that in our [puppet manifest](https://github.com/naturalis/supersmart/blob/master/conf/manifests/default.pp)). I think it would be most useful if that groundwork was available to as many people as possible. Which probably means it should be broken up into different boxes that people can mix and match as needed.

I agree. In particular, I think bioboxes is most useful where there are many different implementations of essentially the same task. It is then useful to be able to swap one container for another of the same type without having to change anything else in the pipeline.

lonelyjoeparker · 2016-09-21T12:37:48Z

+1. We're considering Supersmart for http://science.kew.org/strategic-output/plant-and-fungal-trees-life and a Dockerized build (better yet an AMI) could be really useful.

rvosa added this to the v2.0 milestone Apr 13, 2015

rvosa added the deployment label May 7, 2015

rvosa mentioned this issue Jul 16, 2015

Is this collaborative? bioboxes/commentary-article#3

Closed

michaelbarton mentioned this issue Jul 16, 2015

Write a call-for-contributions commentary article for gigascience. bioboxes/rfc#87

Closed

rvosa mentioned this issue Nov 25, 2016

Memory problems when threading in a VM #105

Open
