Dockerize supersmart #52

Open
rvosa opened this issue Apr 13, 2015 · 10 comments

@rvosa
Member

rvosa commented Apr 13, 2015

Should be self-explanatory: with the rise of Docker, it makes a lot of sense to release a dockerized SUPERSMART. Deferred for now.

@rvosa rvosa added this to the v2.0 milestone Apr 13, 2015
@rvosa
Member Author

rvosa commented May 14, 2015

This move will also rid us of this trouble: hashicorp/vagrant#3341

@rvosa
Member Author

rvosa commented May 21, 2015

So, what would be the advantages of Docker:

  • more efficient resource use, less overhead from a VM
  • smaller image, so a quicker download
  • no "logging in", execution is more or less instantaneous
  • commands can be invoked directly from the host, so pipeline tools (e.g. Galaxy) don't have to run inside the VM

Disadvantages:

  • Docker is not yet available on Windows (I think)
  • Porting to Docker is going to be a bit of work
  • there seem to be issues with how Docker syncs with STDOUT (garbled output in terminal)
  • the monolithic VM model is a bit unusual in Docker; we should probably compose a SUPERSMART stack from different layers. More work.

@rvosa
Member Author

rvosa commented May 21, 2015

We might be able to do something like this, to re-use our puppet manifest: https://puppetlabs.com/blog/docker-and-puppet-for-application-management

@rvosa
Member Author

rvosa commented Jul 16, 2015

It would make sense, once we make this move, to do so in compliance with the recommendations of bioboxes (http://bioboxes.org/guide/developer/)

@michaelbarton

@bioboxes/core-team would be happy to help with creating a biobox for
supersmart if that is something you are interested in. We have no specification
for phylogenetic software, and this along with multiple sequence aligners would
be great additions since these are common tasks in bioinformatics. The first
step would be thinking about the minimum inputs and outputs for a biobox. From
reading the supersmart documentation, would this be taking a multiple sequence
alignment and returning a newick tree?

@rvosa
Member Author

rvosa commented Jul 17, 2015

The inputs and outputs are a bit debatable. In the current version, supersmart is an entire pipeline (in a VM) that starts out with a list of species names and fossil calibration points and results in an annotated tree. In the process, quite a few different tools are used:

  • a SQLite database
  • standalone BLAST
  • multiple sequence aligners (this is configurable, we usually use muscle or mafft)
  • tree inference tools (configurable, usually exabayes and beast)
  • treePL
  • BioPerl, Bio::Phylo and an API we're writing for the pipeline

It would probably be most useful if a bioboxed version of the pipeline were composed of multiple containers so that some of these tools can be re-used independently by the community (for example: treePL is quite a hassle to compile, so it would be nice if people had bioboxed access to it). Consequently, there might be a little ecosystem of boxes with their own inputs and outputs. At least, that's kind of how we're thinking about this right now, but I would be keen to hear your advice.

@michaelbarton

> It would probably be most useful if a bioboxed version of the pipeline would
> be composed of multiple containers so that some of these tools can be re-used
> independently by the community (for example: treePL is quite a hassle to
> compile, so would be nice if people had bioboxed access to it).

If you mean a Docker container comprising a pipeline of smaller Docker
containers, this is possible using Docker-in-Docker; however, it requires
the Docker --privileged flag, which most people are uncomfortable with.

Alternatively, if you are suggesting decomposing your pipeline into Docker
containers for the tools, and a separate tool that runs these containers, this
is something I think bioboxes can help with. For instance, we could, with your
help, create biobox standards for the MSA and tree-building steps, and then it
would be possible for you or your users to swap one tool for another in the
supersmart pipeline with minimal effort.
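To make the "separate tool that runs these containers" idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that every tool image exposes a common entry point taking an input path and an output path (the sort of interface a shared biobox standard would pin down); the image names `example/muscle-box`, `example/mafft-box` and `example/tree-box` are placeholders, not real images. The runner only builds `docker run` argument lists and executes them with `subprocess` when `dry_run=False`, so swapping the aligner changes nothing but the image name.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One containerised tool in the pipeline (all names here are hypothetical)."""
    name: str              # e.g. "align" or "tree"
    image: str             # Docker image providing the tool (placeholder names)
    args: Tuple[str, ...]  # assumed common interface: input path, output path


def docker_command(step: Step, workdir: Path) -> List[str]:
    """Build a `docker run` invocation that shares one host directory with every step."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir.resolve()}:/data",  # mount the host work dir at /data
        "-w", "/data",                       # relative paths resolve inside /data
        step.image,
        *step.args,
    ]


def make_pipeline(aligner_image: str) -> List[Step]:
    """Alignment followed by tree building; only the aligner image is swappable here."""
    return [
        Step("align", aligner_image, ("input.fa", "aligned.fa")),
        Step("tree", "example/tree-box", ("aligned.fa", "tree.nwk")),
    ]


def run_pipeline(steps: Sequence[Step], workdir: Path, dry_run: bool = True) -> List[List[str]]:
    """Return the commands; run them in order only when dry_run is False."""
    commands = [docker_command(step, workdir) for step in steps]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failing step
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(make_pipeline("example/muscle-box"), Path("work")):
        print(" ".join(cmd))
```

Because the runner lives on the host and merely launches sibling containers, this layout needs neither Docker-in-Docker nor the --privileged flag.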

> Consequently, there might be a little ecosystem of boxes with their own
> inputs and outputs. At least, that's kind of how we're thinking about this
> right now - but I would be keen to hear your advice.

I believe an effort like nucleotid.es for MSAs or tree builders would also be
extremely useful to the community, since both kinds of tool are used a great deal.

@rvosa
Member Author

rvosa commented Jul 17, 2015

> If you mean a Docker container comprising a pipeline of smaller Docker
> containers, this is possible using Docker-in-Docker; however, it requires
> the Docker --privileged flag, which most people are uncomfortable with.

I am now discussing technology that I don't have enough hands-on experience with (yet) to really know what I'm talking about, but what I had in mind is Docker Compose. Are we talking about the same thing?

> For instance, we could, with your help, create biobox standards for the MSA and tree-building steps, and then it would be possible for you or your users to swap one tool for another in the supersmart pipeline with minimal effort.

Yup, something like that. For the pipeline, we have done the work to define how to compile all the required tools (and have recorded that in our Puppet manifest). I think it would be most useful if that groundwork were available to as many people as possible, which probably means it should be broken up into different boxes that people can mix and match as needed.

@michaelbarton

michaelbarton commented Jul 30, 2015 via email

@lonelyjoeparker

+1. We're considering SUPERSMART for http://science.kew.org/strategic-output/plant-and-fungal-trees-life, and a Dockerized build (better yet, an AMI) could be really useful.
