Roary ILP Bacterial Annotation Pipeline
This tool is currently under heavy development, so expect some bugs, and feel free to report issues.
Annotate your bacterial genomes with Prokka and determine a pan-genome with Roary. This pan-genome is then refined using integer linear programs (ILPs) that solve the best matching for each pairwise strain comparison computed with MMseqs2.
A common task when you have a bunch of bacterial genomes in your hands is the calculation of a core gene set: we want to know which genes are homologous and shared between certain bacteria. However, defining homology based only on sequence similarity often underestimates the true core gene set, in particular when diverse species are compared. RIBAP combines sequence homology information from Roary with smart pairwise ILP calculations to produce a more complete core gene set, even at the genus level. RIBAP first performs annotations with Prokka, then calculates the core gene set using Roary and pairwise ILPs, and finally visualizes the results in an interactive HTML table garnished with protein multiple sequence alignments and trees. RIBAP comes with Nextflow and Docker/Conda support for easy execution.
Easy, you just need a working nextflow and docker or conda installation, see below!
You have nextflow and docker? Give it a try:
nextflow run hoelzer-lab/ribap --fasta "$HOME/.nextflow/assets/hoelzer-lab/ribap/data/*.fasta"
You have nextflow and conda? Okay:
nextflow run hoelzer-lab/ribap --fasta "$HOME/.nextflow/assets/hoelzer-lab/ribap/data/*.fasta" -profile conda
Need some of these dependencies? See below.
- runs with the workflow manager nextflow using docker or conda
- this means all programs are automatically pulled via docker or conda
- only docker or conda and nextflow need to be installed (docker is used by default)
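Before launching the pipeline, it can help to confirm which of these tools are actually on your PATH. The following is a minimal sketch and not part of RIBAP: the helper names have and suggest_profile are hypothetical, and the logic only assumes what is stated above, namely that docker is the default and that -profile conda selects conda.

```shell
#!/bin/sh
# Hypothetical helpers (not part of RIBAP): detect the available backend
# and print the matching Nextflow profile option.

have() {
  # Succeeds if the given command is found on PATH.
  command -v "$1" >/dev/null 2>&1
}

suggest_profile() {
  if ! have nextflow; then
    echo "missing: nextflow"
    return 1
  fi
  if have docker; then
    # docker is the default according to this README, so no option is needed.
    echo "default (docker)"
  elif have conda; then
    echo "-profile conda"
  else
    echo "missing: docker or conda"
    return 1
  fi
}

# Print the suggestion; do not abort the shell if something is missing.
suggest_profile || true
```

The check only looks for executables on PATH. It does not verify that the docker daemon is running or that conda has been initialised for your shell.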
Needed in both cases (conda, docker)
sudo apt-get update
sudo apt install -y default-jre
curl -s https://get.nextflow.io | bash
sudo mv nextflow /bin/
Just copy the commands and follow the installation instructions. Let the installer configure conda for you. You need to specify -profile conda to run the pipeline with conda support.
cd
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
See here if you need an installer for a system other than Linux, which is used above.
If you don't have experience with bioinformatics tools, just copy the commands into your terminal to set everything up:
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo usermod -a -G docker $USER
- restart your computer
- try out the installation by entering the following
Dependencies
- docker (add your user to the docker group, so no sudo is needed)
- nextflow + java runtime
- git (should be already installed)
- wget (should be already installed)
- tar (should be already installed)
- Docker installation here
- Nextflow installation here
- move or add the nextflow executable to a bin path
- add your user to the docker group via
sudo usermod -a -G docker $USER
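Group changes made with usermod only take effect in a new login session, which is why a restart is suggested above. As a rough sketch, the check below reports whether the current session already sees the docker group. The function name in_group is hypothetical; it relies only on the standard id utility.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current session belongs to group $1.
in_group() {
  # id -nG prints the session's group names separated by spaces;
  # padding both sides with spaces allows an exact whole-word match.
  case " $(id -nG) " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_group docker; then
  echo "docker group: active in this session"
else
  echo "docker group: not active yet (log out and back in, or restart)"
fi
```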
Get or update the workflow:
nextflow pull hoelzer-lab/ribap
Get help:
nextflow run hoelzer-lab/ribap --help
Run with RAxML tree calculation and specified output dir:
nextflow run hoelzer-lab/ribap --fasta '*.fasta' --tree --outdir ~/ribap
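The commands above pass the --fasta pattern in quotes, so the shell hands it over unexpanded. A quick sanity check that the pattern actually matches something can save a failed run. The sketch below is not part of RIBAP: count_fasta is a hypothetical helper, and the minimum of two genomes is an assumption based on the pairwise comparisons described earlier, not a documented limit.

```shell
#!/bin/sh
# Hypothetical helper: count regular files ending in .fasta inside
# directory $1 (defaults to the current directory).
count_fasta() {
  dir=${1:-.}
  n=0
  for f in "$dir"/*.fasta; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

found=$(count_fasta .)
if [ "$found" -lt 2 ]; then
  # Assumption: a pairwise comparison needs at least two genomes.
  echo "only $found FASTA file(s) here; check the --fasta pattern"
else
  echo "$found FASTA files found"
fi
```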