read2tree

read2tree is a software tool that produces alignment matrices for tree inference. For this purpose it uses the OMA DB and a set of reads. Its strength is that it bypasses several standard steps needed to obtain such a matrix in a regular analysis: read filtering, assembly, gene prediction, gene annotation, all-vs-all comparison, orthology prediction, alignment and concatenation. read2tree takes care of all of these steps.

Getting Started

read2tree was built and tested with Python 3.5.1. To set up read2tree on your local machine, follow the instructions below.

    git clone https://github.com/dvdylus/read2tree.git
    cd read2tree
    python setup.py install

Prerequisites

read2tree integrates multiple software tools to infer a phylogenetic tree while skipping several steps of a usual pipeline, such as assembly, annotation and orthology prediction. This makes it a fast alternative to usual tree inference pipelines. It depends on the following tools:

mafft - Multiple sequence alignment software
fasttree - Approximately-maximum-likelihood phylogenetic tree inference
ngmlr - Long read mapper for ONT or PacBio read data
ngm - Short read mapper for paired-end reads
pyopa - Implementation of the Smith-Waterman alignment algorithm in Python
pyoma - Library for retrieval of nucleotide sequences from an OMA run
pyham - Library to work with HOGs
samtools - Set of programs to interact with high-throughput sequencing data

Installing

For mafft, fasttree, ngmlr, ngm and samtools, follow the installation instructions provided by the individual packages and make sure that the executables are in your PATH. Alternatively, follow the conda instructions below, since all of these packages are available through conda.
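Whether the executables are actually reachable can be checked before running read2tree. The following is a minimal sketch, not part of read2tree itself. The executable names are assumptions taken from the tool list above and may differ on your system (for example, FastTree is sometimes installed as `FastTree`).

```python
import shutil

# ASSUMPTION: executable names follow the tool list above; adjust as needed.
REQUIRED_TOOLS = ["mafft", "fasttree", "ngmlr", "ngm", "samtools"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required executables were found in PATH.")
```

An empty result from `find_missing_tools()` only shows that the names resolve on PATH, not that the installed versions are compatible.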

CONDA

Install miniconda
Set up bioconda channels

    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda

Install required tools

    conda install mafft
    conda install fasttree
    conda install ngmlr
    conda install nextgenmap
    conda install samtools
    conda install pysam

Python

Install the tool

    python setup.py install

Running the tests

Once successfully installed you can test the package using:

    python -W ignore bin/read2tree --standalone_path tests/data/marker_genes/ --reads tests/data/mapper/test3/test_1a.fq tests/data/mapper/test3/test_2a.fq --output_path test/output/

In the output folder (the path given with '--output_path') you should find the following folders and files:

folder/file	description
01_ref_ogs_aa	contains the selected OGs with amino acid data
01_ref_ogs_dna	contains the selected OGs with DNA data
02_ref_dna	contains the OGs reshuffled by available species
03_align_aa	contains the mafft alignments of the amino acid data
03_align_dna	contains the codon replacement of the amino acid alignments
04_mapping_test_1b	contains the consensus sequences from the mapping
05_ogs_map_test_1b_aa	contains the amino acid OGs with the additional sequence test_1b
05_ogs_map_test_1b_dna	contains the DNA OGs with the additional sequence test_1b
06_align_test_1b_aa	contains the amino acid alignments with the additional sequence test_1b
06_align_test_1b_dna	contains the DNA alignments with the additional sequence test_1b
concat_test_1b_aa.phy	concatenated alignments from the 06 amino acid folder
concat_test_1b_dna.phy	concatenated alignments from the 06 DNA folder
test_1b_all_cov.txt	summary of the average number of reads used for the selected sequences
test_1b_all_sc.txt	summary of the average consensus length of the reconstructed sequences
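The listing above can be checked automatically after a test run. This is a small sketch under the assumption that the output names follow the table exactly, with the sample name (here `test_1b`) embedded as shown. The helper names `expected_outputs` and `missing_outputs` are hypothetical and not part of read2tree.

```python
from pathlib import Path


def expected_outputs(sample):
    """Names listed in the table above, for one sample name."""
    return [
        "01_ref_ogs_aa", "01_ref_ogs_dna", "02_ref_dna",
        "03_align_aa", "03_align_dna",
        "04_mapping_" + sample,
        "05_ogs_map_" + sample + "_aa", "05_ogs_map_" + sample + "_dna",
        "06_align_" + sample + "_aa", "06_align_" + sample + "_dna",
        "concat_" + sample + "_aa.phy", "concat_" + sample + "_dna.phy",
        sample + "_all_cov.txt", sample + "_all_sc.txt",
    ]


def missing_outputs(output_dir, sample="test_1b"):
    """Return the expected folders/files that do not exist in output_dir."""
    root = Path(output_dir)
    return [name for name in expected_outputs(sample)
            if not (root / name).exists()]
```

Calling `missing_outputs("test/output/")` after the test command returns an empty list if every entry from the table is present.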

Running

To run read2tree two things are required as input:

The reads, either directly or as an SRA or ENA submission index (submission scripts for LSF and SGE are provided in the scripts folder)
A set of reference orthologous groups from the OMA browser, obtained with either the All vs All export or the marker gene export. This means that some prior knowledge about the species to place or to add is required

Prerequisites

Make sure that species names are clearly labeled by a five-letter code (e.g. Amphiura filiformis = AMPFI)
Either an OMA standalone export or an OMA marker gene export is needed as reference input
If you are using your own OMA run, the formatting is crucial
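The five-letter code requirement can be checked with a short helper. This is an illustrative sketch only. It assumes a valid code is exactly five upper-case letters or digits, as in AMPFI; the exact rules read2tree enforces are not stated here, and `is_valid_species_code` is a hypothetical name.

```python
import re

# ASSUMPTION: a species code is exactly five upper-case letters or digits.
SPECIES_CODE = re.compile(r"[A-Z0-9]{5}")


def is_valid_species_code(code):
    """Return True if `code` looks like a five-letter species code."""
    return SPECIES_CODE.fullmatch(code) is not None


if __name__ == "__main__":
    for label in ("AMPFI", "Amphiura filiformis", "AMPF"):
        print(label, is_valid_species_code(label))
```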

Running on clusters

Run the first step of read2tree so that folders 01, 02 and 03 are computed (these are needed for mapping). This can be done using the '--reference' option.
Since read2tree re-orders the OGs by the included species, the mapping step can be split per species, with multiple threads used for the mapper. The '--single_mapping' option is available for this.
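The two steps above can be sketched as a small command builder. The invocation (`python bin/read2tree`) and the flags `--standalone_path`, `--reads`, `--output_path`, `--reference` and `--single_mapping` come from this README. That `--single_mapping` takes the path of one per-species file from the 02_ref_dna folder is an assumption, so check the tool's help output before relying on it. The function names are hypothetical.

```python
from pathlib import Path

# Invocation as used in the test command of this README.
R2T = ["python", "bin/read2tree"]


def reference_command(standalone_path, output_path):
    """Step 1: compute the reference folders (01, 02 and 03) only."""
    return R2T + [
        "--standalone_path", str(standalone_path),
        "--output_path", str(output_path),
        "--reference",
    ]


def mapping_commands(standalone_path, output_path, reads, ref_dna_dir):
    """Step 2: build one mapping command per species file in ref_dna_dir."""
    commands = []
    for ref_file in sorted(Path(ref_dna_dir).iterdir()):
        if not ref_file.is_file():
            continue
        commands.append(R2T + [
            "--standalone_path", str(standalone_path),
            "--output_path", str(output_path),
            "--reads", *map(str, reads),
            # ASSUMPTION: the option takes one per-species reference file.
            "--single_mapping", str(ref_file),
        ])
    return commands
```

Each returned command is a list of arguments that could be passed to `subprocess.run` or written into a separate LSF or SGE job script, so that the species are mapped in parallel.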

LSF

SGE

Built With

pyCharm - Python IDE

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

David Dylus - Initial work - dvdylus

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

Alex Warwick for help with how to set up such a package.
