Assembly and quantification of metatranscriptomes using metagenome data.
Version: see VERSION
MetaGT is a bioinformatics analysis pipeline for improving metatranscriptome assemblies and quantifying transcript abundances with the help of metagenome data. The pipeline accepts either Illumina sequencing reads or complete metagenome and metatranscriptome assemblies as input. It aligns the metatranscriptome assembly to the metagenome assembly and then extracts the CDSs that are covered by transcripts.
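The "CDSs covered by transcripts" step can be pictured as an interval-containment test. The sketch below is illustrative only and is not metaGT's code: it assumes two hypothetical tab-separated files with 1-based inclusive coordinates, and it assumes a CDS counts as covered when a single transcript alignment spans it completely (metaGT's actual criterion may differ, e.g. a coverage-fraction threshold).

```shell
#!/usr/bin/env bash
# Illustrative sketch, not metaGT's implementation.
# Report CDSs that lie entirely inside at least one transcript alignment
# on the same contig. Assumed (hypothetical) inputs, tab-separated,
# 1-based inclusive coordinates:
#   alignments.tsv : contig  start  end  transcript_id
#   cds.tsv        : contig  start  end  cds_id
covered_cds() {
  local cds_file=$1 aln_file=$2
  awk -F'\t' '
    # First file (alignments): store every interval per contig.
    FILENAME == ARGV[1] { k = ++n[$1]; s[$1, k] = $2 + 0; e[$1, k] = $3 + 0; next }
    # Second file (CDSs): print the CDS id once if some alignment contains it.
    {
      for (i = 1; i <= n[$1] + 0; i++)
        if (s[$1, i] <= $2 + 0 && $3 + 0 <= e[$1, i]) { print $4; break }
    }
  ' "$aln_file" "$cds_file"
}

# Usage: covered_cds cds.tsv alignments.tsv
```

Comparing against the stored intervals by contig keeps the sketch short; a real implementation would more likely sort intervals or use an interval tool instead of a linear scan per CDS.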
The pipeline is built using Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It comes with Docker containers, which make installation simple and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes software dependencies much easier to maintain and update.
- Install Nextflow.
- Install Conda for full pipeline reproducibility.
- Download the pipeline, e.g. by cloning the metaGT GitHub repository:
git clone git@github.com:ablab/metaGT.git
- Test it on a minimal dataset by running:
nextflow run metaGT -profile test,conda
- Start running your own analysis!
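Before launching a real run, it can help to confirm that the tools from the installation steps are on your PATH. The helper below is a hypothetical convenience function, not part of metaGT; it only uses the shell builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of metaGT): check that each
# named tool can be found on PATH; report missing ones on stderr.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only if every tool was found
}

# Usage: require_tools nextflow conda git && echo "ready to run metaGT"
```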
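In the typical commands below, the patterns are placeholders: adjust them so that the DNA and RNA inputs match different files.
Typical command for an analysis starting from reads:
nextflow run metaGT -profile conda --dna_reads '*_R{1,2}.fastq.gz' --rna_reads '*_R{1,2}.fastq.gz'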
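The single quotes around `'*_R{1,2}.fastq.gz'` stop the shell from expanding the pattern, so Nextflow receives it unchanged. Assuming the pipeline groups the matches the way Nextflow's `Channel.fromFilePairs` does (the part matched by `*` becomes the sample name), the sketch below previews which files would be paired. The function name is hypothetical and it only handles this one naming scheme.

```shell
#!/usr/bin/env bash
# Hypothetical preview of how '*_R{1,2}.fastq.gz' groups files into samples
# (in the spirit of Nextflow's fromFilePairs; illustrative only).
# Prints: sample <TAB> R1 path <TAB> R2 path
list_read_pairs() {
  local dir=${1:-.} r1 r2 sample
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # no match: glob stays literal
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz        # expected mate file
    sample=$(basename "${r1%_R1.fastq.gz}")  # text matched by '*'
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "warning: no mate for $r1" >&2
    fi
  done
}

# Usage: list_read_pairs path/to/dna_reads
```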
Typical command for an analysis using multiple read files, listed in YAML files:
nextflow run metaGT -profile conda --dna_reads '*.yaml' --rna_reads '*.yaml' --yaml
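The README does not describe the YAML layout expected by `--yaml`. Since the assembly steps use metaSPAdes and rnaSPAdes, one plausible assumption is the SPAdes dataset YAML format (the one SPAdes reads via `--dataset`); please verify against the pipeline before relying on it. The hypothetical helper below writes a single paired-end library in that format from R1/R2 file pairs.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the assumption is that metaGT's --yaml input follows
# the SPAdes dataset YAML format. Arguments: output file, then R1 R2 pairs.
write_spades_yaml() {
  local out=$1; shift
  local left="" right="" sep=""
  while [ "$#" -ge 2 ]; do
    left+="${sep}\"$1\""     # forward reads, in pair order
    right+="${sep}\"$2\""    # reverse reads, in the same order
    sep=", "
    shift 2
  done
  [ "$#" -eq 0 ] || { echo "odd number of read files" >&2; return 1; }
  cat > "$out" <<EOF
[
  {
    orientation: "fr",
    type: "paired-end",
    left reads: [ $left ],
    right reads: [ $right ]
  }
]
EOF
}

# Usage: write_spades_yaml dna.yaml run1_R1.fastq.gz run1_R2.fastq.gz run2_R1.fastq.gz run2_R2.fastq.gz
```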
Typical command for an analysis starting from assemblies:
nextflow run metaGT -profile conda --genome '*.fasta' --transcriptome '*.fasta'
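A quick sanity check on the assembly files can catch an empty or non-FASTA input before a long run. This is a hypothetical helper, not part of metaGT, and it assumes uncompressed FASTA. Note that here the glob is left unquoted on purpose, so the shell expands it into individual file names.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of metaGT): every file must be non-empty
# and start with a '>' FASTA header. Assumes uncompressed files.
check_fasta() {
  local f status=0
  for f in "$@"; do
    if [ ! -s "$f" ] || [ "$(head -c 1 "$f")" != ">" ]; then
      echo "not a FASTA file: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_fasta genome/*.fasta transcriptome/*.fasta
```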
When raw reads are provided, the pipeline additionally performs:
- Sequencing quality control (FastQC)
- Metagenome and metatranscriptome assembly (metaSPAdes, rnaSPAdes)
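For orientation, the read-processing steps above roughly correspond to the standalone commands printed by this dry-run sketch. Output directory names are hypothetical, and metaGT likely passes further options to each tool. `fastqc -o` expects the output directory to exist. metaSPAdes writes `contigs.fasta` and rnaSPAdes writes `transcripts.fasta` into their output directories.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (not run) commands roughly equivalent to the
# optional read-processing steps. Directory names are hypothetical.
# Arguments: DNA R1, DNA R2, RNA R1, RNA R2
preprocess_cmds() {
  local dna1=$1 dna2=$2 rna1=$3 rna2=$4
  echo "fastqc -o qc $dna1 $dna2 $rna1 $rna2"
  echo "metaspades.py -1 $dna1 -2 $dna2 -o metagenome_assembly"
  echo "rnaspades.py -1 $rna1 -2 $rna2 -o metatranscriptome_assembly"
}

# Usage: preprocess_cmds dna_R1.fastq.gz dna_R2.fastq.gz rna_R1.fastq.gz rna_R2.fastq.gz
```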
By default, the pipeline currently performs the following steps:
- Metagenome annotation (Prokka)
- Alignment of the metatranscriptome to the metagenome (minimap2)
- Annotation of unaligned transcripts (TransDecoder)
- Clustering of covered CDSs together with CDSs from unaligned transcripts (MMseqs2)
- Quantification of transcript abundances (kallisto)
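To see roughly what each default step does, the dry-run sketch below prints standalone commands for the tools listed above. It is an approximation, not the pipeline's actual invocation: every file name is hypothetical, the minimap2 preset is left at the tool's default because the README does not state one, and the inputs to MMseqs2 and kallisto (combined CDS set, clustered sequences) are assumed from the step descriptions.

```shell
#!/usr/bin/env bash
# Dry-run approximation of the default steps; prints commands only.
# All file names are hypothetical; metaGT's real parameters may differ.
# Arguments: genome assembly, transcriptome assembly, RNA R1, RNA R2
default_cmds() {
  local genome=$1 transcripts=$2 r1=$3 r2=$4
  # Metagenome annotation
  echo "prokka --metagenome --outdir prokka_out --prefix metagenome $genome"
  # Transcript-to-metagenome alignment (preset not stated, default used)
  echo "minimap2 -a $genome $transcripts > transcripts_vs_genome.sam"
  # ORF prediction on transcripts that did not align
  echo "TransDecoder.LongOrfs -t unaligned_transcripts.fasta"
  echo "TransDecoder.Predict -t unaligned_transcripts.fasta"
  # Cluster covered CDSs together with CDSs from unaligned transcripts
  echo "mmseqs easy-cluster combined_cds.fasta clusters tmp"
  # Index the clustered sequences and estimate abundances from RNA reads
  echo "kallisto index -i transcripts.idx clusters_rep_seq.fasta"
  echo "kallisto quant -i transcripts.idx -o abundances $r1 $r2"
}

# Usage: default_cmds metagenome.fasta metatranscriptome.fasta rna_R1.fastq.gz rna_R2.fastq.gz
```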
MetaGT was developed by Daria Shafranskaya and Andrey Prjibelski. If you use it in your research, please cite:
MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data
If you have any questions, please open an issue on our GitHub page.