VTAM - Validation and Taxonomic Assignation of Metabarcoding Data

VTAM is a metabarcoding package whose commands process high-throughput sequencing (HTS) amplicon data in FASTQ format, from one or several metabarcoding markers, and produce a table of amplicon sequence variants (ASVs) assigned to taxonomic groups. If you use VTAM in scientific work, please cite the following article:

González, A., Dubut, V., Corse, E., Mekdad, R., Dechatre, T. and Meglécz, E. VTAM: A robust pipeline for processing metabarcoding data using internal controls. bioRxiv: 10.1101/2020.11.06.371187v1.

Commands for a quick installation:

conda create --name vtam python=3.9 -y
conda activate vtam

Then install the dependencies and VTAM itself:

python3 -m pip install cutadapt
conda install -c bioconda blast -y
conda install -c bioconda vsearch -y
python3 -m pip install vtam
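
Before running the example, it can help to confirm that the tools installed above are visible in the active environment. The following is a minimal sketch and not part of VTAM: `find_missing_tools` is a hypothetical helper, and the executable names (in particular `blastn` for the BLAST package) are assumptions about what these packages place on the PATH.

```python
import shutil

# Executable names assumed to be provided by the packages installed above;
# "blastn" is an assumption about which BLAST program is needed.
ASSUMED_TOOLS = ("cutadapt", "blastn", "vsearch", "vtam")


def find_missing_tools(tools=ASSUMED_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `find_missing_tools()` from a Python session started inside the activated `vtam` environment should return an empty list if every tool was installed where the shell can find it.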

Commands for a quick working example:

vtam example
cd example
snakemake --printshellcmds --resources db=1 --snakefile snakefile.yml --cores 4 --configfile asper1/user_input/snakeconfig_mfzr.yml --until asvtable_taxa

The table of amplicon sequence variants (ASVs) is written to asper1/run1_mfzr/asvtable_default_taxa.tsv:

(vtam) user@host:~/vtam/example$ head -n4 asper1/run1_mfzr/asvtable_default_taxa.tsv
run marker  variant sequence_length read_count      tpos1_run1      tnegtag_run1    14ben01 14ben02 clusterid       clustersize     chimera_borderline      ltg_tax_id      ltg_tax_name    ltg_rank        identity        blast_db        phylum  class   order   family  genus   species sequence
run1        MFZR    25      181     478     478     0       0       0       25      1       False   131567  cellular organisms      no rank 80      coi_blast_db_20200420                                                   ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTACAGCTCACGCATTCCTAATGATCTTTTTCATGGTTATGCCAGGACTTGTT
run1        MFZR    51      181     165     0       0       0       165     51      1       False                                   coi_blast_db_20200420           ACTATATTTAATTTTTGCTGCAATTTCTGGTGTAGCAGGAACTACGCTTTCATTGTTTATTAGAGCTACATTAGCGACACCAAATTCTGGTGTTTTAGATTATAATTACCATTTGTATAATGTTATAGTTACGGGTCATGCTTTTTTGATGATCTTTTTTTTAGTAATGCCTGCTTTATTG
run1        MFZR    88      175     640     640     0       0       0       88      1       False   1592914 Caenis pusilla  species 100     coi_blast_db_20200420   Arthropoda      Insecta Ephemeroptera   Caenidae        Caenis  Caenis pusilla  ACTATATTTTATTTTTGGGGCTTGATCCGGAATGCTGGGCACCTCTCTAAGCCTTCTAATTCGTGCCGAGCTGGGGCACCCGGGTTCTTTAATTGGCGACGATCAAATTTACAATGTAATCGTCACAGCCCATGCTTTTATTATGATTTTTTTCATGGTTATGCCTATTATAATC
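
The ASV table is tab-separated, so it can be loaded for further filtering with the Python standard library. This is a minimal sketch under stated assumptions: `read_asv_table` and `assigned_to_rank` are hypothetical helpers, not VTAM functions, and the code assumes the column names shown in the header above, with an empty `ltg_rank` for variants that received no taxonomic assignment.

```python
import csv


def read_asv_table(path):
    """Read a tab-separated ASV table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def assigned_to_rank(rows, rank="species"):
    """Keep only the rows whose `ltg_rank` column equals `rank`."""
    return [row for row in rows if row.get("ltg_rank") == rank]
```

For example, `assigned_to_rank(read_asv_table("asper1/run1_mfzr/asvtable_default_taxa.tsv"))` would keep the variants assigned at species level. All values are read as strings, so read counts need an explicit `int(...)` conversion before any arithmetic.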

Intermediate data are stored in an SQLite database, asper1/db.sqlite:

(vtam) user@host:~/vtam/example$ sqlite3 asper1/db.sqlite '.tables'
FilterChimera                    Sample
FilterChimeraBorderline          SampleInformation
FilterCodonStop                  SortedReadFile
FilterIndel                      TaxAssign
FilterLFN                        Variant
FilterLFNreference               VariantReadCount
FilterMinReplicateNumber         wom_Execution
FilterMinReplicateNumber2        wom_FileInputOutputInformation
FilterMinReplicateNumber3        wom_Option
FilterPCRerror                   wom_TableInputOutputInformation
FilterRenkonen                   wom_TableModificationTime
Marker                           wom_ToolWrapper
ReadCountAverageOverReplicates   wom_TypeInputOrOutput
Run
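
The same database can be inspected from Python with the standard `sqlite3` module. The sketch below is illustrative only: `list_tables` and `count_rows` are hypothetical helpers, and nothing is assumed about the columns of any VTAM table. Note that `sqlite3.connect` creates an empty database file if the path does not exist, so check the path first.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Count the rows of `table`, after checking that the table exists."""
    # Table names cannot be bound as SQL parameters, so validate the name
    # against the database's own table list before interpolating it.
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table}")
    connection = sqlite3.connect(db_path)
    try:
        (count,) = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()
    finally:
        connection.close()
    return count
```

For instance, `count_rows("asper1/db.sqlite", "Variant")` would report how many rows the `Variant` table holds after the example run.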

The VTAM documentation is hosted at ReadTheDocs.

VTAM is maintained by Aitor González (aitor dot gonzalez at univ-amu dot fr) and Emese Meglécz (emese dot meglecz at univ-amu dot fr).