Snipe (SeNsItive Pathogen dEtection) is a pipeline that improves the ability of existing strain-typing tools to detect common pathogens at low abundance in contaminated food samples. Snipe consists of three core modules:
- The snipeMap module will map unassembled metagenomic reads against a target library and remove sequences that align to the filter and host libraries.
- The snipeId module will reassign ambiguous reads, identify microbial strains present in the sample, and estimate proportions of reads from each genome.
- The snipeRec module will align raw reads to species-specific regions (SSRs) and generate reports containing the proportion of reads assigned to each genome after rectification by the a posteriori probabilities.
For any issues or concerns, please contact us at [email protected]
Species name | Number of SSRs |
---|---|
Escherichia coli | 98 |
Salmonella enterica | 387 |
Staphylococcus aureus | 169 |
Listeria monocytogenes | 157 |
Campylobacter jejuni | 132 |
Vibrio cholerae | 261 |
Vibrio parahaemolyticus | 1206 |
Proteus mirabilis | 1122 |
Yersinia enterocolitica | 141 |
Clostridium perfringens | 2377 |
It is recommended to create a new conda environment:
conda create -n python37 python=3.7
# Activate this environment:
conda activate python37
• numpy (v1.15.0)
conda install -c conda-forge numpy
• pandas (v0.24.2)
conda install -c conda-forge pandas
• bowtie2 (v2.2.5)
conda install -c bioconda bowtie2
• pysam (v0.15.3)
conda install -c bioconda pysam
First of all, we should:
- change directory (cd) to the snipe folder
- call the snipeIndex module help for details
cd ../snipe
python snipe.py -h
Snipe needs a database of strain genomes, which can be downloaded from NCBI. Make sure the index has been built beforehand; otherwise the software will take a moment to build it.
Call the snipeMap module help for details:
python snipe.py MAP -h
python snipe.py MAP -1 map_inputread1 -2 map_inputread2 -targetRefFiles map_targetRef -filterRefFiles map_filterRef -indexDir map_indexdir -outDir map_outdir -outAlign map_outalign -expTag map_exp_tag -numThreads map_numthreads
Required arguments:
-1, string Input Read Fastq File (Pair 1)
-2, string Input Read Fastq File (Pair 2)
-targetRefFiles, string Target Reference Genome Fasta Files Full Path (Comma Separated)
-filterRefFiles, string Filter Reference Genome Fasta Files Full Path (Comma Separated)
-outAlign, string Output Alignment File Name (Default=outalign.sam)
-expTag, string Experiment Tag added to files generated for identification
Optional arguments:
-outDir, string Output Directory (Default=. (current directory))
-indexDir, string Index Directory (Default=. (current directory))
-numThreads, int Number of threads used by the aligner (bowtie2) (Default=8)
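When MAP is driven from a script, it can help to assemble the command as an argument list so that the comma-separated reference lists are joined in one place. The sketch below is only an illustration: `build_map_command` is a hypothetical helper that is not part of Snipe, the file paths are made up, and it uses only the flags and defaults listed above.

```python
from typing import List, Sequence


def build_map_command(
    read1: str,
    read2: str,
    target_refs: Sequence[str],
    filter_refs: Sequence[str],
    exp_tag: str,
    out_align: str = "outalign.sam",
    out_dir: str = ".",
    index_dir: str = ".",
    num_threads: int = 8,
    script: str = "snipe.py",
) -> List[str]:
    """Return the MAP invocation as an argument list (hypothetical helper)."""
    return [
        "python", script, "MAP",
        "-1", read1,
        "-2", read2,
        # Several reference FASTA files are passed as one comma-separated value.
        "-targetRefFiles", ",".join(target_refs),
        "-filterRefFiles", ",".join(filter_refs),
        "-indexDir", index_dir,
        "-outDir", out_dir,
        "-outAlign", out_align,
        "-expTag", exp_tag,
        "-numThreads", str(num_threads),
    ]


# Made-up paths, for illustration only.
cmd = build_map_command(
    "demo_R1.fastq",
    "demo_R2.fastq",
    target_refs=["/db/target_a.fna", "/db/target_b.fna"],
    filter_refs=["/db/filter.fna"],
    exp_tag="demo",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` without any shell quoting.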
Make sure the MAP module has finished first. The ID module uses the .sam file generated by the MAP module.
Call the snipeId module help for details:
python snipe.py ID -h
python snipe.py ID -outDir id_outdir -alignFile id_ali_file -expTag id_exp_tag
Required arguments:
-alignFile, string Alignment file path
-expTag, string Experiment tag added to output file for easy identification
Optional arguments:
-outDir, string Output Directory (Default=. (current directory))
Make sure the SSR index has been built.
Call the snipeRec module help for details:
python snipe.py REC -h
python snipe.py REC -ssrRef map_ssrRefDir -1 rec_inputread1 -2 rec_inputread2 -idReport id_ali_file -dictTarget targetInfo_dict -dictTemplate file3 -expTag rec_exp_tag -outDir path2 -numThreads 1
Required arguments:
-ssrRef, string the directory of the species specific regions
-1, string Input Read Fastq File (Pair 1)
-2, string Input Read Fastq File (Pair 2)
-idReport, string Report file generated by the ID module (e.g. demo-sam-report.tsv)
-dictTarget, string Dictionary file mapping accession IDs to species names
-dictTemplate, string Dictionary file mapping accession IDs to strain names
-expTag, string Experiment tag added to output file for easy identification
Optional arguments:
-outDir, string Output Directory (Default=. (current directory))
-numThreads, int Number of threads to use (Default=1)
bowtie2 --version
python -V
import pysam, pandas, numpy
pysam.__version__
pandas.__version__
numpy.__version__
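The manual checks above can also be wrapped in a small script that compares what is installed against the versions listed in the installation section. This is a rough sketch rather than part of Snipe: `version_mismatches` and `installed_versions` are hypothetical helper names, and requiring an exact version match is an assumption, since other versions may work just as well.

```python
import importlib
from typing import Dict, Iterable, List

# Versions listed in the installation section of this README.
REQUIRED = {"numpy": "1.15.0", "pandas": "0.24.2", "pysam": "0.15.3"}


def version_mismatches(installed: Dict[str, str],
                       required: Dict[str, str] = REQUIRED) -> List[str]:
    """List packages that are missing or differ from the documented version."""
    problems = []
    for name, wanted in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (documented {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found}, documented {wanted}")
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Import each package and read its __version__ attribute, if importable."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # left out of the result, so reported as "not installed"
        found[name] = getattr(module, "__version__", "unknown")
    return found


# Example with made-up installed versions.
report = version_mismatches({"numpy": "1.15.0", "pandas": "1.0.0"})
print("\n".join(report))
```

In a real environment, `version_mismatches(installed_versions(REQUIRED))` would report on the packages that are actually installed.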
python ./snipe/snipe.py MAP -1 example/demo_R1.fastp35.fastq -2 example/demo_R2.fastp35.fastq -targetRefFiles ./refDB/target.fna -filterRefFiles ./refDB/filter.fna -indexDir ./refDB/ -outDir ./ -outAlign demo.sam -expTag demo -numThreads 44
python ./snipe/snipe.py ID -alignFile ./demo.sam -fileType sam -outDir ./ -expTag demo
python ./snipe/snipe.py REC -ssrRef ./core/ -1 ./example/demo_R1.fastp35.fastq -2 ./example/demo_R2.fastp35.fastq -idReport demo-sam-report.tsv -dictTarget ./dict/dict_target -dictTemplate ./dict/dict_template -expTag demo -outDir ./ -numThreads 44
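Because ID reads the .sam file written by MAP, and REC reads the report written by ID, the three demo commands have to run in order, and a later stage should not start if an earlier one failed. The following is a minimal sketch of such a driver. `run_stages` and `shell_runner` are hypothetical names, not part of Snipe, and the command strings are the demo commands above, copied unchanged.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

Command = List[str]

# The three demo commands from this README, split into argument lists.
STAGES = [shlex.split(line) for line in (
    "python ./snipe/snipe.py MAP -1 example/demo_R1.fastp35.fastq "
    "-2 example/demo_R2.fastp35.fastq -targetRefFiles ./refDB/target.fna "
    "-filterRefFiles ./refDB/filter.fna -indexDir ./refDB/ -outDir ./ "
    "-outAlign demo.sam -expTag demo -numThreads 44",
    "python ./snipe/snipe.py ID -alignFile ./demo.sam -fileType sam "
    "-outDir ./ -expTag demo",
    "python ./snipe/snipe.py REC -ssrRef ./core/ "
    "-1 ./example/demo_R1.fastp35.fastq -2 ./example/demo_R2.fastp35.fastq "
    "-idReport demo-sam-report.tsv -dictTarget ./dict/dict_target "
    "-dictTemplate ./dict/dict_template -expTag demo -outDir ./ -numThreads 44",
)]


def run_stages(stages: Sequence[Command],
               runner: Callable[[Command], int]) -> int:
    """Run commands in order; return how many finished before the first failure."""
    done = 0
    for cmd in stages:
        if runner(cmd) != 0:
            break  # a later stage depends on this one's output
        done += 1
    return done


def shell_runner(cmd: Command) -> int:
    """Execute one command and return its exit code."""
    return subprocess.run(cmd).returncode
```

Calling `run_stages(STAGES, shell_runner)` from the directory that contains `snipe/`, `example/`, `refDB/`, `core/` and `dict/` would run the demo and return 3 when all stages succeed. Passing the runner as a parameter keeps the ordering logic testable without invoking bowtie2.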
Columns in the TSV file:
This is the name of the genome found in the alignment file.
Accession ID used by the NCBI GenBank database.
This represents the percentage of reads that are mapped to the genome in Column 1 after using SSRs rectification.
This represents the percentage of reads that are mapped to the genome in Column 1 (reads aligning to multiple genomes are assigned proportionally) after reassignment is performed.
This represents the probability after SSR rectification.
This represents the number of reads that are mapped to the SSRs.
This represents the abundance after SSR rectification.
This represents the abundance before SSR rectification.
This represents the percentage of reads that are mapped to the genome in Column 1 after each read is assigned uniquely to the genome with the highest score and after PathoScope reassignment is performed.
This represents the number of best-hit reads that are mapped to the genome in Column 1 (may include a fraction when a read aligns to multiple top-hit genomes with the same highest score) after PathoScope reassignment is performed.
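To illustrate why a best-hit read count can be fractional, the toy example below counts best hits per genome and divides a read among genomes that tie for the highest score. It is not Snipe's or PathoScope's reassignment code: the equal split between tied genomes is an assumption, `best_hit_counts` is a hypothetical name, and the reads and scores are invented.

```python
from collections import defaultdict
from typing import Dict


def best_hit_counts(read_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Count best-hit reads per genome, splitting ties equally (an assumption).

    Every read is expected to have at least one alignment score.
    """
    counts = defaultdict(float)
    for scores in read_scores.values():
        top = max(scores.values())
        winners = [genome for genome, score in scores.items() if score == top]
        for genome in winners:
            counts[genome] += 1.0 / len(winners)
    return dict(counts)


# Invented alignment scores: read -> {genome: score}.
reads = {
    "read1": {"genomeA": 60.0, "genomeB": 42.0},
    "read2": {"genomeA": 55.0, "genomeB": 55.0},  # tie: half a read each
    "read3": {"genomeB": 58.0},
    "read4": {"genomeA": 61.0},
}

counts = best_hit_counts(reads)
total = sum(counts.values())
percent = {genome: 100.0 * n / total for genome, n in counts.items()}
print(counts)   # genomeA: 2.5, genomeB: 1.5
print(percent)  # genomeA: 62.5, genomeB: 37.5
```

Here `read2` contributes 0.5 to each genome, which is how a count such as 2.5 can appear in a best-hit column.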