JUST A SIMPLE PREDICTOR.
- JASPER is a free bioinformatics tool for predicting virus hosts.
- JASPER combines several bioinformatics methods to predict virus hosts, including genome-genome alignment, CRISPR spacer analysis, tRNA analysis and more.
- JASPER consists of a few independent modules: blast, crispr, trna, wish, mash, merge.
- JASPER can be installed with Docker.
You need Python >= 3.7 to use JASPER.
JASPER depends on a good file naming convention. It is best to use the sequence ID as the file name, e.g. NC_008876.fna.
JASPER uses this ID to name every temporary file it creates, and the same ID appears in the results file.
WARNING: Avoid the | character in file names and in sequence headers. Use ordinary FASTA headers such as >NC_00876 additional_info more_additional_info.
A single file may contain multiple contigs. JASPER rewrites the headers of every input file, by default naming each sequence <id from filename>|<#contig>, e.g.:
>NC_000856|1
ATGCT....
>NC_000856|2
ATGCA....
# and so on
So even if a file contains only one genome, JASPER changes its ID to <id from filename>|1.
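The renaming scheme described above can be illustrated with a minimal sketch. This is not JASPER's own code: the function name rename_contigs and the example headers are hypothetical, and the only behavior taken from this README is the <id from filename>|<#contig> pattern.

```python
from pathlib import Path


def rename_contigs(fasta_text: str, seq_id: str) -> str:
    """Rewrite every FASTA header as '>{seq_id}|{n}', counting contigs from 1."""
    out = []
    contig = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new header starts a new contig; the old header text is dropped.
            contig += 1
            out.append(f">{seq_id}|{contig}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical usage: the ID is taken from the file name (NC_000856.fna -> NC_000856).
seq_id = Path("NC_000856.fna").stem
example = ">contig_a some info\nATGCT\n>contig_b\nATGCA\n"
print(rename_contigs(example, seq_id))
```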
JASPER only reads input files whose names end with one of the extensions fa, fna or fasta.
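To check in advance which files in a directory would qualify, something like the following sketch could be used. The helper name find_inputs is hypothetical, and matching the extensions case-sensitively is an assumption, since this README does not say how JASPER compares them.

```python
from pathlib import Path

# Extensions listed in this README; case-sensitive matching is an assumption.
ALLOWED_SUFFIXES = {".fa", ".fna", ".fasta"}


def find_inputs(directory):
    """Return the files in `directory` whose suffix is an accepted FASTA extension."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in ALLOWED_SUFFIXES
    )


# Hypothetical usage: list candidate input files in the current directory.
for path in find_inputs("."):
    print(path.name)
```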
NCBI-Blast+
PILER-CR
WIsH
Mash
tRNAscan-SE
JASPER relies on the external software listed above. It calls each program through subprocess, so every one of them must be installed and added to PATH.
On Ubuntu:
- To install NCBI-Blast+, run sudo apt install ncbi-blast+
- To install PILER-CR, download the compiled software, move it to a directory of your choice and add it to $PATH under the name pilercr.
- To install tRNAscan-SE, download and compile it, move it to a directory of your choice and add it to $PATH under the name tRNAscan-SE. Remember that tRNAscan-SE needs Infernal to work properly.
- To install WIsH, download and compile it, move it to a directory of your choice and add it to $PATH under the name WIsH.
- To install Mash, download a release, move it to a directory of your choice and add it to $PATH under the name mash.
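After installing the tools, a quick way to confirm that they are visible on PATH is a sketch like the one below. The executable names pilercr, tRNAscan-SE, WIsH and mash come from the list above; blastn and makeblastdb are assumed here as representatives of the NCBI-Blast+ suite, and which BLAST programs JASPER actually calls is not stated in this README.

```python
import shutil

# Names from this README, plus two assumed NCBI-Blast+ executables.
REQUIRED_TOOLS = ["blastn", "makeblastdb", "pilercr", "tRNAscan-SE", "WIsH", "mash"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found.")
```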
Source code for additional software:
Remember to install everything and add it to PATH.
You can also download the script:
install_dependencies.sh
After that, go to JASPER's main directory and run:
python setup.py install
By default, pip on some Linux systems places scripts in ~/.local/bin. Finally, add that directory to your $PATH:
export PATH="$HOME/.local/bin:$PATH"
Now you're done and can start using jasper-vh.
You can also install JASPER with Docker using the provided Dockerfile, for example:
cd jasper && docker build -t jasper:v1 .
After that you can run it straight from the entrypoint, or interactively with a shell as the entrypoint:
docker run -it --entrypoint /bin/bash jasper:v1
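To copy files between a container and the host, use the docker cp command. It takes the name or ID of a container (as listed by docker ps), not an image tag. Example, with <container> standing for that name or ID:
- copy to container - docker cp ~/Desktop/some_folder <container>:/app
- copy from container - docker cp <container>:/app/some_folder ~/Desktop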
Before running JASPER, make sure that you have:
- installed the jasper Python package using setuptools or pip.
- installed each tool and added it to PATH.
To test the installation, go to the project directory and run python -m unittest discover. This is recommended, since the tests also perform a tool check (ensuring that all dependencies and a proper Python version are present).
JASPER takes a number of arguments. Many of them are BLAST parameters, which can be configured in a JSON file and passed to JASPER. It is also recommended to run jasper in an empty directory, which ensures that none of your files are overwritten or damaged. Just run mkdir jasper_results && cd jasper_results and you're good to go.
jasper-vh blast --virus path/to/virus/dir --create-db host_db --host /path/to/host/dir --clear
jasper-vh crispr --host path/to/host/dir --create-db vir_db --virus /path/to/vir/dir --clear
jasper-vh trna --host path/to/host/dir --virus /path/to/vir/dir --clear
jasper-vh wish --host path/to/host/dir --virus /path/to/vir/dir --clear
jasper-vh mash --host path/to/host/dir --virus /path/to/vir/dir --clear
jasper-vh merge {blast,crispr,trna,mash,wish}.csv --output final_results.csv
For more options, check --help for the individual modules: jasper-vh {blast,crispr,trna,wish,mash,merge} --help
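When several modules are run in sequence, the command lines can be assembled programmatically. The sketch below only builds argument lists from the flags shown above (--host, --virus, --clear, --output); the helper names are hypothetical, and it covers only the modules whose invocation follows the simple host/virus pattern.

```python
def module_command(module, host_dir, virus_dir, clear=True):
    """Build the argument list for a module that takes --host and --virus."""
    cmd = ["jasper-vh", module, "--host", host_dir, "--virus", virus_dir]
    if clear:
        cmd.append("--clear")
    return cmd


def merge_command(result_files, output):
    """Build the argument list for the merge module."""
    return ["jasper-vh", "merge", *result_files, "--output", output]


# Hypothetical paths; print the commands instead of running them.
for module in ("trna", "wish", "mash"):
    print(" ".join(module_command(module, "path/to/host/dir", "path/to/vir/dir")))
print(" ".join(merge_command(["trna.csv", "wish.csv", "mash.csv"], "final_results.csv")))
```

Each list can be passed to subprocess.run(cmd, check=True) once jasper-vh is on PATH.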
JASPER produces output in a special format:
The resulting file is a CSV file with additional lines that start with # (for easy parsing).
The file is grouped by virus genome. For each viral genome there are a number of Score columns, one per tool used (that is, one per results file merged).
Under each group there is a Std (standard deviation) row with one value per Score column, which indicates the level of variation. It gives an idea of how much the proposed hosts within a group differ from each other.
Sample:
Virus | Host | blastScore | crisprScore | mashScore | wishScore |
---|---|---|---|---|---|
NC_024389 | NC_008531 | 294.0 | NaN | 0.537086 | -1.3626 |
NC_024389 | NC_008531 | 294.0 | NaN | 0.537086 | -1.3626 |
NC_024389 | NC_017331 | NaN | NaN | NaN | -1.35805 |
# Std | 0.0 | NaN | 0.0 | 0.002 | |
NC_024391 | NC_009641 | 31180.0 | NaN | 0.760474 | -1.33066 |
NC_024391 | NC_017349 | 17892.0 | NaN | 0.789652 | -1.32703 |
NC_024391 | NC_017349 | 17892.0 | NaN | 0.789652 | -1.32703 |
# Std | 6264.023 | NaN | 0.014 | 0.002 | |
NC_024392 | NC_018586 | 270.0 | 28.0 | 0.484772 | -1.36183 |
NC_024392 | NC_021827 | 270.0 | NaN | NaN | -1.36077 |
NC_024392 | NC_021830 | 270.0 | NaN | NaN | -1.3607 |
NC_024392 | NC_021839 | 270.0 | NaN | NaN | -1.36207 |
NC_024392 | NC_021840 | 270.0 | NaN | NaN | -1.35926 |
NC_024392 | NC_017547 | 74.0 | 37.0 | 0.517779 | -1.36224 |
NC_024392 | NC_021823 | 246.0 | 37.0 | 0.537086 | -1.36111 |
NC_024392 | NC_021823 | 246.0 | 37.0 | 0.537086 | -1.36111 |
NC_024392 | NC_010001 | NaN | NaN | NaN | -1.34728 |
# Std | 63.37 | 3.897 | 0.021 | 0.004 |
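A file in this format can be read by treating lines that start with # separately from the data rows. The sketch below uses only the standard library; the comma delimiter and the exact layout of the # Std lines are assumptions, since the sample above is shown as a rendered table rather than as raw file content.

```python
import csv
import io


def parse_results(text, delimiter=","):
    """Split a results file into a header, data rows and '#' summary rows."""
    header, rows, summaries = None, [], []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        cells = [cell.strip() for cell in record]
        if cells[0].startswith("#"):
            summaries.append(cells)  # e.g. the Std line that closes a group
        elif header is None:
            header = cells  # first non-comment line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return header, rows, summaries


# Assumed raw form of the first group from the sample above.
sample = """Virus,Host,blastScore,crisprScore,mashScore,wishScore
NC_024389,NC_008531,294.0,NaN,0.537086,-1.3626
NC_024389,NC_017331,NaN,NaN,NaN,-1.35805
# Std,0.0,NaN,0.0,0.002
"""

header, rows, summaries = parse_results(sample)
print(header)
print(len(rows), "data rows,", len(summaries), "summary rows")
```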
Std is equal to NaN only when the whole column is NaN, which means that the given tool produced no results for the given hosts.
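The README does not say which estimator is used for Std. The sample values are consistent with a population standard deviation (dividing by n) taken over the non-NaN entries of a group, duplicate rows included; the sketch below reproduces two of them under that assumption. The function name group_std is hypothetical.

```python
import math


def group_std(values):
    """Population standard deviation of the non-NaN values; NaN if none remain."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return float("nan")
    mean = sum(kept) / len(kept)
    return math.sqrt(sum((v - mean) ** 2 for v in kept) / len(kept))


nan = float("nan")

# blastScore column of the NC_024391 group in the sample.
print(round(group_std([31180.0, 17892.0, 17892.0]), 3))  # 6264.023

# crisprScore column of the NC_024392 group in the sample.
print(round(group_std([28.0, nan, nan, nan, nan, 37.0, 37.0, 37.0, nan]), 3))  # 3.897

# A column with no results at all yields NaN.
print(group_std([nan, nan, nan]))
```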
You can provide the BLAST configuration as a *.json file. Every module uses a different task, so a few arguments are forbidden:
['query', 'db', 'outfmt', 'max_target_seqs', 'num_alignments']
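A configuration can be checked against this list before it is passed to JASPER. The sketch below assumes the JSON file is a flat object mapping BLAST option names to values, which this README does not spell out; evalue and word_size are ordinary blastn options used only as an illustration.

```python
import json

# Arguments this README lists as forbidden in the BLAST configuration.
FORBIDDEN = {"query", "db", "outfmt", "max_target_seqs", "num_alignments"}


def forbidden_keys(config):
    """Return the forbidden option names present in a config mapping."""
    return sorted(FORBIDDEN & set(config))


# Hypothetical config content; in practice it would be loaded from a *.json file.
config = json.loads('{"evalue": 0.001, "word_size": 11, "outfmt": 6}')
print(forbidden_keys(config))  # ['outfmt']
```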
- Edgar, R.C. (2007) PILER-CR: fast and accurate identification of CRISPR repeats, BMC Bioinformatics, Jan 20;8:18
- Fichant and Burks, J. Mol. Biol. (1991) Identification of tRNA genes in genomic DNA, 220:659-671.
- Galiez, C., Siebert, M. et al. WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs.
- Ondov, B.D., Treangen, T.J., Melsted, P. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016).
- NCBI-BLAST+