Tutorial
Here, we demonstrate an example run of detectEVE on a small dataset. We assume the tool has already been installed and the default databases have been set up (see Installation and database setup).
Make sure that the detectEVE environment is activated and up to date:
mamba activate detectEVE
mamba env update --file workflow/envs/env.yaml
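As a quick sanity check before continuing, the sketch below tests whether the environment is active. It assumes that activation exports the `CONDA_DEFAULT_ENV` variable, as conda-style activation normally does; the helper name `env_is_active` is hypothetical and not part of detectEVE.

```shell
# Returns 0 if the named conda/mamba environment is the active one.
# Assumption: activation exports CONDA_DEFAULT_ENV with the environment name.
env_is_active() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

# Warn (without aborting) if the detectEVE environment is not active.
if ! env_is_active detectEVE; then
  echo "detectEVE environment is not active; run: mamba activate detectEVE" >&2
fi
```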
We will be using the following files from the examples folder: ATLV01_cut.fna and Flavi_ref.fasta. The folder and files are downloaded automatically during the detectEVE download to path/to/detectEVE/examples.
In this tutorial we provide an EVE screening with two options for viral protein databases:
- RVDB(80), provided during database setup (see Installation and database setup)
- a small custom database of Flavivirus sequences, Flavi_ref.fasta from the examples folder
If you would like to try the custom database, we first need to set up a binary DIAMOND database file for Flavi_ref.fasta. If you would like to try only RVDB.dmnd, skip this step:
cd path/to/detectEVE/examples
diamond makedb --in Flavi_ref.fasta --db Flavi
mv Flavi.dmnd ../databases
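The three commands above can also be wrapped in a small function that stops early if the input or the target folder is missing. This is only a sketch: the function name `build_custom_db` and its argument order are hypothetical, and the only DIAMOND call it makes is the `diamond makedb --in ... --db ...` invocation shown above.

```shell
# build_custom_db FASTA NAME DEST_DIR
# Builds NAME.dmnd from FASTA with DIAMOND and moves it into DEST_DIR.
build_custom_db() {
  fasta=$1
  name=$2
  dest=$3
  if [ ! -s "$fasta" ]; then
    echo "input FASTA not found or empty: $fasta" >&2
    return 1
  fi
  if [ ! -d "$dest" ]; then
    echo "database folder not found: $dest" >&2
    return 1
  fi
  # Same call as in the tutorial; writes NAME.dmnd to the current directory.
  diamond makedb --in "$fasta" --db "$name" || return 1
  mv "$name.dmnd" "$dest"/
}

# Usage from path/to/detectEVE/examples (commented out so nothing runs on load):
# build_custom_db Flavi_ref.fasta Flavi ../databases
```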
Next, the config.yaml file needs to be adjusted to use Flavi.dmnd. Skip this step if you would like to use RVDB.dmnd:
- Open config.yaml with your favourite text editor and change RVDB.dmnd to Flavi.dmnd:
# NOTE: with custom databases make sure to set: taxonlist: "" (empty string)
#db: "rvdb80.dmnd"
db: "Flavi.dmnd"
- If the custom database was not built with DIAMOND taxonomy options (as is the case with Flavi.dmnd), change the taxonomy setting to "" (empty string):
taxonlist: ""
#taxonlist: "--taxonlist 2732396,2731342" # screen only for Orthornaviridae and Monodnaviridae
# taxonlist: "" <<< use with custom dbs that haven't been built with a diamond taxonomy !
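If you prefer to script the two config changes instead of editing by hand, a sketch with `sed` could look as follows. It assumes that `db:` and `taxonlist:` each sit on their own uncommented line, as in the excerpts above; the function name `use_custom_db` is hypothetical. Commented-out lines (starting with `#`) are left untouched, and a backup copy is written next to the config.

```shell
# use_custom_db CONFIG DB_FILE
# Points the active "db:" entry at DB_FILE and empties the active "taxonlist:".
# Assumption: DB_FILE contains no '|' or '&' characters (they are special to sed).
use_custom_db() {
  config=$1
  db=$2
  cp "$config" "$config.bak" || return 1
  sed -e "s|^\([[:space:]]*\)db:.*|\1db: \"$db\"|" \
      -e "s|^\([[:space:]]*\)taxonlist:.*|\1taxonlist: \"\"|" \
      "$config.bak" > "$config"
}

# Usage (commented out so nothing runs on load):
# use_custom_db config.yaml Flavi.dmnd
```

Compare config.yaml with config.yaml.bak afterwards to confirm that only the two intended lines changed.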
We now run an EVE search against our target viral database using the example genome assembly file ATLV01_cut.fna:
cd /path/to/detectEVE
./detectEVE examples/ATLV01_cut.fna
detectEVE will create an output folder detectEVE-(time) unless otherwise specified. If --cores is not specified, detectEVE will use all available cores. Here are examples of additional parameters:
# examples
./detectEVE -o test examples/ATLV01_cut.fna # save output in folder 'test'
./detectEVE --snake '--cores 8' examples/ATLV01_cut.fna # use a maximum of 8 threads
# see help page for more options
./detectEVE -h
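To screen several assemblies in a row, the options above can be combined in a loop. The sketch below is hypothetical: it assumes that `-o` and `--snake` may be used together in one call (the examples above only show them separately), and the function name `screen_all`, the `DETECTEVE` variable and the `_eve` output suffix are illustrative choices.

```shell
# Path to the launcher, run from /path/to/detectEVE; override if needed.
DETECTEVE=${DETECTEVE:-./detectEVE}

# screen_all CORES FILE...
# Runs one detectEVE search per input file, each with its own output folder.
screen_all() {
  cores=$1
  shift
  for genome in "$@"; do
    # Output folder named after the input file, without directory and extension.
    out="$(basename "${genome%.*}")_eve"
    "$DETECTEVE" -o "$out" --snake "--cores $cores" "$genome" || return 1
  done
}

# Usage (commented out so nothing runs on load):
# screen_all 8 examples/ATLV01_cut.fna
```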
After some time, you will receive your output files with the following EVE hits:
ATLV01_cut-validatEVEs.tsv after RVDB.dmnd search:
eve_id	confidence	eve_score	suggests	because	locus	top_evalue	top_pident	top_desc	top_viral_desc	top_viral_lineage	max_count_phylum
ATLV01_cut_EVE001	high	96	viral (19), maybe-viral (1)	VDB (13), UDB Viruses (6), uncharacterized protein (1)	ATLV01019207.1_3580-4280:-	3.01e-73	54.7	polyprotein [Karumba virus] acc=YP_009388577.1	polyprotein [Karumba virus] acc=YP_009388577.1	k__Viruses;K__Orthornavirae;p__Kitrinoviricota;c__Flasuviricetes;o__Amarillovirales;f__Flaviviridae;g__unclassified Flaviviridae genus;s__Karumba virus	Kitrinoviricota
ATLV01_cut_EVE002	high	95	viral (19), maybe-viral (1)	VDB (13), UDB Viruses (5), glycoprotein protein (1), viral (1)	ATLV01019207.1_2615-3409:-	1.8099999999999996e-112	60.5	putative glycoprotein [Anopheles darlingi virus] acc=QBK47202.1	putative glycoprotein [Anopheles darlingi virus] acc=QBK47202.1	k__Viruses;K__Orthornavirae;p__Negarnaviricota;c__Monjiviricetes;o__Mononegavirales;f__Xinmoviridae;g__Madalivirus;s__Madalivirus amazonaense	Negarnaviricota
ATLV01_cut_EVE003	high	94	viral (15), maybe-viral (1)	VDB (12), UDB Viruses (3), glycoprotein protein (1)	ATLV01019207.1_1147-1368:-	3.46e-18	48.6	putative glycoprotein [Gambie virus] acc=AOR51379.1	putative glycoprotein [Gambie virus] acc=AOR51379.1	k__Viruses;K__Orthornavirae;p__Negarnaviricota;c__Monjiviricetes;o__Mononegavirales;f__Xinmoviridae;g__Gambievirus;s__Gambievirus senegalense	Negarnaviricota
ATLV01_cut-validatEVEs.tsv after Flavi.dmnd search:
eve_id	confidence	eve_score	suggests	because	locus	top_evalue	top_pident	top_desc	top_viral_desc	top_viral_lineage	max_count_phylum
ATLV01_cut_EVE001	high	96	viral (19), maybe-viral (1)	VDB (13), UDB Viruses (6), uncharacterized protein (1)	ATLV01019207.1_3580-4280:-	3.1e-76	54.7	polyprotein [Karumba virus]	Genome polyprotein (Fragment) n=3 Tax=unclassified Flaviviridae RepID=A0A1C9U5I9_9FLAV	k__Viruses;K__Orthornavirae;p__Kitrinoviricota;c__Flasuviricetes;o__Amarillovirales;f__Flaviviridae;g__unclassified Flaviviridae genus;s__unclassified Flaviviridae species	unclassified root phylum
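Because the result tables are wide, it can help to print only a few columns. The following sketch assumes that the validatEVEs file is tab-separated (as the .tsv extension suggests) and has the header row shown above; the function name `high_confidence_eves` is hypothetical. Columns are looked up by header name, so the sketch does not depend on their order.

```shell
# high_confidence_eves FILE
# Prints eve_id, eve_score and locus for every row with confidence "high".
high_confidence_eves() {
  awk -F'\t' -v OFS='\t' '
    # Header row: remember the position of each column name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["confidence"]) == "high" {
      print $(col["eve_id"]), $(col["eve_score"]), $(col["locus"])
    }
  ' "$1"
}

# Usage (commented out; replace the path with your own output folder):
# high_confidence_eves path/to/output/ATLV01_cut-validatEVEs.tsv
```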