SARS-CoV-2 Variants of Concern (VOC) and Variants of Interest (VOI) pose high risks to global public health. COVID-MVP tracks mutations from VOCs and VOIs to enable interactive visualization in near-real time. COVID-MVP has three modules: a Nextflow-wrapped workflow (nf-ncov-voc) for identifying mutations in genomic data; a Python module that integrates functional annotations from the Pokay repository, based on literature curation; and an interactive visualization of the prevalence of mutations in variants and their functional impact, built on the Dash and Plotly frameworks.
You can find a deployed version of this application (without user upload functionality) at https://covidmvp.cidgoh.ca/.
0. (For users that plan to upload their own data) Install Nextflow
$ git clone git@github.com:cidgoh/COVID-MVP.git
We use Conda, but you can use venv. We recommend using Python 3.9 in your virtual environment. Older Python versions may break the application.
$ conda create --name=COVID-MVP python=3.9
$ conda activate COVID-MVP
(COVID-MVP) $ pip install -r requirements.txt
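Since older Python versions may break the application, it can help to check the interpreter in the activated environment before launching. The following is a minimal sketch and not part of COVID-MVP: `is_supported_python` and `MIN_PYTHON` are hypothetical names, and the 3.9 minimum is taken from the recommendation above.

```python
import sys

# Assumed minimum, based on the Python 3.9 recommendation above.
MIN_PYTHON = (3, 9)


def is_supported_python(version_info=None):
    """Return True if a (major, minor, ...) version meets MIN_PYTHON.

    Defaults to the version of the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python {found} found; 3.9 or newer is recommended.")
```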
If you do not run the application from the root directory, some of the JavaScript assets will not be compiled.
(COVID-MVP) $ python app.py
Go to http://127.0.0.1:8050/.
Click the legend button at the top for an in-app explanation of the heatmap view.
The left axis encodes variants. VOC are in bold, and VOI are in italics.
The right axis encodes the number of genomic sequences analyzed for each variant.
The top axis encodes the nucleotide position of variant mutations, with respect to the reference SARS-CoV-2 genome from Wuhan.
The bottom axis encodes the amino acid position of variant mutations, in the following format:
Genic mutations: {GENE}.{AMINO ACID POSITION WITHIN THAT GENE}
Intergenic mutations: {NEAREST DOWNSTREAM GENE}.{NUMBER OF NUCLEOTIDES UPSTREAM}
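The two label formats can be sketched in a few lines of Python. This is an illustration only, not the application's own code: the function names are hypothetical, the example gene names and numbers are made up, and the sketch assumes no space after the period in either format.

```python
def genic_label(gene, amino_acid_position):
    """Label for a mutation inside a gene: {GENE}.{AMINO ACID POSITION}."""
    return f"{gene}.{amino_acid_position}"


def intergenic_label(nearest_downstream_gene, nucleotides_upstream):
    """Label for a mutation between genes.

    Uses the nearest downstream gene and the number of nucleotides
    the mutation sits upstream of it.
    """
    return f"{nearest_downstream_gene}.{nucleotides_upstream}"


# Made-up inputs, purely to show the shape of the labels.
example_genic = genic_label("S", 484)
example_intergenic = intergenic_label("ORF3a", 12)
```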
The heatmap cells encode the presence of mutations. The color of these cells encodes mutation frequency. Insertions, deletions, functional mutations, and variants with a sample size of one are marked with dedicated indicators; see the in-app legend.
Hovering over cells displays detailed mutation information. Clicking cells opens a modal with detailed mutation function descriptions, and their citations.
The histogram bars encode the total number of mutations across all VOI/VOC every 100 nucleotide positions. The horizontal bar directly under the histogram bars maps SARS-CoV-2 genes to the histogram x-axis. The black horizontal bar at the bottom maps the current position of the heatmap viewport to the SARS-CoV-2 genome.
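The 100-position binning behind the histogram bars can be sketched as follows. This is a minimal illustration, not the application's implementation: `bin_mutation_positions` is a hypothetical helper, and it assumes 1-based nucleotide positions so that positions 1 to 100 fall into the first bar.

```python
from collections import Counter

BIN_WIDTH = 100  # nucleotide positions per histogram bar, as described above


def bin_mutation_positions(positions, bin_width=BIN_WIDTH):
    """Count mutations per fixed-width window of nucleotide positions.

    Returns a dict mapping each window's first position to the number
    of mutations in that window. Positions are assumed to be 1-based.
    """
    counts = Counter(
        ((position - 1) // bin_width) * bin_width + 1 for position in positions
    )
    return dict(sorted(counts.items()))
```

Passing the nucleotide positions of all mutations across every VOI/VOC yields one count per bar; windows with no mutations are simply absent from the result.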
The table shows a subset of fields for a single VOI/VOC, modified from the application data used to generate the heatmap and histogram views. You can switch between variants by clicking on the heatmap cells.
There are several tools at the top of the interface that can be used to edit the visualization.
Clicking the select lineages button opens a modal that allows you to rearrange and hide variants.
The mutation frequency slider allows you to filter heatmap cells by mutation frequency.
The clade-defining switch allows you to show or hide heatmap cells corresponding to non-clade-defining mutations.
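Conceptually, the frequency slider and the clade-defining switch both act as filters over the heatmap cells. The sketch below shows that idea under stated assumptions: `HeatmapCell` and `filter_cells` are hypothetical names, frequency is assumed to be a fraction between 0 and 1, the slider is assumed to select a range, and the example cells are made-up placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeatmapCell:
    variant: str
    mutation: str
    frequency: float       # assumed: fraction of sequences with the mutation
    clade_defining: bool


def filter_cells(cells, min_freq=0.0, max_freq=1.0, clade_defining_only=False):
    """Keep cells inside the frequency range; optionally drop
    cells for non-clade-defining mutations."""
    return [
        cell
        for cell in cells
        if min_freq <= cell.frequency <= max_freq
        and (cell.clade_defining or not clade_defining_only)
    ]


# Placeholder cells with made-up values, for illustration only.
EXAMPLE_CELLS = [
    HeatmapCell("variant_a", "mutation_1", 0.95, True),
    HeatmapCell("variant_a", "mutation_2", 0.10, False),
    HeatmapCell("variant_b", "mutation_3", 0.50, False),
]
```

For example, `filter_cells(EXAMPLE_CELLS, min_freq=0.4)` keeps `mutation_1` and `mutation_3`, and adding `clade_defining_only=True` leaves only `mutation_1`.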
The upload button allows you to upload your own SARS-CoV-2 genomic data in FASTA or VCF format. You can find examples of files users can upload in test_data/.
You must have Nextflow installed to upload files.
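Before trying an upload, you may want to confirm that Nextflow is on your PATH and that your file looks like FASTA or VCF. The following is a rough sketch, not the application's own validation: both function names are hypothetical, the executable name `nextflow` is assumed, and the extension lists are assumptions that may not match what the application accepts.

```python
import shutil
from pathlib import Path

# Assumed extension lists; the application may accept others.
FASTA_EXTENSIONS = {".fasta", ".fa", ".fna"}
VCF_EXTENSIONS = {".vcf"}


def nextflow_available(executable="nextflow"):
    """Return True if the named executable can be found on the PATH."""
    return shutil.which(executable) is not None


def upload_format(filename):
    """Guess the upload format from the file extension.

    Returns "FASTA", "VCF", or None when the extension is not recognized.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in FASTA_EXTENSIONS:
        return "FASTA"
    if suffix in VCF_EXTENSIONS:
        return "VCF"
    return None
```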
The download button allows you to download a zip object containing surveillance reports for each reference variant. You can find examples of these reports in surveillance_reports/.
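Once downloaded, the zip object can be inspected with Python's standard `zipfile` module. The sketch below lists the files in an archive; `list_reports` is a hypothetical helper, and the in-memory archive with placeholder file names only stands in for the real download, whose file names and formats are not assumed here.

```python
import io
import zipfile


def list_reports(zip_source):
    """Return the sorted names of the files (not directories) in a zip archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


# Demo: an in-memory archive with placeholder names stands in for the download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report_a.tsv", "placeholder")
    archive.writestr("report_b.pdf", "placeholder")
buffer.seek(0)
demo_names = list_reports(buffer)
```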
The nf-ncov-voc pipeline generates the data files for the visualization.
Pokay is a stand-alone repository of functional annotations that we use as an annotation source in this application.
We encourage you to report any problems with the application as an issue in this repository. If you need to contact us by email, write to [email protected].
@ivansg44: Visualization development
@Anoosha-Sehar: Functional annotation
@anwarMZ: Genomic analysis
@miseminger: Functional annotation and data standardization
@despean: Application deployment
William Hsiao, Gary Van Domselaar, and Paul Gordon
The results here are in whole or part based upon data hosted at the Canadian VirusSeq Data Portal: https://virusseq-dataportal.ca/. We wish to acknowledge the following organisations/laboratories for contributing data to the Portal: Canadian Public Health Laboratory Network (CPHLN), CanCOGeN VirusSeq, Saskatchewan - Roy Romanow Provincial Laboratory (RRPL), Nova Scotia Health Authority, Alberta ProvLab North (APLN), Queen's University / Kingston Health Sciences Centre, National Microbiology Laboratory (NML), BCCDC Public Health Laboratory, Public Health Ontario (PHO), Newfoundland and Labrador - Eastern Health, Unity Health Toronto, Ontario Institute for Cancer Research (OICR), and Manitoba Cadham Provincial Laboratory.