SARS-CoV-2 variant calling

This pipeline performs consensus assembly and variant calling for amplicon sequencing data (Illumina or Oxford Nanopore) generated using the ARTIC protocol. You can specify the primer set used to trim alignments; if none is specified, ARTIC V4.1 is assumed.

Pipeline overview

The pipeline takes in a single FASTQ file (interleaved if Illumina) and processes it as follows:

Map reads to the Wuhan-Hu-1 reference and trim ARTIC primer sequences
Generate a consensus sequence (bcftools for Illumina; medaka for Oxford Nanopore)
Call variants with coverage capped at 200x, as recommended by the ARTIC network. Both SNVs and indels are reported for Illumina data; only SNVs are reported for Oxford Nanopore data, because benchmarking studies indicate that small-indel detection is unreliable on that platform.
Assign Pangolin and Nextclade lineages
Predict amino acid mutations
Predict consequences of compound variants (e.g., adjacent SNVs in the same codon, or frame-shifting indels followed by frame-restoring indels)
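
The compound-variant step matters because two SNVs in the same codon can encode a different amino acid than either SNV would on its own. The following is a minimal Python sketch of that idea, not the pipeline's actual implementation; the function names, the reference codon, and the two SNVs are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying each (offset, alt_base) substitution."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def translate(codon):
    """Translate a single DNA codon to its one-letter amino acid code."""
    return CODON_TABLE[codon]


# Hypothetical example: reference codon GAT (Asp) with two adjacent SNVs.
ref_codon = "GAT"
snvs = [(0, "T"), (1, "T")]

# Annotating each SNV in isolation gives two separate predictions...
independent = [translate(apply_snvs(ref_codon, [snv])) for snv in snvs]
# ...whereas applying both to the same codon gives the true consequence.
joint = translate(apply_snvs(ref_codon, snvs))

print(translate(ref_codon), independent, joint)
```

In this example the individual SNVs would be annotated as Asp to Tyr and Asp to Val, while the codon actually present in the sample encodes Phe.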

Rigorous quality checks are applied throughout the pipeline. These include flagging variants in low-complexity regions for error-prone Oxford Nanopore data and making conservative lineage calls (no lineage assignment is reported if the consensus sequence has too many Ns or is too fragmented).
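
As an illustration of the conservative lineage-call check, the sketch below computes the fraction of N bases and the number of contiguous non-N fragments in a consensus sequence, then gates lineage assignment on both. The thresholds and function names are assumptions made for this example; the pipeline's actual cutoffs are not stated here.

```python
import re

# Hypothetical thresholds, for illustration only.
MAX_N_FRACTION = 0.10
MAX_FRAGMENTS = 10


def consensus_stats(seq):
    """Return (fraction of N bases, number of contiguous non-N fragments)."""
    seq = seq.upper()
    n_fraction = seq.count("N") / len(seq) if seq else 1.0
    fragments = len(re.findall(r"[^N]+", seq))
    return n_fraction, fragments


def lineage_call_allowed(seq, max_n_fraction=MAX_N_FRACTION, max_fragments=MAX_FRAGMENTS):
    """Allow a lineage call only if the consensus is complete and contiguous enough."""
    n_fraction, fragments = consensus_stats(seq)
    return n_fraction <= max_n_fraction and 0 < fragments <= max_fragments


print(consensus_stats("ACGTNNACGT"))
print(lineage_call_allowed("ACGTNNACGT"))
```

An empty or all-N consensus yields zero fragments, so the check rejects it rather than passing it on for lineage assignment.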

In addition to a results JSON, a PDF report is generated for each sample that tells you at a glance whether primer dropout has occurred, which amino acid mutations are present, and whether the sample contains a variant of concern. An example report is shown below.

Quick start

Build the Docker image:

docker build -t covid19 .

Run the pipeline in the Docker image (FASTQ files are stored in Git LFS, so you may need to run git lfs pull first):

docker \
  run \
  --rm \
  --workdir /data \
  --volume "$(pwd)":/data \
  --entrypoint /bin/bash \
  --env ONE_CODEX_REPORT_FILENAME=report.pdf \
  --env INSTRUMENT_VENDOR=Illumina \
  --env ARTIC_PRIMER_VERSION=4.1 \
  covid19 \
  jobscript.sh \
  data/twist-target-capture/RNA_control_spike_in_10_6_100k_reads.fastq.gz

For Oxford Nanopore:

docker \
  run \
  --rm \
  --workdir /data \
  --volume "$(pwd)":/data \
  --entrypoint /bin/bash \
  --env ONE_CODEX_REPORT_FILENAME=report.pdf \
  --env INSTRUMENT_VENDOR="Oxford Nanopore" \
  --env ARTIC_PRIMER_VERSION=4.1 \
  covid19 \
  jobscript.sh \
  data/twist-target-capture/RNA_control_spike_in_10_6_100k_reads.fastq.gz

Development & Testing

To run tests, run pytest.

The requirements.txt file lists the dependencies needed to run golden-output tests across a variety of datasets. This repository uses GitHub Actions to build the Docker image and run these tests automatically, ensuring that parameter and pipeline changes do not alter variant calls or consensus sequences.
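
To make the idea of a golden-output test concrete, here is a minimal Python sketch that compares the variants in a freshly produced VCF against a stored expected VCF. It is not taken from this repository's test suite; the helper names and the inline VCF records are hypothetical, and a real test would read both files from disk.

```python
def parse_variants(vcf_text):
    """Collect (chrom, pos, ref, alt) tuples from VCF text, skipping header lines."""
    variants = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def diff_variants(expected, observed):
    """Report variants missing from, or unexpectedly present in, the new output."""
    return {
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


# Illustrative records only; not real pipeline output.
golden_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MN908947.3\t14408\t.\tC\tT\t.\tPASS\t.\n"
    "MN908947.3\t23403\t.\tA\tG\t.\tPASS\t.\n"
)
new_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MN908947.3\t23403\t.\tA\tG\t.\tPASS\t.\n"
)

result = diff_variants(parse_variants(golden_vcf), parse_variants(new_vcf))
print(result)
```

A test built this way would pass only when both lists are empty, so any change that drops or adds a variant call is surfaced.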

Currently, integration tests are run on:

Simulated Illumina data from the SARS-CoV-2 reference including simulated variants across the genome
Example Twist hybrid capture data (Illumina)
Example ARTIC v1 amplicon sequencing data (Illumina)

The repository also uses pre-commit to keep the code clean and consistent. To get started, first install the requirements (Python 3 required): pip install -r requirements.txt. Then install the pre-commit hooks: pre-commit install --install-hooks.

Acknowledgments

Many thanks are due across the community, including but not limited to:

@tseemann, @gkarthik, @nickloman, and many others for quick discussions on optimal SNP calling for both amplicon (ARTIC primers) and non-amplicon sequencing approaches
@nickloman, @joshquick, @rambaut, @k-florek and others working on the ARTIC protocol for SARS-CoV-2
@pangolin and @nextclade for surveillance tools
Voigt lab for dnaplotlib
