Commit

Merge branch 'master' into dependabot/github_actions/jlumbroso/free-disk-space-1.3.1
alethomas authored Apr 9, 2024
2 parents 511d4fd + 20a8034 commit 3c0250b
Showing 15 changed files with 532 additions and 179 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check-todos.yml
@@ -11,4 +11,4 @@ jobs:
steps:
- uses: "actions/checkout@master"
- name: "TODO to Issue"
uses: "alstr/todo-to-issue-action@v4.12"
uses: "alstr/todo-to-issue-action@v4.13"
59 changes: 0 additions & 59 deletions docs/about.md

This file was deleted.

Binary file added docs/assets/Report_panel.png
Binary file added docs/assets/UnCoVar_favicon.png
Binary file added docs/assets/UnCoVar_virus_white_no_shadow.png
Binary file added docs/assets/UnCoVar_wf_new.png
158 changes: 107 additions & 51 deletions docs/configuration.md
@@ -1,8 +1,13 @@
# Configuration
# Advanced Configuration

The config file, found under `config/config.yaml`, can be used to adapt your analysis.

## Execution Mode

Accepted values: `patient`, `environment`. Defaults to `patient`.
```yaml
# execution mode. Can be either "patient" or "environment"
mode: environment
```
Defines the execution mode of UnCoVar.
@@ -15,73 +20,124 @@ environment (e.g. wastewater) and to contain different SARS-CoV-2 strains.
The parts of the workflow responsible for creating and analysing individual
genomes (e.g. assembly, lineage calling via Pangolin) are disabled.
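
A minimal sketch of how the `mode` setting could be checked before a run, using only the accepted values and the default stated above. The helper `resolve_mode` is hypothetical and not part of UnCoVar; it assumes the config has already been loaded into a plain dictionary.

```python
# Sketch only: `resolve_mode` is a hypothetical helper, not part of UnCoVar.
ACCEPTED_MODES = ("patient", "environment")
DEFAULT_MODE = "patient"


def resolve_mode(config: dict) -> str:
    """Return the execution mode, falling back to the documented default."""
    mode = config.get("mode", DEFAULT_MODE)
    if mode not in ACCEPTED_MODES:
        raise ValueError(f"mode must be one of {ACCEPTED_MODES}, got {mode!r}")
    return mode


print(resolve_mode({"mode": "environment"}))  # environment
print(resolve_mode({}))  # patient
```

Rejecting unknown values early avoids silently running the wrong set of workflow steps.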

## Adapters
## Sending lab number

There are three ways to transfer adapter sequences to UnCoVar to remove them
from the raw data.
UnCoVar automatically generates a multi-FASTA file and a corresponding `.csv` for
all samples that have a `1` flag for `include_in_high_genome_summary` in the sample sheet
and that meet the given `quality-criteria` (see below). The reporting format and the
quality criteria are inspired by the [requirements for SARS-CoV-2 genome submission](https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/DESH/Qualitaetskriterien.pdf?__blob=publicationFile)
to the [Robert-Koch-Institute, Germany](https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/nCoV.html).
The sending lab number will be included in the `.csv` file.

### Config File
## Data handling

The adapter sequences used can be specified in the config file under
`preprocessing` -> `kit adapters`.
With the root of the UnCoVar workflow as working directory, we recommend
using the following folder structure:

For **paired-end data**, the adapters can be detected by per-read overlap
analysis, which seeks the overlap for each pair of reads. The adapter sequences
can be specified for read one by `--adapter_sequence` and for
read two by `--adapter_sequence_r2`. An example for [Illumina's TruSeq library](https://www.illumina.com/products/by-type/sequencing-kits/library-prep-kits/truseq-rna-v2.html)
is shown below:

```yaml
"--adapter_sequence = AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
--adapter_sequence_r2 = AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
```text
├── archive
├── incoming
└── uncovar
    └── data
        └── 2023-12-24
```

Adapters for **single-end data** can be specified only using the
`--adapter_sequence` option.
The structure can be adjusted via the config under `data-handling`:

```yaml
"--adapter_sequence = AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
data-handling:
  # flag for using the following data-handling structure
  # True: data-handling structure is used as shown below
  # False: only the sample sheet needs to be updated (manually)
  use-data-handling: True
  # flag for archiving data
  # True: data is archived in path defined below
  # False: data is not archived
  archive-data: False
  # path of incoming data, which is moved to the
  # data directory by the preprocessing script
  incoming: ../incoming/
  # path to store data within the workflow
  data: data/
  # path to archive data from incoming and
  # the results from the latest run to
  archive: ../archive/
```
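
As a rough illustration of how the relative paths in `data-handling` map onto the folder structure shown above, the following sketch resolves them against the workflow root. The helper `plan_data_paths` is hypothetical, and the per-run dated subdirectory is an assumption read off the example tree (`data/2023-12-24`), not confirmed behaviour of the preprocessing script.

```python
import posixpath


def plan_data_paths(data_handling: dict, workflow_root: str, run_date: str) -> dict:
    """Resolve the configured paths relative to the workflow root.

    Hypothetical helper: key names follow the config above, everything
    else (including the dated data directory) is an assumption.
    """
    if not data_handling.get("use-data-handling", False):
        # Data handling disabled: only the sample sheet is maintained manually.
        return {}

    def resolve(relative: str) -> str:
        return posixpath.normpath(posixpath.join(workflow_root, relative))

    plan = {
        "incoming": resolve(data_handling["incoming"]),
        "data": resolve(posixpath.join(data_handling["data"], run_date)),
    }
    # The archive path is only relevant when archiving is switched on.
    if data_handling.get("archive-data", False):
        plan["archive"] = resolve(data_handling["archive"])
    return plan


example = {
    "use-data-handling": True,
    "archive-data": False,
    "incoming": "../incoming/",
    "data": "data/",
    "archive": "../archive/",
}
print(plan_data_paths(example, "/project/uncovar", "2023-12-24"))
```

With the example values, `incoming` resolves to a sibling of the workflow directory and `data` stays inside it, which matches the tree above.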

### Sample Sheet

The second way to remove adapter sequences is to specify the adapter sequence
per sample in the sample sheet. The adapters must be entered in a column
called `adapters`. For paired-end and single-end format, see above. Here is
an exemplary samples sheet:

| sample_name | fq1 | fq2 | date | is_amplicon_data | technology | adapters |
| ----------- | ----------- | ----------- | ---------- | ---------------- | ---------- | -------------------------------------------------- |
| example-1 | PATH/TO/fq1 | PATH/TO/fq2 | 1970-01-01 | 1 | illumina | --adapter_sequence=ACGT --adapter_sequence_r2=TGCA |
| example-2 | PATH/TO/fq | | 1970-01-01 | 1 | ion | --adapter_sequence=ACGT |

If an adapter sequence is specified for a sample in the sample sheet, this
adapter sequence is used to trim the sequences of only this sample. For
empty entries, UnCoVar uses the adapter sequence from the config file.

### Pre-Defined Adapters

UnCoVar supports two different sequencing kits and their respective adapters,
namely:
## Quality criteria

1. [Revelo RNA-Seq library preparation kit](https://lifesciences.tecan.com/revelo-rna-seq-library-prep-kit?p=tab--5)
1. [EasySeq RC-PCR SARS CoV-2 Whole Genome Sequencing kit](https://www.nimagen.com/shop/products/rc-cov096/easyseq-sars-cov-2-novel-coronavirus-whole-genome-sequencing-kit)
The quality criteria can be adjusted to your individual needs. By default, they match
the quality criteria needed for submitting to the RKI (see **Sending lab number**
above).

The `adapters` column in the sample sheet is used to trim the adapter sequences
of these kits. Revelo adapters are trimmed by specifying
`revelo-rna-seq` in the column per sample, while the Nimagen adapters are
removed by specifying `nimagen-easy-seq`. A short example:
```yaml
quality-criteria:
  illumina:
    # minimal length of acceptable reads
    min-length-reads: 30
    # average quality of acceptable reads (PHRED)
    min-PHRED: 20
  ont:
    # minimal length of acceptable reads
    min-length-reads: 200
    # average quality of acceptable reads (PHRED)
    min-PHRED: 10
  # identity to virus reference genome (see-above) of reconstructed sequence
  min-identity: 0.9
  # share N in the reconstructed sequence
  max-n: 0.05
  # minimum local sequencing depth without filtering of PCR duplicates
  min-depth-with-PCR-duplicates: 20
  # minimum local sequencing depth after filtering PCR duplicates
  min-depth-without-PCR-duplicates: 10
  # minimum informative allele frequency
  min-allele: 0.9
```
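
To make the meaning of two of these thresholds concrete, here is a small sketch that checks a reconstructed sequence against `min-identity` and `max-n`. The function `failed_criteria` and the sample fields `identity` and `n_share` are hypothetical names chosen for illustration. Whether UnCoVar treats the boundary values as passing is also an assumption here.

```python
def failed_criteria(sample: dict, criteria: dict) -> list:
    """Return the names of the criteria a reconstructed sequence fails.

    Hypothetical helper: `sample` holds the identity to the reference
    genome and the share of N bases, both as fractions between 0 and 1.
    """
    failures = []
    # Identity must reach the configured minimum (boundary assumed to pass).
    if sample["identity"] < criteria["min-identity"]:
        failures.append("min-identity")
    # The share of N must not exceed the configured maximum.
    if sample["n_share"] > criteria["max-n"]:
        failures.append("max-n")
    return failures


criteria = {"min-identity": 0.9, "max-n": 0.05}
print(failed_criteria({"identity": 0.95, "n_share": 0.01}, criteria))  # []
print(failed_criteria({"identity": 0.85, "n_share": 0.10}, criteria))  # ['min-identity', 'max-n']
```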

| sample_name | fq1 | fq2 | date | is_amplicon_data | technology | adapters |
| ----------- | ----------- | ----------- | ---------- | ---------------- | ---------- | ---------------- |
| example-1 | PATH/TO/fq1 | PATH/TO/fq2 | 1970-01-01 | 0 | illumina | revelo-rna-seq |
| example-2 | PATH/TO/fq | PATH/TO/fq2 | 1970-01-01 | 1 | illumina | nimagen-easy-seq |
## Preprocessing

### Customized Primer Removal
Here, different preprocessing options can be adjusted. By default, the standard Illumina adapters
are trimmed. For samples prepared with an amplicon sequencing approach, you can
define the path to the primer file in `.bed` format. If you are processing Nanopore
samples, you can also define the primer version by changing the number.

The default primer file is a bed file from the [ARTIC network](https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V3).
However, the primers for clipping can be customized. First, the custom primers must
be saved in bed format. Next, the path to this file must be changed in the config.
Go to the config folder and open config.yaml. In the "preprocessing" subcategory,
change the path after "amplicon-primers" to the path where your primer file
can be found.

```yaml
preprocessing:
  # only for *non* Oxford Nanopore data. Adapters to trim.
  # see: https://www.nimagen.com/shop/products/rc-cov096/easyseq-sars-cov-2-novel-coronavirus-whole-genome-sequencing-kit
  kit-adapters: "--adapter_sequence GCGAATTTCGACGATCGTTGCATTAACTCGCGAA --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
  # only for Oxford Nanopore data.
  # ARTIC primer version to clip from reads. See
  # https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V4
  # for more information
  artic-primer-version: 4
  # path to amplicon primers in bed format for hard-clipping on paired end files (illumina) or url to file that should be downloaded
  amplicon-primers: "resources/SARS-CoV-2-artic-v4_1.primer.bed"
  # GenBank accession of reference sequence of the amplicon primers
  amplicon-reference: "MN908947"
```
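
The `kit-adapters` value is a single string of space-separated options. The sketch below shows one way such a string could be split into flag/value pairs for inspection. `parse_adapter_options` is a hypothetical helper and only handles the `--flag value` form used in the config above.

```python
import shlex


def parse_adapter_options(option_string: str) -> dict:
    """Split a `--flag value --flag value` string into a dictionary.

    Hypothetical helper for inspecting the `kit-adapters` setting.
    """
    tokens = shlex.split(option_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating flags and values")
    options = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"not a flag: {flag!r}")
        options[flag[2:]] = value
    return options


kit_adapters = (
    "--adapter_sequence GCGAATTTCGACGATCGTTGCATTAACTCGCGAA "
    "--adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
)
for name, sequence in parse_adapter_options(kit_adapters).items():
    print(name, sequence)
```

A check like this can catch a missing read-two adapter before paired-end data is processed.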

## Assembly

In this section you define which assembler you want to use for the genome reconstruction.
UnCoVar uses MEGAHIT and metaSPAdes by default, as those achieved the best results
in a benchmarking comparison. The assembly options can be changed independently.

There are several other options available:

- megahit-std
- megahit-meta-large
- megahit-meta-sensitive
- trinity
- velvet
- metaspades
- coronaspades
- spades
- rnaviralspades
12 changes: 0 additions & 12 deletions docs/faq.md

This file was deleted.

40 changes: 24 additions & 16 deletions docs/index.md
@@ -1,30 +1,38 @@
# UnCoVar -- an open, extensible framework for virus genome analysis
<h1>
Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction
and Lineage Assignment
</h1>

<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/8e17c6fc-ff7a-4c25-afc9-7888036d693e">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/c99f5a94-749b-422e-b319-1e3700d40a8e">
<img alt="UnCoVar Logo dark/light">
</picture>

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.3.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Snakemake](https://img.shields.io/badge/snakemake-≥7.32.4-brightgreen.svg)](https://snakemake.bitbucket.io)
[![GitHub actions status](https://github.com/koesterlab/snakemake-workflow-sars-cov2/workflows/Tests/badge.svg?branch=master)](https://github.com/koesterlab/snakemake-workflow-sars-cov2/actions?query=branch%3Amaster+workflow%3ATests)
[![Docker Repository on Quay](https://quay.io/repository/uncovar/uncovar/status "Docker Repository on Quay")](https://quay.io/repository/uncovar/uncovar)

A Reproducible and Scalable Workflow for Transparent and Robust Virus Variant Calling and Lineage Assignment using SARS-CoV-2 as an example.
## Workflow Overview

- Using state of the art tools, easily extended for other viruses
<img src="./assets/UnCoVar_wf_new.png" alt="UnCoVar workflow" width="90%"/>

![UnCoVar tools](./assets/tools.png)
## Highlights

- Tools and database updates for critical components via Conda
- Using state of the art tools, easily extended for other viruses

- Built using modern design patterns with Conda and SnakeMake
- Tool and database updates for critical components via Conda

- Built using modern design patterns with Conda and Snakemake

- Extensible and easy to customize

- Submission Ready Genomes

- Customizable reporting with comprehensive visualization

![UnCoVar visuals](./assets/uncovar-displays.png)
![UnCoVar visuals](./assets/Report_panel.png)

- Submission Ready Genomes
Four different example elements of the results generated by UnCoVar:

- a: The genome coverage of the aligned reads, visualized for multiple samples

- b: An evaluation of known protein alterations from VOCs for one sample

- c: A pileup of reads at the position of one protein alteration, showing the mutations
observed across multiple reads (grey bars) for a single sample, here in the S gene

- d: The lineage assignments inferred for single reads for one sample