diff --git a/CHANGELOG.md b/CHANGELOG.md
index f808a8df..343c4eb0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,11 +2,16 @@
All notable changes to the call-gSV pipeline.
## [Unreleased]
+---
+
+## [3.0.0] - 2021-07-01
### Added
- Added ability to call germline SVs with Manta
- Added parameters to control which SV caller is used (run_delly & run_manta)
- Added pipeline version from manifest to pipeline logging output
- Added CNV VCF outputs to QC processes
+- Added two new processes to regenotype SVs or CNVs with Delly (regenotype_gSV_Delly & regenotype_gCNV_Delly)
+- Added parameters and input mode to control which type of regenotyping is run (mode, run_regenotyping & run_discovery)
### Changed
- Changed output directories to correspond with tool name casing
diff --git a/README.md b/README.md
index 5a781552..67420c25 100644
--- a/README.md
+++ b/README.md
@@ -5,9 +5,13 @@
- [How To Run](#how-to-run)
- [Flow Diagram](#flow-diagram)
- [Pipeline Steps](#pipeline-steps)
- - [1. Calling Structural Variants](#1-calling-structural-variants)
- - [2. Calling Copy Number Variants](#2-calling-copy-number-variants)
- - [3. Check Output Quality](#3-check-output-quality)
+ - [Discovery](#discovery)
+ - [1. Calling Structural Variants](#1-calling-structural-variants)
+ - [2. Calling Copy Number Variants](#2-calling-copy-number-variants)
+ - [3. Check Output Quality](#3-check-output-quality)
+ - [Regenotyping](#regenotyping)
+ - [1. Regenotyping Structural Variants](#1-regenotyping-structural-variants)
+ - [2. Regenotyping Copy Number Variants](#2-regenotyping-copy-number-variants)
- [Inputs](#inputs)
- [Outputs](#outputs)
- [Testing and Validation](#testing-and-validation)
@@ -20,7 +24,7 @@
## Overview
-The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally it has been validated with the SMC-HET dataset and reference GRCh38 reference genome, where paired-end FASTQ's were created with BAM Surgeon.
+The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). Additionally, the pipeline can also regenotype previously identified structural variants or copy number variants with Delly. It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally it has been validated with the SMC-HET dataset and the GRCh38 reference genome, using paired-end FASTQ's that were back-extracted from BAMs created by BAM Surgeon.
Developer's Notes:
@@ -59,6 +63,10 @@ A directed acyclic graph of your pipeline.
## Pipeline Steps
+### Discovery
+
+The "discovery" branch of the call-gSV pipeline allows you to identify germline structural variants and copy number variants utilizing either Delly or Manta. After variants are identified, basic quality checks are performed on the outputs of the processes.
+
### 1. Calling Structural Variants
The first step of the pipeline requires an aligned and sorted BAM file and BAM index as an input for variant calling with [Delly](https://github.com/dellytools/delly) or [Manta](https://github.com/Illumina/manta). Delly combines short-range and long-range paired-end mapping and split-read analysis for the discovery of balanced and unbalanced structural variants at single-nucleotide breakpoint resolution (deletions, tandem duplications, inversions and translocations.) Structural variants are called, annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data.
@@ -103,6 +111,21 @@ Currently the following filters are applied by Delly when calling copy number v
### 3. Check Output Quality
For Delly, VCF files are generated from the BCFs to run the vcf-validate command from [VCFTools](https://vcftools.github.io/perl_module.html#vcf-validator) and vcfstats from [RTGTools](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats). Outputs from both provide preliminary summary statistics that can be viewed and evaluated in preparation for downstream cohort-wide re-calling and re-genotyping. In the Manta branch of the pipeline, a stats directory is generated under the specific output directory /Manta-/results/stats where information can be found regarding the SVs identified.
+
+
+### Regenotyping
+
+The "regenotyping" branch of the call-gSV pipeline allows you to regenotype previously identified structural variants or copy number variants using Delly.
+
+### 1. Regenotyping Structural Variants
+
+Similar to the "discovery" process, the first step of the regenotyping pipeline requires an aligned and sorted BAM file, BAM index, and a merged sites BCF (from the merge-SVsites pipeline) as inputs for structural variant regenotyping with [Delly](https://github.com/dellytools/delly). The provided sample is genotyped with the merged sites list. Structural variants are annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data.
+
+
+### 2. Regenotyping Copy Number Variants
+
+The second possible step of the regenotyping pipeline requires an aligned and sorted BAM file, BAM index, and a merged sites BCF as an input, as well as the BCF output from the initial structural variant calling (to refine breakpoints) and a mappability map. Any CNVs identified are annotated and output as a single BCF file.
+
---
@@ -117,6 +140,7 @@ For Delly, VCF files are generated from the BCFs to run the vcf-validate command
| patient | string | The patient name to be passed to final BCF/VCF. No white space is allowed. |
| sample | string | The sample name to be passed to final BCF/VCF. No white space is allowed. |
| input_bam | path | Absolute path to the BAM file for the sample. |
+| mode | string | When running regenotyping, either 'SV' or 'CNV'. Required for regenotyping, ignored in the discovery mode and can be left blank. |
### Nextflow Config File Parameters
@@ -125,6 +149,9 @@ For Delly, VCF files are generated from the BCFs to run the vcf-validate command
| `dataset_id` | yes | string | Boutros lab dataset id. |
| `blcds_registered_dataset` | yes | boolean | Affirms if dataset should be registered in the Boutros Lab Data registry. Default value is false. |
| `sge_scheduler` | yes | boolean | Affirms whether job will be executed on the SGE cluster. Default value is false. |
+| `run_discovery` | yes | boolean | Specifies whether or not to run the "disovery" branch of the pipeline. Default value is true. (either run_discovery or run_regenotyping must be true) |
+| `run_regenotyping` | yes | boolean | Specifies whether or not to run the "regenotyping" branch of the pipeline. Default value is false. (either run_discovery or run_regenotyping must be true) |
+| `merged_sites` | yes | path | The path to the merged sites.bcf file. Must be populated if running the regenotyping branch. |
| `input_csv` | yes | string | Absolute path to the input csv file for the pipeline. |
| `reference_fasta` | yes | path | Absolute path to the reference genome `fasta` file. The reference genome is used by Delly for structural variant calling. |
| `reference_fasta_index` | no | path | Absolute path to the reference genome `fasta` index file. The reference genome is used by Delly for structural variant calling. If this path is not specified, call-gSV will assume the index file exists in the same directory as the `reference_fasta` |
diff --git a/call-gSV-flowchart-diagram.drawio.svg b/call-gSV-flowchart-diagram.drawio.svg
index 6e3f3551..e932cb6e 100644
--- a/call-gSV-flowchart-diagram.drawio.svg
+++ b/call-gSV-flowchart-diagram.drawio.svg
@@ -1,3 +1,3 @@
-
\ No newline at end of file
+
\ No newline at end of file