Skip to content

Commit

Permalink
Merge pull request #31 from uclahs-cds/ts-regenotype-docs
Browse files Browse the repository at this point in the history
Regenotype documentation updates for v3.0.0 release
  • Loading branch information
timothyjsanders authored Jul 2, 2021
2 parents 82e2cee + 5c5c1b4 commit 314cad6
Show file tree
Hide file tree
Showing 3 changed files with 37 additions and 5 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,16 @@
All notable changes to the call-gSV pipeline.

## [Unreleased]
---

## [3.0.0] - 2021-07-01
### Added
- Added ability to call germline SVs with Manta
- Added parameters to control which SV caller is used (run_delly & run_manta)
- Added pipeline version from manifest to pipeline logging output
- Added CNV VCF outputs to QC processes
- Added two new processes to regenotype SVs or CNVs with Delly (regenotype_gSV_Delly & regenotype_gCNV_Delly)
- Added parameters and input mode to control which type of regenotyping is run (mode, run_regenotyping & run_discovery)

### Changed
- Changed output directories to correspond with tool name casing
Expand Down
35 changes: 31 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,13 @@
- [How To Run](#how-to-run)
- [Flow Diagram](#flow-diagram)
- [Pipeline Steps](#pipeline-steps)
- [1. Calling Structural Variants](#1-calling-structural-variants)
- [2. Calling Copy Number Variants](#2-calling-copy-number-variants)
- [3. Check Output Quality](#3-check-output-quality)
- [Discovery](#discovery)
- [1. Calling Structural Variants](#1-calling-structural-variants)
- [2. Calling Copy Number Variants](#2-calling-copy-number-variants)
- [3. Check Output Quality](#3-check-output-quality)
- [Regenotyping](#regenotyping)
- [1. Regenotyping Structural Variants](#1-regenotyping-structural-variants)
- [2. Regenotyping Copy Number Variants](#2-regenotyping-copy-number-variants)
- [Inputs](#inputs)
- [Outputs](#outputs)
- [Testing and Validation](#testing-and-validation)
Expand All @@ -20,7 +24,7 @@

## Overview

The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally it has been validated with the SMC-HET dataset and reference GRCh38 reference genome, where paired-end FASTQ's were created with BAM Surgeon.
The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). Additionally, the pipeline can also regenotype previously identified structural variants or copy number variants with Delly. It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally it has been validated with the SMC-HET dataset and the GRCh38 reference genome, using paired-end FASTQ's that were back-extracted from BAMs created by BAM Surgeon.

<b><i>Developer's Notes:</i></b>

Expand Down Expand Up @@ -59,6 +63,10 @@ A directed acyclic graph of your pipeline.

## Pipeline Steps

### Discovery

The "discovery" branch of the call-gSV pipeline allows you to identify germline structural variants and copy number variants utilizing either Delly or Manta. After variants are identified, basic quality checks are performed on the outputs of the processes.

### 1. Calling Structural Variants

The first step of the pipeline requires an aligned and sorted BAM file and BAM index as an input for variant calling with [Delly](https://github.com/dellytools/delly) or [Manta](https://github.com/Illumina/manta). Delly combines short-range and long-range paired-end mapping and split-read analysis for the discovery of balanced and unbalanced structural variants at single-nucleotide breakpoint resolution (deletions, tandem duplications, inversions and translocations.) Structural variants are called, annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data.
Expand Down Expand Up @@ -103,6 +111,21 @@ Currently the following filters are applied by Delly when calling copy number v
### 3. Check Output Quality

For Delly, VCF files are generated from the BCFs to run the vcf-validate command from [VCFTools](https://vcftools.github.io/perl_module.html#vcf-validator) and vcfstats from [RTGTools](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats). Outputs from both provide preliminary summary statistics that can be viewed and evaluated in preparation for downstream cohort-wide re-calling and re-genotyping. In the Manta branch of the pipeline, a stats directory is generated under the specific output directory <outputDirectory>/Manta-<version number>/results/stats where information can be found regarding the SVs identified.
<br>

### Regenotyping

The "regenotyping" branch of the call-gSV pipeline allows you to regenotype previously identified structural variants or copy number variants using Delly.

### 1. Regenotyping Structural Variants

Similar to the "discovery" process, the first step of the regenotyping pipeline requires an aligned and sorted BAM file, BAM index, and a merged sites BCF (from the merge-SVsites pipeline) as inputs for structural variant regenotyping with [Delly](https://github.com/dellytools/delly). The provided sample is genotyped with the merged sites list. Structural variants are annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data.
<br>

### 2. Regenotyping Copy Number Variants

The second possible step of the regenotyping pipeline requires an aligned and sorted BAM file, BAM index, and a merged sites BCF as an input, as well as the BCF output from the initial structural variant calling (to refine breakpoints) and a mappability map. Any CNVs identified are annotated and output as a single BCF file.
<br>

---

Expand All @@ -117,6 +140,7 @@ For Delly, VCF files are generated from the BCFs to run the vcf-validate command
| patient | string | The patient name to be passed to final BCF/VCF. No white space is allowed. |
| sample | string | The sample name to be passed to final BCF/VCF. No white space is allowed. |
| input_bam | path | Absolute path to the BAM file for the sample. |
| mode | string | When running regenotyping, either 'SV' or 'CNV'. Required for regenotyping, ignored in the discovery mode and can be left blank. |

### Nextflow Config File Parameters

Expand All @@ -125,6 +149,9 @@ For Delly, VCF files are generated from the BCFs to run the vcf-validate command
| `dataset_id` | yes | string | Boutros lab dataset id. |
| `blcds_registered_dataset` | yes | boolean | Affirms if dataset should be registered in the Boutros Lab Data registry. Default value is false. |
| `sge_scheduler` | yes | boolean | Affirms whether job will be executed on the SGE cluster. Default value is false. |
| `run_discovery` | yes | boolean | Specifies whether or not to run the "disovery" branch of the pipeline. Default value is true. (either run_discovery or run_regenotyping must be true) |
| `run_regenotyping` | yes | boolean | Specifies whether or not to run the "regenotyping" branch of the pipeline. Default value is false. (either run_discovery or run_regenotyping must be true) |
| `merged_sites` | yes | path | The path to the merged sites.bcf file. Must be populated if running the regenotyping branch. |
| `input_csv` | yes | string | Absolute path to the input csv file for the pipeline. |
| `reference_fasta` | yes | path | Absolute path to the reference genome `fasta` file. The reference genome is used by Delly for structural variant calling. |
| `reference_fasta_index` | no | path | Absolute path to the reference genome `fasta` index file. The reference genome is used by Delly for structural variant calling. If this path is not specified, call-gSV will assume the index file exists in the same directory as the `reference_fasta` |
Expand Down
Loading

0 comments on commit 314cad6

Please sign in to comment.