From 3a167aa623fa0dee1291de3c9a58b58cb8cf6510 Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Fri, 27 Jan 2023 13:55:13 -0800
Subject: [PATCH 1/8] somatic-sniper done

---
 README.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/README.md b/README.md
index 6bca3619..18d75cb6 100644
--- a/README.md
+++ b/README.md
@@ -105,6 +105,30 @@ MuSE source: https://github.com/wwylab/MuSE
 Version: 2.0 (Released on Aug 25, 2021)
 GitHub Package: https://github.com/uclahs-cds/docker-MuSE/pkgs/container/muse
+## Pipeline Steps
+
+### SomaticSniper
+#### 1. SomaticSniper v1.0.5.0
+Compares a pair of tumor and normal bam files and outputs unfiltered vcf file listing single nucleotide positions that are different between tumor and normal.
+#### 2. Filter out ambiguous positions.
+This takes several steps, listed below, and starts with the same input files given to SomaticSniper.
+##### a. Get pileup summaries
+Summarizes counts of reads that support reference, alternate and other alleles for given sites. This is done for both input bam files and the results are used in the next step.
+##### b. Filter pileup outputs
+Uses `samtools.pl varFilter` to filter each pileup output (tumor and normal), then further filters each to keep only indels with QUAL > 20. `samtools.pl` is packaged with SomaticSniper.
+##### c. Filter SomaticSniper vcf
+Uses `snpfilter.pl` (packaged with SomaticSniper):
+i. filter SomaticSniper vcf using normal indel pileup (from step `b`).
+ii. filter vcf output from step `i` using tumor indel pileup (from step `b`).
+##### d. Summarize alignment information for retained variant positions
+Extract positions from filtered vcf file and use with `bam-readcount` to generate a summary of read alignment metrics for each position.
+##### e. Final filtering of variants using metrics summarized above
+Uses `fpfilter.pl` (packaged with SomaticSniper), resulting in final high confidence vcf file.
+
+### Strelka2
+### Mutect 2
+### MuSE
+
 ## Inputs
 To run the pipeline, one `input.yaml` and one `input.config` are needed, as follows.

From 3c85941ce9e435ba8ad1c70fdb30f117a87d7091 Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Thu, 2 Feb 2023 17:24:43 -0800
Subject: [PATCH 2/8] pipeline steps for strelka and mutect2

---
 README.md                   | 37 ++++++++++++++++++++++++++++++++-----
 module/mutect2-processes.nf |  2 +-
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 18d75cb6..b25d55d4 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ SomaticSniper, Strelka2, and MuSE require there to be **exactly one pair of inpu
 ### Somatic SNV callers:
 * [SomaticSniper](https://github.com/genome/somatic-sniper)
+* add a sentence for each tool here?
 * [Strelka2](https://github.com/Illumina/strelka)
 * [Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2)
 * [MuSE](https://github.com/wwylab/MuSE)
@@ -109,24 +110,50 @@ GitHub Package: https://github.com/uclahs-cds/docker-MuSE/pkgs/container/muse
 ### SomaticSniper
 #### 1. SomaticSniper v1.0.5.0
-Compares a pair of tumor and normal bam files and outputs unfiltered vcf file listing single nucleotide positions that are different between tumor and normal.
+Compare a pair of tumor and normal bam files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in vcf format.
 #### 2. Filter out ambiguous positions.
 This takes several steps, listed below, and starts with the same input files given to SomaticSniper.
 ##### a. Get pileup summaries
-Summarizes counts of reads that support reference, alternate and other alleles for given sites. This is done for both input bam files and the results are used in the next step.
+Summarize counts of reads that support reference, alternate and other alleles for given sites. This is done for both of the input bam files and the results are used in the next step.
 ##### b.
Filter pileup outputs
-Uses `samtools.pl varFilter` to filter each pileup output (tumor and normal), then further filters each to keep only indels with QUAL > 20. `samtools.pl` is packaged with SomaticSniper.
+Use `samtools.pl varFilter` to filter each pileup output (tumor and normal), then further filters each to keep only indels with QUAL > 20. `samtools.pl` is packaged with SomaticSniper.
 ##### c. Filter SomaticSniper vcf
-Uses `snpfilter.pl` (packaged with SomaticSniper):
+Use `snpfilter.pl` (packaged with SomaticSniper):
 i. filter SomaticSniper vcf using normal indel pileup (from step `b`).
 ii. filter vcf output from step `i` using tumor indel pileup (from step `b`).
 ##### d. Summarize alignment information for retained variant positions
 Extract positions from filtered vcf file and use with `bam-readcount` to generate a summary of read alignment metrics for each position.
 ##### e. Final filtering of variants using metrics summarized above
-Uses `fpfilter.pl` (packaged with SomaticSniper), resulting in final high confidence vcf file.
+Use `fpfilter.pl` (packaged with SomaticSniper), resulting in a final high confidence vcf file.
 ### Strelka2
+####1. Manta v1.6.0
+The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the Manta somatic configuration protocol. [Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.]
+####2. Strelka2 v2.9.10
+The input pair of tumor/normal bam files, along with the candidate small indel file produced by Manta are used by Strelka2 to create lists of somatic single nucleotide and small indel variants, both in vcf format. Lower quality variants that did not pass filtering are subsequently removed, yielding somatic_snvs_pass.vcf and somatic_indels_pass.vcf files
+
+
 ### Mutect 2
+
+####1.
Intervals not provided
+In this case calls are made for the entire genome, first for the non-assembled canonical/assembled chromosomes, then for the canonical chromosomes.
+ - Split the set of non-canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter count.
+ - Call somatic variants in non-canonical chromosomes with `Mutect2`.
+ - Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter count.
+ - Call somatic variant in canonical chromosomes with `Mutect2`.
+ - Merge scattered canonical and non-canonical chromosome outputs (vcfs, statistics and read orientation information).
+ - Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
+ - Filter calls with GATK's `FilterMutectCalls`
+
+####2. Intervals provided
+In this case calls are made only for the intervals provided
+ - Split the set of provided intervals into x intervals for parallelization, where x is defined by the input scatter count.
+ - Call somatic variants for these intervals with `Mutect2`.
+ - Merge all scattered outputs (vcfs, statistics and read orientation information).
+ - Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
+ - Filter calls with GATK's `FilterMutectCalls`
+
+
 ### MuSE
diff --git a/module/mutect2-processes.nf b/module/mutect2-processes.nf
index a497be0a..10297de8 100644
--- a/module/mutect2-processes.nf
+++ b/module/mutect2-processes.nf
@@ -79,7 +79,7 @@ process run_GetSampleName_Mutect2 {
 """
 }
-process call_sSNVInAssembledChromosomes_Mutect2 {
+process call_sSNVInAssembledChromosomes_Mutect2 { // Intervals do not have to be in assembled chromosomes
 container params.docker_image_GATK
 publishDir path: "${params.workflow_output_dir}/intermediate/${task.process.split(':')[-1]}",

From b762a3451a5f46f2b8b9db312786ba00099c2c0b Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Fri, 3 Feb 2023 17:20:28 -0800
Subject: [PATCH 3/8] MuSE pipeline steps added

---
 README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b25d55d4..680c57b4 100644
--- a/README.md
+++ b/README.md
@@ -137,9 +137,9 @@ The input pair of tumor/normal bam files, along with the candidate small indel f
 ####1. Intervals not provided
 In this case calls are made for the entire genome, first for the non-assembled canonical/assembled chromosomes, then for the canonical chromosomes.
- - Split the set of non-canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter count.
+ - Split the set of non-canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
  - Call somatic variants in non-canonical chromosomes with `Mutect2`.
- - Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter count.
+ - Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
  - Call somatic variant in canonical chromosomes with `Mutect2`.
  - Merge scattered canonical and non-canonical chromosome outputs (vcfs, statistics and read orientation information).
  - Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
@@ -155,6 +155,12 @@ In this case calls are made only for the intervals provided
 ### MuSE
+####1.`MuSE call`
+This step carries out pre-filtering and calculating position-specific summary statistics using the Markov substitution model.
+####2.`MuSE sump`
+This step computes tier-based cutoffs from a sample-specific error model.
+####3.Filter vcf
+`MuSE` output has variants labeled as `PASS` or one of `Tier 1-5` for the lower confidence calls (`Tier 5` is lowest). This step keeps only variants labeled `PASS`.
 ## Inputs

From f84c9036a62cc20b20adee8d49e6a9fc835161a8 Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Sat, 4 Feb 2023 17:08:36 -0800
Subject: [PATCH 4/8] add brief tool descriptions and MuSE pipeline steps

---
 README.md | 67 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index 680c57b4..49e7bc7f 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@
  - [Overview](#overview)
  - [How To Run](#how-to-run)
  - [Flow Diagrams](#flow-diagrams)
+ - [Pipeline Steps](#pipeline-steps)
  - [Inputs](#inputs)
  - [Outputs](#outputs)
  - [Testing and Validation](#testing-and-validation)
@@ -19,11 +20,13 @@ SomaticSniper, Strelka2, and MuSE require there to be **exactly one pair of inpu
 ### Somatic SNV callers:
 * [SomaticSniper](https://github.com/genome/somatic-sniper)
-* add a sentence for each tool here?
+ `SomaticSniper` is an older tool yielding high specificity single nucleotide somatic variants.
 * [Strelka2](https://github.com/Illumina/strelka)
+ `Strelka2` here uses candidate indels from `Manta` and calls somatic short mutations (single nucleotide and small indel) filtered with a random forest model.
 * [Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2)
+ `GATK Mutect2` calls somatic short mutations via local assembly of haplotypes.
 * [MuSE](https://github.com/wwylab/MuSE)
-
+ `Muse` accounts for tumor heterogeneity and calls single nucleotide somatic variants.
 ## How To Run
 Below is a summary of how to run the pipeline. See [here](https://confluence.mednet.ucla.edu/pages/viewpage.action?spaceKey=BOUTROSLAB&title=How+to+run+a+nextflow+pipeline) for more information on running Nextflow pipelines.
@@ -59,7 +62,7 @@ python path/to/submit_nextflow_pipeline.py \
 --partition_type F72 \
 --email jdoe@ucla.edu
 ```
-
+> **Note**: Although --partition_type F2 is an available option for small data sets, Mutect2 and Muse will fail due to lack of memory.
 ---
@@ -127,39 +130,49 @@ Extract positions from filtered vcf file and use with `bam-readcount` to generat
 Use `fpfilter.pl` (packaged with SomaticSniper), resulting in a final high confidence vcf file.
 ### Strelka2
-####1. Manta v1.6.0
+#### 1. Manta v1.6.0
 The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the Manta somatic configuration protocol. [Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.]
-####2. Strelka2 v2.9.10
+#### 2. Strelka2 v2.9.10
 The input pair of tumor/normal bam files, along with the candidate small indel file produced by Manta are used by Strelka2 to create lists of somatic single nucleotide and small indel variants, both in vcf format. Lower quality variants that did not pass filtering are subsequently removed, yielding somatic_snvs_pass.vcf and somatic_indels_pass.vcf files
-### Mutect 2
-
-####1. Intervals not provided
-In this case calls are made for the entire genome, first for the non-assembled canonical/assembled chromosomes, then for the canonical chromosomes.
- - Split the set of non-canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
- - Call somatic variants in non-canonical chromosomes with `Mutect2`.
- - Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
- - Call somatic variant in canonical chromosomes with `Mutect2`.
- - Merge scattered canonical and non-canonical chromosome outputs (vcfs, statistics and read orientation information).
- - Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
- - Filter calls with GATK's `FilterMutectCalls`
-
-####2. Intervals provided
-In this case calls are made only for the intervals provided
- - Split the set of provided intervals into x intervals for parallelization, where x is defined by the input scatter count.
- - Call somatic variants for these intervals with `Mutect2`.
- - Merge all scattered outputs (vcfs, statistics and read orientation information).
- - Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
- - Filter calls with GATK's `FilterMutectCalls`
+### GATK Mutect 2
+
+#### 1. Intervals not provided
+ ##### a. Split non-canonical
+ Split the set of non-canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
+ ##### b. Call non-canonical
+ Call somatic variants in non-canonical chromosomes with `Mutect2`.
+ ##### c. Split canonical
+ Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
+ ##### d. Call canonical
+ Call somatic variant in canonical chromosomes with `Mutect2`.
+ ##### e. Merge
+ Merge scattered canonical and non-canonical chromosome outputs (vcfs, statistics).
+ ##### f. Learn read orientations
+ Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
+ ##### g. Filter
+ Filter variants with GATK's `FilterMutectCalls`, using read orientation prior table as well as standard filters.
+
+#### 2. Intervals provided
+ ##### a.
Split
+ Split the set of provided intervals into x intervals for parallelization, where x is defined by the input scatter count.
+ ##### b. Call
+ Call somatic variants for the provided intervals with `Mutect2`.
+ ##### c. Merge
+ Merge scattered outputs (vcfs, statistics).
+ ##### d. Learn read orientations
+ Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
+ ##### e. Filter
+ Filter variants with GATK's `FilterMutectCalls`, using read orientation prior table as well as standard filters.
 ### MuSE
-####1.`MuSE call`
+#### 1.`MuSE call`
 This step carries out pre-filtering and calculating position-specific summary statistics using the Markov substitution model.
-####2.`MuSE sump`
+#### 2.`MuSE sump`
 This step computes tier-based cutoffs from a sample-specific error model.
-####3.Filter vcf
+#### 3.Filter vcf
 `MuSE` output has variants labeled as `PASS` or one of `Tier 1-5` for the lower confidence calls (`Tier 5` is lowest). This step keeps only variants labeled `PASS`.

From 0d0d51ddabd6d316682dc8ddcf714aec955db51c Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Sat, 4 Feb 2023 17:22:29 -0800
Subject: [PATCH 5/8] tidy up

---
 README.md | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 49e7bc7f..651e6807 100644
--- a/README.md
+++ b/README.md
@@ -19,14 +19,13 @@ The call-sSNV nextflow pipeline performs somatic SNV calling given a pair of tum
 SomaticSniper, Strelka2, and MuSE require there to be **exactly one pair of input tumor/normal** BAM files, but Mutect2 will take tumor-only input (no paired normal), as well as tumor/normal BAM pairs from multiple samples from the same individual.
 ### Somatic SNV callers:
-* [SomaticSniper](https://github.com/genome/somatic-sniper)
- `SomaticSniper` is an older tool yielding high specificity single nucleotide somatic variants.
-* [Strelka2](https://github.com/Illumina/strelka)
- `Strelka2` here uses candidate indels from `Manta` and calls somatic short mutations (single nucleotide and small indel) filtered with a random forest model.
-* [Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2)
- `GATK Mutect2` calls somatic short mutations via local assembly of haplotypes.
-* [MuSE](https://github.com/wwylab/MuSE)
- `Muse` accounts for tumor heterogeneity and calls single nucleotide somatic variants.
+* [SomaticSniper](https://github.com/genome/somatic-sniper) is an older tool yielding high specificity single nucleotide somatic variants.
+
+* [Strelka2](https://github.com/Illumina/strelka) here uses candidate indels from `Manta` and calls somatic short mutations (single nucleotide and small indel) filtered with a random forest model.
+
+* [GATK Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2) calls somatic short mutations via local assembly of haplotypes.
+
+* [MuSE](https://github.com/wwylab/MuSE) accounts for tumor heterogeneity and calls single nucleotide somatic variants.
 ## How To Run
 Below is a summary of how to run the pipeline. See [here](https://confluence.mednet.ucla.edu/pages/viewpage.action?spaceKey=BOUTROSLAB&title=How+to+run+a+nextflow+pipeline) for more information on running Nextflow pipelines.
@@ -112,17 +111,17 @@ GitHub Package: https://github.com/uclahs-cds/docker-MuSE/pkgs/container/muse
 ## Pipeline Steps
 ### SomaticSniper
-#### 1. SomaticSniper v1.0.5.0
+#### 1. `SomaticSniper` v1.0.5.0
 Compare a pair of tumor and normal bam files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in vcf format.
 #### 2. Filter out ambiguous positions.
-This takes several steps, listed below, and starts with the same input files given to SomaticSniper.
+This takes several steps, listed below, and starts with the same input files given to `SomaticSniper`.
 ##### a.
Get pileup summaries
 Summarize counts of reads that support reference, alternate and other alleles for given sites. This is done for both of the input bam files and the results are used in the next step.
 ##### b. Filter pileup outputs
-Use `samtools.pl varFilter` to filter each pileup output (tumor and normal), then further filters each to keep only indels with QUAL > 20. `samtools.pl` is packaged with SomaticSniper.
+Use `samtools.pl varFilter` to filter each pileup output (tumor and normal), then further filters each to keep only indels with QUAL > 20. `samtools.pl` is packaged with `SomaticSniper`.
 ##### c. Filter SomaticSniper vcf
-Use `snpfilter.pl` (packaged with SomaticSniper):
-i. filter SomaticSniper vcf using normal indel pileup (from step `b`).
+Use `snpfilter.pl` (packaged with `SomaticSniper`):
+i. filter vcf using normal indel pileup (from step `b`).
 ii. filter vcf output from step `i` using tumor indel pileup (from step `b`).
 ##### d. Summarize alignment information for retained variant positions
 Extract positions from filtered vcf file and use with `bam-readcount` to generate a summary of read alignment metrics for each position.
@@ -130,10 +129,10 @@ Extract positions from filtered vcf file and use with `bam-readcount` to generat
 Use `fpfilter.pl` (packaged with SomaticSniper), resulting in a final high confidence vcf file.
 ### Strelka2
-#### 1. Manta v1.6.0
-The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the Manta somatic configuration protocol. [Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.]
-#### 2. Strelka2 v2.9.10
-The input pair of tumor/normal bam files, along with the candidate small indel file produced by Manta are used by Strelka2 to create lists of somatic single nucleotide and small indel variants, both in vcf format.
Lower quality variants that did not pass filtering are subsequently removed, yielding somatic_snvs_pass.vcf and somatic_indels_pass.vcf files
+#### 1. `Manta` v1.6.0
+The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the `Manta` somatic configuration protocol. [Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.]
+#### 2. `Strelka2` v2.9.10
+The input pair of tumor/normal bam files, along with the candidate small indel file produced by `Manta` are used by `Strelka2` to create lists of somatic single nucleotide and small indel variants, both in vcf format. Lower quality variants that did not pass filtering are subsequently removed, yielding somatic_snvs_pass.vcf and somatic_indels_pass.vcf files
 ### GATK Mutect 2
@@ -203,7 +202,7 @@ input:
 contamination_table: /path/to/contamination.table
 ```
-* Mutect2 can take other inputs: tumor-only sample and one patient's multiple samples. The pipeline will define `params.tumor_only_mode`, `params.multi_tumor_sample`, and `params.multi_normal_sample`. For tumor-only samples, remove the normal input in `input.yaml`, e.g. [template_tumor_only.yaml](input/example-test-tumor-only.yaml). For multiple samples, put all the input BAMs in the `input.yaml`, e.g. [template_multi_sample.yaml](input/example-test-multi-sample.yaml). Note, for these non-standard inputs, the configuration file must have 'mutect2' listed as the only algorithm.
+* `Mutect2` can take other inputs: tumor-only sample and one patient's multiple samples. The pipeline will define `params.tumor_only_mode`, `params.multi_tumor_sample`, and `params.multi_normal_sample`. For tumor-only samples, remove the normal input in `input.yaml`, e.g. [template_tumor_only.yaml](input/example-test-tumor-only.yaml). For multiple samples, put all the input BAMs in the `input.yaml`, e.g.
[template_multi_sample.yaml](input/example-test-multi-sample.yaml). Note, for these non-standard inputs, the configuration file must have 'mutect2' listed as the only algorithm.
 ### input.config ([see template](config/template.config))

From 92eff7c73f6ce72fc24ec9696668832876bbdddd Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Sat, 4 Feb 2023 19:49:28 -0800
Subject: [PATCH 6/8] update changelog

---
 CHANGELOG.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1e699ab0..1564ee2c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,7 +5,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+### Changed
+- Update `README`: add Pipeline Steps and Tool descriptions
+## [6.0.0-rc.1] - 2023-1-30
 ### Changed
 - Update to use `set_resources_allocation` from pipeline-Nextflow-config repo
 - Update SAMtools to v1.16.1

From 48d7303b35b5042320db91750f15afecdf485d42 Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Sat, 4 Feb 2023 20:04:54 -0800
Subject: [PATCH 7/8] update version number in nextflow.config

---
 nextflow.config | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nextflow.config b/nextflow.config
index 7d5aab55..6b1e04d3 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -9,6 +9,6 @@ manifest {
  nextflowVersion = '>=20.07.1'
  author = 'Yuan Zhe (Caden) Bugh, Mao Tian, Sorel Fitz-Gibbon'
  homePage = 'https://github.com/uclahs-cds/pipeline-call-sSNV'
- version = '5.0.0'
+ version = '6.0.0-rc.1'
  name = 'call-sSNV'
 }

From 6b3b7f67d751d73eb53eee93342889e804961c01 Mon Sep 17 00:00:00 2001
From: Sorel Fitz-Gibbon
Date: Tue, 7 Feb 2023 11:50:08 -0800
Subject: [PATCH 8/8] implemented Mao's suggestions

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 651e6807..103192e4 100644
--- a/README.md
+++ b/README.md
@@
-112,7 +112,7 @@ GitHub Package: https://github.com/uclahs-cds/docker-MuSE/pkgs/container/muse
 ### SomaticSniper
 #### 1. `SomaticSniper` v1.0.5.0
-Compare a pair of tumor and normal bam files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in vcf format.
+Compare a pair of tumor and normal bam files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in VCF format.
 #### 2. Filter out ambiguous positions.
 This takes several steps, listed below, and starts with the same input files given to `SomaticSniper`.
 ##### a. Get pileup summaries
@@ -126,13 +126,13 @@ ii. filter vcf output from step `i` using tumor indel pileup (from step `b`).
 ##### d. Summarize alignment information for retained variant positions
 Extract positions from filtered vcf file and use with `bam-readcount` to generate a summary of read alignment metrics for each position.
 ##### e. Final filtering of variants using metrics summarized above
-Use `fpfilter.pl` (packaged with SomaticSniper), resulting in a final high confidence vcf file.
+Use `fpfilter.pl` and `highconfidence.pl` (packaged with SomaticSniper), resulting in a final high confidence vcf file.
 ### Strelka2
 #### 1. `Manta` v1.6.0
-The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the `Manta` somatic configuration protocol. [Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.]
+The input pair of tumor/normal bam files are used by Manta to produce candidate small indels via the `Manta` somatic configuration protocol. *Note, larger (structural) variants are also produced and can be retrieved from the intermediate files directory if save intermediate files is enabled.*
 #### 2.
`Strelka2` v2.9.10
-The input pair of tumor/normal bam files, along with the candidate small indel file produced by `Manta` are used by `Strelka2` to create lists of somatic single nucleotide and small indel variants, both in vcf format. Lower quality variants that did not pass filtering are subsequently removed, yielding somatic_snvs_pass.vcf and somatic_indels_pass.vcf files
+The input pair of tumor/normal bam files, along with the candidate small indel file produced by `Manta` are used by `Strelka2` to create lists of somatic single nucleotide and small indel variants, both in vcf format. Lower quality variants that did not pass filtering are subsequently removed, yielding `somatic_snvs_pass.vcf` and `somatic_indels_pass.vcf` files.
 ### GATK Mutect 2
@@ -143,7 +143,7 @@ The input pair of tumor/normal bam files, along with the candidate small indel f
  ##### b. Call non-canonical
  Call somatic variants in non-canonical chromosomes with `Mutect2`.
  ##### c. Split canonical
- Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input scatter_count.
+ Split the set of canonical chromosomes into x intervals for parallelization, where x is defined by the input `params.scatter_count`.
  ##### d. Call canonical
  Call somatic variant in canonical chromosomes with `Mutect2`.
  ##### e. Merge
@@ -151,7 +151,7 @@ The input pair of tumor/normal bam files, along with the candidate small indel f
  ##### f. Learn read orientations
  Create artifact prior table based on read orientations with GATK's `LearnReadOrientationModel`.
  ##### g. Filter
- Filter variants with GATK's `FilterMutectCalls`, using read orientation prior table as well as standard filters.
+ Filter variants with GATK's `FilterMutectCalls`, using read orientation prior table and contamination table as well as standard filters.
 #### 2. Intervals provided
  ##### a. Split