First full pass of nf-core documentation

nf-core · Mar 14, 2024 · 21243fc
1 parent 0881ae9

Showing 10 changed files with 839 additions and 213 deletions.
diff --git a/README.md b/README.md
@@ -16,31 +16,29 @@
 
 ## Introduction
 
-**nf-core/oncoanalyser** is a Nextflow implementation of the comprehensive cancer DNA and RNA analysis and reporting
-workflow from the Hartwig Medical Foundation. For detailed information on each component of the Hartwig Medical
-Foundation workflow, please refer to [hartwigmedical/hmftools](https://github.com/hartwigmedical/hmftools/).
-
-The oncoanalyser pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across
-multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation
-trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
-implementation of this pipeline uses one container per process which makes it much easier to maintain and update
-software dependencies. Where possible, these processes have been submitted to and installed from
-[nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to
-everyone within the Nextflow community!
-
-On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud
-infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on
-real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other
-analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core
-website](https://nf-co.re/oncoanalyser/results).
+**nf-core/oncoanalyser** is a Nextflow implementation of the comprehensive cancer DNA/RNA analysis and reporting
+workflow from the Hartwig Medical Foundation. Both the Hartwig WGS/WTS workflow and targeted sequencing workflow are
+available in oncoanalyser. The targeted sequencing workflow has built-in support for the TSO500 panel and can also run
+custom panels with externally-generated normalisation data.
+
+The key analysis results for each sample are summarised and presented in an ORANGE report (summary page excerpt shown
+below from *[COLO829_wgts.orange_report.pdf](https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/oncoanalyser/other/example_report/COLO829_wgts.orange_report.pdf)*):
+
+<p align='center'><img width='750' src='docs/images/COLO829_wgts.orange_report.summary_section.png'></p>
+
+For detailed information on each component of the Hartwig workflow, please refer to
+[hartwigmedical/hmftools](https://github.com/hartwigmedical/hmftools/).
 
 ## Pipeline summary
 
 The following processes and tools can be run with oncoanalyser:
 
-* SNV and MNV calling (`SAGE`, `PAVE`)
-* SV calling (`SV Prep`, `GRIDSS`, `GRIPSS`, `PURPLE`, `LINX`)
+* Simple DNA/RNA alignment (`bwa-mem2`, `STAR`)
+* Post-alignment processing (`MarkDups`)
+* SNV, MNV, INDEL calling (`SAGE`, `PAVE`)
 * CNV calling (`AMBER`, `COBALT`, `PURPLE`)
+* SV calling (`SvPrep`, `GRIDSS`, `GRIPSS`)
+* SV event interpretation (`LINX`)
 * Transcript analysis (`Isofox`)
 * Oncoviral detection (`VIRUSBreakend`, `Virus Interpreter`)
 * HLA calling (`LILAC`)
@@ -51,25 +49,25 @@ The following processes and tools can be run with oncoanalyser:
 
 ## Quick Start
 
-Create a samplesheet containing your inputs:
+Create a samplesheet with your inputs (WGS/WTS FASTQs in this example):
 
-```text
-group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
-P1__wgts,P1,SA,tumor,dna,bam,/path/to/SA.tumor.dna.wgs.bam
-P1__wgts,P1,SB,tumor,rna,bam,/path/to/SB.tumor.rna.wts.bam
-P1__wgts,P1,SC,normal,dna,bam,/path/to/SC.normal.dna.wgs.bam
+```csv
+group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
+P1__wgts,P1,SA,normal,dna,fastq,library_id:SA_library;lane:001,/path/to/SA.normal.dna.wgs.001.R1.fastq.gz;/path/to/SA.normal.dna.wgs.001.R2.fastq.gz
+P1__wgts,P1,SB,tumor,dna,fastq,library_id:SB_library;lane:001,/path/to/SB.tumor.dna.wgs.001.R1.fastq.gz;/path/to/SB.tumor.dna.wgs.001.R2.fastq.gz
+P1__wgts,P1,SC,tumor,rna,fastq,library_id:SC_library;lane:001,/path/to/SC.tumor.rna.wts.001.R1.fastq.gz;/path/to/SC.tumor.rna.wts.001.R2.fastq.gz
 ```
 
 Launch oncoanalyser:
 
 ```bash
 nextflow run nf-core/oncoanalyser \
-   -revision v0.3.1 \
-   -profile docker \
-   --mode wgts \
-   --genome GRCh38_hmf \
-   --input samplesheet.csv \
-   --outdir output/
+  -revision v0.3.1 \
+  -profile docker \
+  --mode wgts \
+  --genome GRCh38_hmf \
+  --input samplesheet.csv \
+  --outdir output/
 ```
 
 ## Documentation
@@ -78,16 +76,33 @@ The nf-core/oncoanalyser pipeline comes with documentation about the pipeline
 [usage](https://nf-co.re/oncoanalyser/usage), [parameters](https://nf-co.re/oncoanalyser/parameters) and
 [output](https://nf-co.re/oncoanalyser/output).
 
-## Version support
+## Version information
 
-As oncoanalyser is used in clinical settings and is subject to accreditation standards in some instances, there is a
-need for long-term stability and reliability for feature releases in order to meet operational requirements. This is
+### Extended support
+
+As oncoanalyser is used in clinical settings and subject to accreditation standards in some instances, there is a need
+for long-term stability and reliability for feature releases in order to meet operational requirements. This is
 accomplished through long-term support of several nominated feature releases, which all receive bug fixes and security
 fixes during the period of extended support.
 
 Each release that is given extended support is allocated a separate long-lived git branch with the 'stable' prefix, e.g.
 `stable/1.2.x`, `stable/1.5.x`. Feature development otherwise occurs on the `main` branch.
 
+Versions nominated to have current long-term support:
+
+* TBD
+
+### Release parity
+
+Versioning between oncoanalyser and hmftools naturally differs; however, it is often necessary to relate the functional
+equivalence of these two pieces of software. The functional/feature parity between version releases is detailed in the
+table below.
+
+| oncoanalyser        | hmftools |
+| ---                 | ---      |
+| 0.1.0 through 0.2.7 | 5.33     |
+| 0.3.0 through 0.3.1 | 5.34     |
+
 ## Credits
 
 The oncoanalyser pipeline was written by Stephen Watts while in the [Genomics Platform
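
Note on the Quick Start samplesheet in README.md: the new format adds an `info` column and packs both FASTQ reads into one `filepath` field separated by `;`, which is easy to get wrong by hand. A minimal pre-flight sketch is shown below. `check_samplesheet` is a hypothetical helper, not part of oncoanalyser, and it assumes that every `fastq` row carries exactly two paths (R1;R2) as in the example rows; the pipeline's own validation may be stricter or more permissive.

```bash
# check_samplesheet FILE
# Hypothetical helper (not part of oncoanalyser). Checks that FILE has the
# 8-column header from the Quick Start example and that every fastq row
# lists exactly two ';'-separated paths. Exits non-zero if any check fails.
check_samplesheet() {
  awk -F',' '
    NR == 1 {
      expected = "group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath"
      if ($0 != expected) { print "unexpected header: " $0; bad = 1 }
      next
    }
    # Every data row must have the same eight fields as the header.
    NF != 8 { print "line " NR ": expected 8 fields, found " NF; bad = 1; next }
    # Assumption: paired-end FASTQ input, given as R1;R2 in the last field.
    $6 == "fastq" && split($8, reads, ";") != 2 {
      print "line " NR ": expected R1;R2 in filepath"; bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `check_samplesheet samplesheet.csv && echo ok` prints `ok` only when the header and every row pass these checks.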

diff --git a/REFERENCE_DATA_STAGING.md b/REFERENCE_DATA_STAGING.md
@@ -0,0 +1,93 @@
+# Reference data staging
+
+Download and unpack
+
+> All reference data is retrieved here; exclude unused files as desired. The examples below use GRCh38_hmf.
+
+```bash
+base_url=https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev
+
+fps='
+genomes/GRCh37_hmf/Homo_sapiens.GRCh37.GATK.illumina.fasta
+genomes/GRCh37_hmf/bwa_index/0.7.17-r1188.tar.gz
+genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123
+genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64
+genomes/GRCh37_hmf/bwa_index_image/0.7.17-r1188/Homo_sapiens.GRCh37.GATK.illumina.fasta.img
+genomes/GRCh37_hmf/gridss_index/2.13.2/Homo_sapiens.GRCh37.GATK.illumina.fasta.gridsscache
+genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict
+genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai
+genomes/GRCh37_hmf/star_index/gencode_19/2.7.3a.tar.gz
+genomes/GRCh38_hmf/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
+genomes/GRCh38_hmf/bwa_index/0.7.17-r1188.tar.gz
+genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.0123
+genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64
+genomes/GRCh38_hmf/bwa_index_image/0.7.17-r1188/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img
+genomes/GRCh38_hmf/gridss_index/2.13.2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache
+genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict
+genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai
+genomes/GRCh38_hmf/star_index/gencode_38/2.7.3a.tar.gz
+hmf_reference_data/hmftools/5.34_37--2.tar.gz
+hmf_reference_data/hmftools/5.34_38--2.tar.gz
+hmf_reference_data/panels/tso500_5.34_37--1.tar.gz
+hmf_reference_data/panels/tso500_5.34_38--1.tar.gz
+virusbreakend/virusbreakenddb_20210401.tar.gz
+'
+
+parallel -j4 wget -c -x -nH -P reference_data/ ${base_url}/{} ::: ${fps}
+find reference_data/ -name '*.tar.gz' | parallel -j0 'cd {//} && tar -xzvf {/}'
+```
+
+Create Nextflow config file for local reference data
+
+```bash
+cat <<EOF > refdata.local.config
+params {
+    genomes {
+        'GRCh37_hmf' {
+            fasta           = "$(pwd)/reference_data/genomes/GRCh37_hmf/Homo_sapiens.GRCh37.GATK.illumina.fasta"
+            fai             = "$(pwd)/reference_data/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai"
+            dict            = "$(pwd)/reference_data/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict"
+            bwa_index       = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/0.7.17-r1188/"
+            bwa_index_bseq  = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123"
+            bwa_index_biidx = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64"
+            bwa_index_image = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index_image/0.7.17-r1188/Homo_sapiens.GRCh37.GATK.illumina.fasta.img"
+            gridss_index    = "$(pwd)/reference_data/genomes/GRCh37_hmf/gridss_index/2.13.2/Homo_sapiens.GRCh37.GATK.illumina.fasta.gridsscache"
+            star_index      = "$(pwd)/reference_data/genomes/GRCh37_hmf/star_index/gencode_19/2.7.3a/"
+        }
+        'GRCh38_hmf' {
+            fasta           = "$(pwd)/reference_data/genomes/GRCh38_hmf/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
+            fai             = "$(pwd)/reference_data/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai"
+            dict            = "$(pwd)/reference_data/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict"
+            bwa_index       = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/0.7.17-r1188/"
+            bwa_index_bseq  = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.0123"
+            bwa_index_biidx = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64"
+            bwa_index_image = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index_image/0.7.17-r1188/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img"
+            gridss_index    = "$(pwd)/reference_data/genomes/GRCh38_hmf/gridss_index/2.13.2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache"
+            star_index      = "$(pwd)/reference_data/genomes/GRCh38_hmf/star_index/gencode_38/2.7.3a/"
+        }
+    }
+
+    ref_data_hmf_data_path = "$(pwd)/reference_data/hmf_reference_data/hmftools/5.34_38--2/"
+    ref_data_panel_data_path = "$(pwd)/reference_data/hmf_reference_data/panels/tso500_5.34_38--1/"
+    ref_data_virusbreakenddb_path = "$(pwd)/reference_data/virusbreakend/virusbreakenddb_20210401/"
+}
+EOF
+```
+
+Run oncoanalyser with local reference data
+
+> Assumes existing samplesheet at `samplesheet.csv`
+
+```bash
+nextflow run oncoanalyser/main.nf \
+  \
+  -config refdata.local.config \
+  -profile docker \
+  \
+  --mode targeted \
+  --panel tso500 \
+  --genome GRCh38_hmf \
+  \
+  --input samplesheet.csv \
+  --outdir output/
+```
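
Note on REFERENCE_DATA_STAGING.md: the staging steps above only work if the paths written to `refdata.local.config` match where the files actually land. The sketch below checks this before a run. It assumes the layout that the GRCh38_hmf config entries imply: each file sits at `reference_data/<relative path>`, and each `.tar.gz` unpacks into a sibling directory named after the archive. Both function names are hypothetical and not part of oncoanalyser.

```bash
# local_ref_path REL [ROOT]
# Hypothetical helper: print the local path expected for one entry of the
# download list. Assumes the URL path is kept under ROOT and that tarballs
# unpack into a directory named after the archive, next to the archive.
local_ref_path() (
  rel=$1
  root=${2:-reference_data}
  case "$rel" in
    *.tar.gz) printf '%s/%s/\n' "$root" "${rel%.tar.gz}" ;;
    *)        printf '%s/%s\n' "$root" "$rel" ;;
  esac
)

# missing_ref_paths [ROOT]
# Hypothetical helper: read relative paths from stdin (one per line, blank
# lines ignored) and print every expected local path that does not exist.
missing_ref_paths() (
  root=${1:-reference_data}
  while IFS= read -r rel; do
    [ -n "$rel" ] || continue
    path=$(local_ref_path "$rel" "$root")
    [ -e "$path" ] || printf '%s\n' "$path"
  done
)
```

With the `fps` variable from the download step still set, `printf '%s\n' "$fps" | missing_ref_paths` lists anything that was not downloaded or unpacked; no output means every expected path exists.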
diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
diff --git a/docs/images/COLO829_wgts.orange_report.summary_section.png b/docs/images/COLO829_wgts.orange_report.summary_section.png
diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png
diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png
diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png