First full pass of nf-core documentation
scwatts committed Mar 14, 2024
1 parent 0881ae9 commit 21243fc
Showing 10 changed files with 839 additions and 213 deletions.
83 changes: 49 additions & 34 deletions README.md
@@ -16,31 +16,29 @@

## Introduction

**nf-core/oncoanalyser** is a Nextflow implementation of the comprehensive cancer DNA and RNA analysis and reporting
workflow from the Hartwig Medical Foundation. For detailed information on each component of the Hartwig Medical
Foundation workflow, please refer to [hartwigmedical/hmftools](https://github.com/hartwigmedical/hmftools/).

The oncoanalyser pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across
multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation
trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
implementation of this pipeline uses one container per process which makes it much easier to maintain and update
software dependencies. Where possible, these processes have been submitted to and installed from
[nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to
everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud
infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on
real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other
analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core
website](https://nf-co.re/oncoanalyser/results).
**nf-core/oncoanalyser** is a Nextflow implementation of the comprehensive cancer DNA/RNA analysis and reporting
workflow from the Hartwig Medical Foundation. Both the Hartwig WGS/WTS workflow and targeted sequencing workflow are
available in oncoanalyser. The targeted sequencing workflow has built-in support for the TSO500 panel and can also run
custom panels with externally generated normalisation data.

The key analysis results for each sample are summarised and presented in an ORANGE report (summary page excerpt shown
below from *[COLO829_wgts.orange_report.pdf](https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/oncoanalyser/other/example_report/COLO829_wgts.orange_report.pdf)*):

<p align='center'><img width='750' src='docs/images/COLO829_wgts.orange_report.summary_section.png'></p>

For detailed information on each component of the Hartwig workflow, please refer to
[hartwigmedical/hmftools](https://github.com/hartwigmedical/hmftools/).

## Pipeline summary

The following processes and tools can be run with oncoanalyser:

* SNV and MNV calling (`SAGE`, `PAVE`)
* SV calling (`SV Prep`, `GRIDSS`, `GRIPSS`, `PURPLE`, `LINX`)
* Simple DNA/RNA alignment (`bwa-mem2`, `STAR`)
* Post-alignment processing (`MarkDups`)
* SNV, MNV, INDEL calling (`SAGE`, `PAVE`)
* CNV calling (`AMBER`, `COBALT`, `PURPLE`)
* SV calling (`SvPrep`, `GRIDSS`, `GRIPSS`)
* SV event interpretation (`LINX`)
* Transcript analysis (`Isofox`)
* Oncoviral detection (`VIRUSBreakend`, `Virus Interpreter`)
* HLA calling (`LILAC`)
@@ -51,25 +49,25 @@ The following processes and tools can be run with oncoanalyser:

## Quick Start

Create a samplesheet containing your inputs:
Create a samplesheet with your inputs (WGS/WTS FASTQs in this example):

```text
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
P1__wgts,P1,SA,tumor,dna,bam,/path/to/SA.tumor.dna.wgs.bam
P1__wgts,P1,SB,tumor,rna,bam,/path/to/SB.tumor.rna.wts.bam
P1__wgts,P1,SC,normal,dna,bam,/path/to/SC.normal.dna.wgs.bam
```csv
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
P1__wgts,P1,SA,normal,dna,fastq,library_id:SA_library;lane:001,/path/to/SA.normal.dna.wgs.001.R1.fastq.gz;/path/to/SA.normal.dna.wgs.001.R2.fastq.gz
P1__wgts,P1,SB,tumor,dna,fastq,library_id:SB_library;lane:001,/path/to/SB.tumor.dna.wgs.001.R1.fastq.gz;/path/to/SB.tumor.dna.wgs.001.R2.fastq.gz
P1__wgts,P1,SC,tumor,rna,fastq,library_id:SC_library;lane:001,/path/to/SC.tumor.rna.wts.001.R1.fastq.gz;/path/to/SC.tumor.rna.wts.001.R2.fastq.gz
```
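
As a rough pre-flight check, the layout of the FASTQ example above can be verified before launching. The sketch below is a hypothetical helper and not part of oncoanalyser: `check_samplesheet` is an invented name, and its two rules (every row has as many comma-separated fields as the header, and `fastq` rows list exactly two `;`-separated paths) are assumptions inferred from the example rather than the pipeline's own validation.

```bash
# Hypothetical helper (not part of oncoanalyser): sanity-check a samplesheet
# against the layout shown in the example above.
check_samplesheet() {
  awk -F',' '
    NR == 1 {
      # Remember the header width and the position of each named column.
      ncols = NF
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NF == 0 { next }  # ignore blank lines
    NF != ncols {
      printf "line %d: expected %d fields, found %d\n", NR, ncols, NF
      bad = 1
      next
    }
    $(col["filetype"]) == "fastq" {
      # Assumption: paired-end FASTQs are given as "forward;reverse".
      n = split($(col["filepath"]), reads, ";")
      if (n != 2) {
        printf "line %d: expected two FASTQ paths, found %d\n", NR, n
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Calling `check_samplesheet samplesheet.csv` prints one line per problem and returns a non-zero status if any were found.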

Launch oncoanalyser:

```bash
nextflow run nf-core/oncoanalyser \
  -revision v0.3.1 \
  -profile docker \
  --mode wgts \
  --genome GRCh38_hmf \
  --input samplesheet.csv \
  --outdir output/
```

## Documentation
@@ -78,16 +76,33 @@ The nf-core/oncoanalyser pipeline comes with documentation about the pipeline
[usage](https://nf-co.re/oncoanalyser/usage), [parameters](https://nf-co.re/oncoanalyser/parameters) and
[output](https://nf-co.re/oncoanalyser/output).

## Version support
## Version information

### Extended support

As oncoanalyser is used in clinical settings and subject to accreditation standards in some instances, there is a need
for long-term stability and reliability for feature releases in order to meet operational requirements. This is
accomplished through long-term support of several nominated feature releases, which all receive bug fixes and security
fixes during the period of extended support.

Each release that is given extended support is allocated a separate long-lived git branch with the 'stable' prefix, e.g.
`stable/1.2.x`, `stable/1.5.x`. Feature development otherwise occurs on the `main` branch.
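
The branch naming above can be sketched as a small shell function. This is an illustration only: `stable_branch` is a hypothetical name, and the `stable/<major>.<minor>.x` pattern is an assumption inferred from the two example branch names.

```bash
# Hypothetical helper: derive the extended-support branch name for a release,
# assuming the stable/<major>.<minor>.x pattern seen in the examples above.
stable_branch() {
  local version=${1#v}  # tolerate a leading "v", e.g. v1.2.3
  local major minor
  # Split on dots; the patch component is discarded.
  IFS=. read -r major minor _ <<< "${version}"
  echo "stable/${major}.${minor}.x"
}

stable_branch 1.2.3  # prints stable/1.2.x
```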

Versions nominated to have current long-term support:

* TBD

### Release parity

Version numbers naturally differ between oncoanalyser and hmftools; however, it is often necessary to identify which
releases of the two are functionally equivalent. The table below details feature parity between releases.

| oncoanalyser | hmftools |
| --- | --- |
| 0.1.0 through 0.2.7 | 5.33 |
| 0.3.0 through 0.3.1 | 5.34 |

## Credits

The oncoanalyser pipeline was written by Stephen Watts while in the [Genomics Platform
93 changes: 93 additions & 0 deletions REFERENCE_DATA_STAGING.md
@@ -0,0 +1,93 @@
# Reference data staging

Download and unpack

> All reference data is retrieved here; exclude unused files as desired. The examples below use GRCh38_hmf.
```bash
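# The listed paths already begin with genomes/, hmf_reference_data/ or virusbreakend/, so the base URL is the bucket root
base_url=https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev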

fps='
genomes/GRCh37_hmf/Homo_sapiens.GRCh37.GATK.illumina.fasta
genomes/GRCh37_hmf/bwa_index/0.7.17-r1188.tar.gz
genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123
genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64
genomes/GRCh37_hmf/bwa_index_image/0.7.17-r1188/Homo_sapiens.GRCh37.GATK.illumina.fasta.img
genomes/GRCh37_hmf/gridss_index/2.13.2/Homo_sapiens.GRCh37.GATK.illumina.fasta.gridsscache
genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict
genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai
genomes/GRCh37_hmf/star_index/gencode_19/2.7.3a.tar.gz
genomes/GRCh38_hmf/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
genomes/GRCh38_hmf/bwa_index/0.7.17-r1188.tar.gz
genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.0123
genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64
genomes/GRCh38_hmf/bwa_index_image/0.7.17-r1188/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img
genomes/GRCh38_hmf/gridss_index/2.13.2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache
genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict
genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai
genomes/GRCh38_hmf/star_index/gencode_38/2.7.3a.tar.gz
hmf_reference_data/hmftools/5.34_37--2.tar.gz
hmf_reference_data/hmftools/5.34_38--2.tar.gz
hmf_reference_data/panels/tso500_5.34_37--1.tar.gz
hmf_reference_data/panels/tso500_5.34_38--1.tar.gz
virusbreakend/virusbreakenddb_20210401.tar.gz
'

parallel -j4 wget -c -x -nH -P reference_data/ ${base_url}/{} ::: ${fps}
find reference_data/ -name '*.tar.gz' | parallel -j0 'cd {//} && tar -xzvf {/}'
```
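
Before writing the config, it may be worth confirming that every listed file actually arrived. The sketch below is a hypothetical helper (`list_missing` is an invented name) that reports any relative path absent under a given root directory. It assumes the directory layout produced by the `wget -x -nH -P reference_data/` call above.

```bash
# Hypothetical helper: print every relative path that is missing under the
# given root directory; returns non-zero if at least one path is missing.
list_missing() {
  local root=$1
  shift
  local status=0 fp
  for fp in "$@"; do
    if [ ! -e "${root}/${fp}" ]; then
      echo "missing: ${fp}"
      status=1
    fi
  done
  return "${status}"
}

# Usage with the variables from the download step
# (word splitting of ${fps} is intentional, as in the parallel call):
# list_missing reference_data ${fps}
```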

Create Nextflow config file for local reference data

```bash
cat <<EOF > refdata.local.config
params {
    genomes {
        'GRCh37_hmf' {
            fasta           = "$(pwd)/reference_data/genomes/GRCh37_hmf/Homo_sapiens.GRCh37.GATK.illumina.fasta"
            fai             = "$(pwd)/reference_data/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai"
            dict            = "$(pwd)/reference_data/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict"
            bwa_index       = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/0.7.17-r1188/"
            bwa_index_bseq  = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123"
            bwa_index_biidx = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64"
            bwa_index_image = "$(pwd)/reference_data/genomes/GRCh37_hmf/bwa_index_image/0.7.17-r1188/Homo_sapiens.GRCh37.GATK.illumina.fasta.img"
            gridss_index    = "$(pwd)/reference_data/genomes/GRCh37_hmf/gridss_index/2.13.2/Homo_sapiens.GRCh37.GATK.illumina.fasta.gridsscache"
            star_index      = "$(pwd)/reference_data/genomes/GRCh37_hmf/star_index/gencode_19/2.7.3a/"
        }
        'GRCh38_hmf' {
            fasta           = "$(pwd)/reference_data/genomes/GRCh38_hmf/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
            fai             = "$(pwd)/reference_data/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai"
            dict            = "$(pwd)/reference_data/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict"
            bwa_index       = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/0.7.17-r1188/"
            bwa_index_bseq  = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.0123"
            bwa_index_biidx = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64"
            bwa_index_image = "$(pwd)/reference_data/genomes/GRCh38_hmf/bwa_index_image/0.7.17-r1188/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img"
            gridss_index    = "$(pwd)/reference_data/genomes/GRCh38_hmf/gridss_index/2.13.2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache"
            star_index      = "$(pwd)/reference_data/genomes/GRCh38_hmf/star_index/gencode_38/2.7.3a/"
        }
    }
    ref_data_hmf_data_path        = "$(pwd)/reference_data/hmf_reference_data/hmftools/5.34_38--2/"
    ref_data_panel_data_path      = "$(pwd)/reference_data/hmf_reference_data/panels/tso500_5.34_38--1/"
    ref_data_virusbreakenddb_path = "$(pwd)/reference_data/virusbreakend/virusbreakenddb_20210401/"
}
EOF
```

Run oncoanalyser with local reference data

> Assumes existing samplesheet at `samplesheet.csv`
```bash
nextflow run oncoanalyser/main.nf \
  \
  -config refdata.local.config \
  -profile docker \
  \
  --mode targeted \
  --panel tso500 \
  --genome GRCh38_hmf \
  \
  --input samplesheet.csv \
  --outdir output/
```
12 changes: 0 additions & 12 deletions assets/samplesheet.csv

This file was deleted.

Binary file removed docs/images/mqc_fastqc_adapter.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_counts.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_quality.png
Binary file not shown.