diff --git a/README.md b/README.md
index 3f6d560a..75d14b4c 100644
--- a/README.md
+++ b/README.md
@@ -127,7 +127,18 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo
 ### Output/Results
-Explanation of how to interpret and/or export data. Include an example table of output if applicable with columns explained.
+All output files will be written into the `outdir` (specified by the user). More detailed tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure:
+
+- **annotations** - dir containing all annotation tool output.
+- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
+- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
+- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present)
+- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
+- **SummaryReport** - dir containing collated results files for all tools, including:
+    - Individual sample flattened json reports
+    - **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated from the nf-prov plug-in
+- **manifest.json** - data provenance file generated from the nf-prov plug-in
 ## Run example data
diff --git a/.pages b/docs/.pages
similarity index 100%
rename from .pages
rename to docs/.pages
diff --git a/docs/images/mikrokondo_mermaid.svg b/docs/images/mikrokondo_mermaid.svg
new file mode 100644
index 00000000..1bce78d5
--- /dev/null
+++ b/docs/images/mikrokondo_mermaid.svg
@@ -0,0 +1 @@
+
[SVG markup omitted: rendered Mermaid flowchart of the mikrokondo workflow, with a legend distinguishing workflows, subworkflows, modules and decisions]
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 30a49434..1b2741aa 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,4 +15,4 @@ This workflow will detect what pathogen(s) is present and apply the applicable m
 ## Workflow Schematics (Subject to change)
-![Pipeline](images/20230921_Mikrokondo-worflow2.png "Workflow")
+![Pipeline](images/mikrokondo_mermaid.svg "Workflow")
diff --git a/docs/usage/useage.md b/docs/usage/useage.md
index 5063519f..02c465be 100644
--- a/docs/usage/useage.md
+++ b/docs/usage/useage.md
@@ -1,6 +1,20 @@
 # Running MikroKondo
-### Samplesheet
+## Usage
+
+MikroKondo can be run like most other Nextflow pipelines. The most basic usage is as follows:
+`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`
+
+Many parameters can be altered or accessed from the command line. For a full list of parameters that can be altered, please refer to the `nextflow.config` file in the repo.
+
+## Input
+
+This pipeline requires the following as input:
+
+### Sample files (gzip)
+This pipeline requires sample files to be gzipped (symlinks may be problematic).
+
+### Samplesheet (CSV)
 Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) contains the samples names and allows a user to combine read-sets based on that name if provided. The sample-sheet can utilize the following header fields:
 - sample
@@ -9,7 +23,6 @@ Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) con
 - long_reads
 - assembly
-**The sample sheet must be in csv format and sample files must be gzipped**
 Example layouts for different sample-sheets include:
@@ -37,12 +50,8 @@ _Starting with assembly only_
 |------|--------|
 |sample_name|path_to_assembly|
-## Useage
-MikroKondo can be run like most other nextflow pipelines. The most basic usage is as follows:
-`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`
-
-Many parameters can be altered or accessed from the command line. For a full list of parameters to be altered please refer to the `nextflow.config` file in the repo.
+## Command line arguments
 > **Note:** All the below settings can be permanently changed in the `nextflow.config` file within the `params` section. For example, to permanently set a nanopore chemistry and use Kraken for speciation:
 ```
@@ -50,8 +59,6 @@ Many parameters can be altered or accessed from the command line. For a full lis
 --nanopore_chemistry "r1041_e82_400bps_hac_v4.2.0" // Note the quotes used here
 ```
-### Common command line arguments
-
 #### Nf-core boiler plate options
 - `--publish_dir_mode`: Method used to save pipeline results to output directory
@@ -118,3 +125,18 @@ Different container services can be specified from the command line when running
 - `slurm_p true`: slurm execurtor will be used.
 - `slurm_profile STRING`: a string to allow the user to specify which slurm partition to use.
+
+## Output
+
+All output files will be written into the `outdir` (specified by the user). More detailed tool results can be found in both the [Workflow](/workflows/CleanAssemble/) and [Subworkflow](/subworkflows/assemble_reads/) sections of the docs. Here is a brief description of the outdir structure:
+
+- **annotations** - dir containing all annotation tool output.
+- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
+- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
+- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present)
+- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
+- **SummaryReport** - dir containing collated results files for all tools, including:
+    - Individual sample flattened json reports
+    - **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated from the nf-prov plug-in
+- **manifest.json** - data provenance file generated from the nf-prov plug-in
\ No newline at end of file
diff --git a/docs/workflows/CleanAssemble.md b/docs/workflows/CleanAssemble.md
index defd6e6c..a4877694 100644
--- a/docs/workflows/CleanAssemble.md
+++ b/docs/workflows/CleanAssemble.md
@@ -11,7 +11,7 @@
 ## Steps
-1. **[QC reads](subworkflows/clean_reads)** subworkflow steps in brief are listed below, for further information see [clean_reads.nf](subworkflows/local/clean_reads.nf)
+1. **[QC reads](/subworkflows/clean_reads)** subworkflow steps in brief are listed below; for further information see [clean_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/clean_reads.nf)
     - Reads are checked for known sequencing contamination
     - Quality metrics are calculated
     - Reads are trimmed
@@ -19,9 +19,9 @@
     - Read set subsampled to set level (OPTIONAL)
     - Read set is assessed to be either an isolate or metagenomic sample (from presence of multiple taxa)
-2. **[Assemble reads](/subworkflows/assemble_reads)** using the `params.platform` flag, read sets will be diverted to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) workflow. Though the data is handled differently in eash subworklow, both generate a contigs file and a bandage image, with an option of initial polishing via Racon. See [assemble_reads.nf](subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.
+2. **[Assemble reads](/subworkflows/assemble_reads)** using the `params.platform` flag, read sets will be diverted to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) workflow. Though the data is handled differently in each subworkflow, both generate a contigs file and a bandage image, with an option of initial polishing via Racon. See [assemble_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.
-3. **[Polish assembles](/subworkflows/polish_assemblies)** (OPTIONAL) Polishing of contigs can be added [polish_assemblies.nf](subworkflows/local/polish_assemblies.nf). To make changes to the default workflow, see setting 'optional flags' page.
+3. **[Polish assemblies](/subworkflows/polish_assemblies)** (OPTIONAL) Polishing of contigs can be added with [polish_assemblies.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/polish_assemblies.nf). To make changes to the default workflow, see the 'optional flags' settings page.
 ## Input
 - Next generation sequencing reads:
diff --git a/docs/workflows/PostAssembly.md b/docs/workflows/PostAssembly.md
index fd884c66..d518fdff 100644
--- a/docs/workflows/PostAssembly.md
+++ b/docs/workflows/PostAssembly.md
@@ -13,11 +13,11 @@ This workflow is triggered in two ways: 1. when assemblies are used for initial
 - `subtype_genome.nf`
 ## Steps
-1. **Determine type** using the `metagenomic_samples` flag, this workflow will direct assemblies to the following two paths:
-    a. Isolate: proceeds to step 2.
-    b. Metagenomic: runs the following two modules before proceeding to step 2.
-        i. [kraken2.nf](modules/local/kraken.nf) runs kraken 2 on contigs
-        ii. [bin_kraken2.nf](modules/local/bin_kraken2.nf) bins contigs to respective genus level taxa
+1. **Determine type** using the `metagenomic_samples` flag, this workflow will direct assemblies to the following two paths:
+    1. Isolate: proceeds to step 2.
+    2. Metagenomic: runs the following two modules before proceeding to step 2.
+        1. [kraken.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/kraken.nf) runs kraken2 on contigs
+        2. [bin_kraken2.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/bin_kraken2.nf) bins contigs to respective genus level taxa
 2. **[QC assemblies](/subworkflows/qc_assembly)** (OPTIONAL) runs quast and assigns quality metrics to generated assemblies
 3. **[Determine species](/subworkflows/determine_species)** (OPTIONAL) runs classifier tool (default: [Mash](https://github.com/marbl/Mash)) to determine sample or binned species
 4. **[Subtype genome](/subworkflows/subtype_genome)** (OPTIONAL) species specific subtyping tools are launched using a generated MASH screen report.
diff --git a/utils/mikrokondo_mermaid.js b/utils/mikrokondo_mermaid.js
new file mode 100644
index 00000000..e226d211
--- /dev/null
+++ b/utils/mikrokondo_mermaid.js
@@ -0,0 +1,66 @@
+flowchart LR
+    CI(Check input):::lightGreen --> R((Reads));
+    CI --> A((Assembly));
+    R --> CA(CleanAssemble):::pink;
+    CA --> QC(QC reads / clean reads):::lightGreen;
+    QC --> QC1("Remove contaminants<br>(Minimap2)"):::orange;
+    QC1 --> QC2(Fastp):::orange;
+    QC2 --> QC3(parse Fastp<br>trimming):::orange;
+    QC3 --> QC4("estimate coverage<br>(kat_hist)"):::orange;
+    QC4 --> QC5("Subsample<br>(seqTK_sample)"):::orange;
+    QC5 --> QC6("Contamination check<br>(Mash screen)"):::orange;
+    QC6 --> QC7("Separate reads<br>(parse Mash)"):::orange;
+    QC --> SR((short read));
+    SR --> I((Isolate));
+    SR --> M((Metagenomic));
+    I --> AR(Assemble reads):::lightGreen;
+    AR --> AR1(Bandage image):::orange;
+    AR1 --> AR2((Illumina));
+    AR2 --> AR22(Spades assemble):::orange;
+    AR1 --> AR3((Pacbio/nanopore));
+    AR3 --> AR32(Flye assemble):::orange;
+    AR22 --> HAD3;
+    AR32 --> HAD3;
+    M -->|"set assembly flag<br>to `metagenomic`"| AR;
+    QC --> H((Long read<br>or hybrid));
+    H --> HA(Hybrid assembly):::lightGreen;
+    HA --> HAD((Default));
+    HAD --> HAD1(Flye assemble):::orange;
+    HAD1 --> HAD2(Bandage image):::orange;
+    HAD2 --> HAD3("Create contig index<br>(Minimap2 index)"):::orange;
+    HAD3 --> HAD4("Generate SAM<br>(Minimap2 map)"):::orange;
+    HAD4 --> HAD5(Racon polish):::orange;
+    HAD5 --> HAD6(Pilon iterate):::orange;
+    HA --> HAU((Unicycler));
+    HAU --> HAU1(Unicycler assemble):::orange;
+    HAU1 --> HAU2(Bandage image):::orange;
+    QC7 --> PA(Polish assemblies):::lightGreen;
+    HAD6 --> PA;
+    HAU2 --> PA;
+    PA --> PAI((Illumina));
+    PAI --> PAI1(Pilon iterate):::orange;
+    PA --> PAN((Pacbio/nanopore));
+    PAN --> PAN1(Medaka polish):::orange;
+    PAI1 --> PASS(Post assembly):::pink;
+    PAN1 --> PASS;
+    A --> PASS;
+    PASS --> I1((Isolate));
+    PASS --> M1((Metagenomic));
+    I1 --> QCA(QC assemblies):::lightGreen;
+
+
+    subgraph legend [Legend]
+        direction LR;
+        wk(Workflow):::pink --> sw(Subworkflow):::lightGreen;
+        sw --> m(Module):::orange;
+        d((Decision));
+    end
+
+
+
+
+
+    classDef lightGreen fill:#0ABC9B,stroke:#0ABC9B,stroke-width:2px,rx:10px,ry:10px;
+    classDef pink fill:#F681CB,stroke:#F681CB,stroke-width:2px,rx:10px,ry:10px;
+    classDef orange fill:#F2B581,stroke:#F2B581,stroke-width:2px,rx:10px,ry:10px;