continued docs updates: fixed nav, included mermaid workflow diagram and coding, wrote output section
phac-nml · Mar 8, 2024 · 9319a7b
1 parent 8002e92

Showing 8 changed files with 119 additions and 19 deletions.
diff --git a/README.md b/README.md
@@ -127,7 +127,18 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo
 
 ### Output/Results
 
-Explanation of how to interpret and/or export data. Include an example table of output if applicable with columns explained.
+All output files are written to the user-specified `outdir`. More detailed tool results are described in the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. A brief description of the `outdir` structure follows:
+
+- **annotations** - directory containing all annotation tool output.
+- **assembly** - directory containing all assembly-related output, including quality, 7-gene MLST and taxon determination.
+- **pipeline_info** - directory containing pipeline run information, including the software versions used and execution reports.
+- **ReadQuality** - directory containing all read-related output, including contamination, FASTQ, Mash and subsampled read-set output (when present).
+- **subtyping** - directory containing all subtyping tool output, including SISTR, ECTyper, etc.
+- **SummaryReport** - directory containing collated result files for all tools, including:
+   - Individual flattened JSON reports for each sample
+   - **final_report** - results from all tools for all samples, in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated by the nf-prov plug-in
+- **manifest.json** - data provenance file generated by the nf-prov plug-in
 
 ## Run example data
 

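The `outdir` layout described in the README can be spot-checked once a run finishes. Below is a minimal Python sketch, not part of mikrokondo itself: `missing_outputs` is a hypothetical helper, and the expected names are copied from the Output/Results list above, so they would need updating if the layout changes.

```python
from pathlib import Path

# Top-level entries listed under Output/Results (names copied verbatim, mixed case included).
EXPECTED_DIRS = ["annotations", "assembly", "pipeline_info", "ReadQuality", "subtyping", "SummaryReport"]
EXPECTED_FILES = ["bco.json", "manifest.json"]


def missing_outputs(outdir: str) -> list[str]:
    """Return the expected top-level entries that are absent from outdir (hypothetical helper)."""
    root = Path(outdir)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

For example, `missing_outputs("results")` lists the expected entries absent from a `results` outdir. A missing entry is a prompt to check the execution reports in `pipeline_info` rather than proof that a step failed, since optional steps and species-specific subtyping may not run for every dataset.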
diff --git a/.pages b/docs/.pages
rename from .pages
rename to docs/.pages
diff --git a/docs/images/mikrokondo_mermaid.svg b/docs/images/mikrokondo_mermaid.svg
diff --git a/docs/index.md b/docs/index.md
@@ -15,4 +15,4 @@ This workflow will detect what pathogen(s) is present and apply the applicable m
 
 ## Workflow Schematics (Subject to change)
 
-![Pipeline](images/20230921_Mikrokondo-worflow2.png "Workflow")
+![Pipeline](images/mikrokondo_mermaid.svg "Workflow")
diff --git a/docs/usage/useage.md b/docs/usage/useage.md
@@ -1,6 +1,20 @@
 # Running MikroKondo
 
-### Samplesheet
+## Usage
+
+MikroKondo can be run like most other Nextflow pipelines. The most basic usage is:
+`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`
+
+Many parameters can be set from the command line. For the full list of configurable parameters, refer to the `nextflow.config` file in the repo.
+
+## Input
+
+This pipeline requires the following as input:
+
+### Sample files (gzip)
+Sample files must be gzipped; symlinked files may cause problems.
+
+### Samplesheet (CSV)
 MikroKondo requires a samplesheet to run. This FOFN (file of file names) contains the sample names and, if the same name is provided more than once, allows a user to combine read sets under that name. The samplesheet can use the following header fields:
 
 - sample   
@@ -9,7 +23,6 @@ Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) con
 - long_reads   
 - assembly   
 
-**The sample sheet must be in csv format and sample files must be gzipped**
 
 Example layouts for different sample-sheets include:
 
@@ -37,21 +50,15 @@ _Starting with assembly only_
 |------|--------|
 |sample_name|path_to_assembly|
 
-## Useage
 
-MikroKondo can be run like most other nextflow pipelines. The most basic usage is as follows:
-`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM  -profile CONTAINER_TYPE`
-
-Many parameters can be altered or accessed from the command line. For a full list of parameters to be altered please refer to the `nextflow.config` file in the repo. 
+## Command line arguments
 
 > **Note:** All the below settings can be permanently changed in the `nextflow.config` file within the `params` section. For example, to permanently set a nanopore chemistry and use Kraken for speciation:
 ```
params {
    run_kraken = true // Note the lack of quotes
    nanopore_chemistry = "r1041_e82_400bps_hac_v4.2.0" // Note the quotes used here
}
 ```
 
-### Common command line arguments
-
 #### nf-core boilerplate options
 
 - `--publish_dir_mode`: Method used to save pipeline results to output directory
@@ -118,3 +125,18 @@ Different container services can be specified from the command line when running
 
 - `slurm_p true`: the Slurm executor will be used.
 - `slurm_profile STRING`: specifies which Slurm partition to use.
+
+## Output
+
+All output files are written to the user-specified `outdir`. More detailed tool results are described in the [Workflow](/workflows/CleanAssemble/) and [Subworkflow](/subworkflows/assemble_reads/) sections of the docs. A brief description of the `outdir` structure follows:
+
+- **annotations** - directory containing all annotation tool output.
+- **assembly** - directory containing all assembly-related output, including quality, 7-gene MLST and taxon determination.
+- **pipeline_info** - directory containing pipeline run information, including the software versions used and execution reports.
+- **ReadQuality** - directory containing all read-related output, including contamination, FASTQ, Mash and subsampled read-set output (when present).
+- **subtyping** - directory containing all subtyping tool output, including SISTR, ECTyper, etc.
+- **SummaryReport** - directory containing collated result files for all tools, including:
+   - Individual flattened JSON reports for each sample
+   - **final_report** - results from all tools for all samples, in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated by the nf-prov plug-in
+- **manifest.json** - data provenance file generated by the nf-prov plug-in
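The input requirements above (a CSV samplesheet with a `sample` column, gzipped sample files, and a caution about symlinks) can be checked before launching a run. The following Python sketch is illustrative only: `check_samplesheet` is a hypothetical helper, and it assumes that every column other than `sample` holds a file path (true of the `long_reads` and `assembly` fields listed above) and that relative paths resolve against the current working directory. Gzip detection uses the two-byte gzip magic number rather than the file extension.

```python
import csv
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def check_samplesheet(samplesheet: str) -> list[str]:
    """Return human-readable problems found in a samplesheet (hypothetical helper).

    Assumes every non-empty column other than `sample` holds a file path.
    """
    problems = []
    with open(samplesheet, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or "sample" not in reader.fieldnames:
            return ["samplesheet has no `sample` header field"]
        # Data rows are numbered from 2 because the header is row 1.
        for row_number, row in enumerate(reader, start=2):
            for column, value in row.items():
                # Skip the sample name, overflow fields (key None) and empty cells.
                if column in (None, "sample") or not (value and value.strip()):
                    continue
                path = Path(value.strip())
                where = f"row {row_number}, {column}: {path}"
                if path.is_symlink():
                    problems.append(f"{where} is a symlink (may be problematic)")
                elif not path.is_file():
                    problems.append(f"{where} does not exist")
                else:
                    with open(path, "rb") as reads:
                        if reads.read(2) != GZIP_MAGIC:
                            problems.append(f"{where} is not gzipped")
    return problems
```

Running `check_samplesheet("samplesheet.csv")` returns a list of problems; an empty list means every referenced file exists, is not a symlink, and starts with the gzip header. Symlinks are only flagged, because the docs describe them as potentially problematic rather than unsupported.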
diff --git a/docs/workflows/CleanAssemble.md b/docs/workflows/CleanAssemble.md
@@ -11,17 +11,17 @@
 
 
 ## Steps
-1. **[QC reads](subworkflows/clean_reads)** subworkflow steps in brief are listed below, for further information see [clean_reads.nf](subworkflows/local/clean_reads.nf)
+1. **[QC reads](/subworkflows/clean_reads)** The subworkflow steps are summarized below; for further information, see [clean_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/clean_reads.nf).
 	- Reads are checked for known sequencing contamination
 	- Quality metrics are calculated
 	- Reads are trimmed
 	- Coverage is estimated
 	- Read set is subsampled to a set level (OPTIONAL)
 	- Read set is assessed to be either an isolate or metagenomic sample (from presence of multiple taxa)
 
-2. **[Assemble reads](/subworkflows/assemble_reads)** using the `params.platform` flag, read sets will be diverted to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) workflow. Though the data is handled differently in eash subworklow, both generate a contigs file and a bandage image, with an option of initial polishing via Racon. See [assemble_reads.nf](subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.
+2. **[Assemble reads](/subworkflows/assemble_reads)** Using the `params.platform` flag, read sets are routed to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) subworkflow. Although the data is handled differently in each subworkflow, both generate a contigs file and a Bandage image, with optional initial polishing via Racon. See [assemble_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/hybrid_assembly.nf) for more details.
 
-3. **[Polish assembles](/subworkflows/polish_assemblies)** (OPTIONAL) Polishing of contigs can be added [polish_assemblies.nf](subworkflows/local/polish_assemblies.nf). To make changes to the default workflow, see setting 'optional flags' page.
+3. **[Polish assemblies](/subworkflows/polish_assemblies)** (OPTIONAL) Contigs can be further polished with [polish_assemblies.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/polish_assemblies.nf). To change the default workflow, see the 'optional flags' settings page.
 
 ## Input
 - Next generation sequencing reads:

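Step 1's final check, classifying a read set as an isolate or a metagenomic sample from the presence of multiple taxa, can be illustrated with a small Python sketch. It is not mikrokondo's implementation: `is_metagenomic` and its `min_identity` cutoff of 0.95 are hypothetical, and the `(taxon, identity)` pairs stand in for whatever the pipeline extracts from its contamination screen, such as hits from a Mash screen report.

```python
from collections.abc import Iterable


def is_metagenomic(hits: Iterable[tuple[str, float]], min_identity: float = 0.95) -> bool:
    """Return True when more than one distinct taxon passes the identity cutoff (hypothetical helper).

    `hits` are (taxon, identity) pairs; the 0.95 default is an illustrative placeholder.
    """
    # Collapse repeated hits to the same taxon so only distinct taxa are counted.
    taxa = {taxon for taxon, identity in hits if identity >= min_identity}
    return len(taxa) > 1
```

A single taxon passing the cutoff yields an isolate; two or more distinct taxa yield a metagenomic sample, which the workflow diagram shows being sent to assembly with the assembly flag set to `metagenomic`.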
diff --git a/docs/workflows/PostAssembly.md b/docs/workflows/PostAssembly.md
@@ -13,11 +13,11 @@ This workflow is triggered in two ways: 1. when assemblies are used for initial
 - `subtype_genome.nf`
 
 ## Steps
-1. **Determine type** using the `metagenomic_samples` flag, this workflow will direct assemblies to the following two paths:
-	a. Isolate: proceeds to step 2.
-	b. Metagenomic: runs the following two modules before proceeding to step 2.
-		i.	[kraken2.nf](modules/local/kraken.nf) runs kraken 2 on contigs
-		ii.	[bin_kraken2.nf](modules/local/bin_kraken2.nf) bins contigs to respective genus level taxa
+1. **Determine type** Using the `metagenomic_samples` flag, this workflow directs assemblies down one of two paths:
+    1. Isolate: proceeds to step 2.
+    2. Metagenomic: runs the following two modules before proceeding to step 2.
+        1. [kraken.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/kraken.nf) runs Kraken2 on contigs
+        2. [bin_kraken2.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/bin_kraken2.nf) bins contigs by genus-level taxon
 2. **[QC assemblies](/subworkflows/qc_assembly)** (OPTIONAL) runs QUAST and assigns quality metrics to the generated assemblies
 3. **[Determine species](/subworkflows/determine_species)** (OPTIONAL) runs a classifier tool (default: [Mash](https://github.com/marbl/Mash)) to determine the species of the sample or of each bin
 4. **[Subtype genome](/subworkflows/subtype_genome)** (OPTIONAL) launches species-specific subtyping tools based on the generated Mash screen report.

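The branching in step 1 can be summarized as a small Python sketch. `route_assembly` and the `bin_by_genus` callable are hypothetical stand-ins: in the pipeline the decision comes from the `metagenomic_samples` flag and the binning is done by the `kraken.nf` and `bin_kraken2.nf` modules, not by a Python function.

```python
from typing import Callable

# Hypothetical stand-in for kraken.nf + bin_kraken2.nf: maps one assembly to {genus: binned contigs}.
GenusBinner = Callable[[str], dict[str, str]]


def route_assembly(assembly: str, metagenomic: bool, bin_by_genus: GenusBinner) -> list[str]:
    """Return the assemblies that continue to step 2 (hypothetical helper)."""
    if not metagenomic:
        # Isolate: proceeds to step 2 unchanged.
        return [assembly]
    # Metagenomic: bin contigs by genus, then send each bin on to step 2.
    return list(bin_by_genus(assembly).values())
```

With a binner that returns two genus-level bins, a metagenomic assembly yields two assemblies for the optional QC, species determination and subtyping steps, while an isolate passes through unchanged.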
diff --git a/utils/mikrokondo_mermaid.js b/utils/mikrokondo_mermaid.js
@@ -0,0 +1,66 @@
+flowchart LR
+    CI(Check input):::lightGreen --> R((Reads));
+    CI --> A((Assembly));
+    R --> CA(CleanAssemble):::pink;
+    CA --> QC(QC reads / clean reads):::lightGreen;
+    QC --> QC1("Remove contaminants<br>(Minimap2)"):::orange;
+    QC1 --> QC2(Fastp):::orange;
+    QC2 --> QC3(parse Fastp<br>trimming):::orange;
+    QC3 --> QC4("estimate coverage<br>(kat_hist)"):::orange;
+    QC4 --> QC5("Subsample<br>(seqTK_sample)"):::orange;
+    QC5 --> QC6("Contamination check<br>(Mash screen)"):::orange;
+    QC6 --> QC7("Separate reads<br>(parse Mash)"):::orange;
+    QC --> SR((short read));
+    SR --> I((Isolate));
+    SR --> M((Metagenomic));
+    I --> AR(Assemble reads):::lightGreen;
+    AR --> AR1(Bandage image):::orange;
+    AR1 --> AR2((Illumina));
+    AR2 --> AR22(Spades assemble):::orange;
+    AR1 --> AR3((Pacbio/nanopore));
+    AR3 --> AR32(Flye assemble):::orange;
+    AR22 --> HAD3;
+    AR32 --> HAD3;
+    M -->|"set assembly flag<br>to `metagenomic`"| AR;
+    QC --> H((Long read<br>or hybrid));
+    H --> HA(Hybrid assembly):::lightGreen;
+    HA --> HAD((Default));
+    HAD --> HAD1(Flye assemble):::orange;
+    HAD1 --> HAD2(Bandage image):::orange;
+    HAD2 --> HAD3("Create contig index<br>(Minimap2 index)"):::orange;
+    HAD3 --> HAD4("Generate SAM<br>(Minimap2 map)"):::orange;
+    HAD4 --> HAD5(Racon polish):::orange;
+    HAD5 --> HAD6(Pilon iterate):::orange;
+    HA --> HAU((Unicycler));
+    HAU --> HAU1(Unicycler assemble):::orange;
+    HAU1 --> HAU2(Bandage image):::orange;
+    QC7 --> PA(Polish assemblies):::lightGreen;
+    HAD6 --> PA; 
+    HAU2 --> PA;
+    PA --> PAI((Illumina));
+    PAI --> PAI1(Pilon iterate):::orange;
+    PA --> PAN((Pacbio/nanopore));
+    PAN --> PAN1(Medaka polish):::orange;
+    PAI1 --> PASS(Post assembly):::pink;
+    PAN1 --> PASS;
+    A --> PASS;
+    PASS --> I1((Isolate));
+    PASS --> M1((Metagenomic));
+    I1 --> QCA(QC assemblies):::lightGreen;
+
+
+    subgraph legend [Legend]
+    direction LR;
+    wk(Workflow):::pink --> sw(Subworkflow):::lightGreen;
+    sw --> m(Module):::orange;
+    d((Decision));
+    end
+
+
+
+
+
+
+    classDef lightGreen fill:#0ABC9B,stroke:#0ABC9B,stroke-width:2px,rx:10px,ry:10px;
+    classDef pink fill:#F681CB,stroke:#F681CB,stroke-width:2px,rx:10px,ry:10px;
+    classDef orange fill:#F2B581,stroke:#F2B581,stroke-width:2px,rx:10px,ry:10px;