diff --git a/README.md b/README.md
index 3f6d560a..75d14b4c 100644
--- a/README.md
+++ b/README.md
@@ -127,7 +127,18 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo
### Output/Results
-Explanation of how to interpret and/or export data. Include an example table of output if applicable with columns explained.
+All output files are written to the directory given by `--outdir`. More detailed descriptions of each tool's results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. A brief description of the `outdir` structure follows:
+
+- **annotations** - dir containing all annotation tool output.
+- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
+- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
+- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present).
+- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
+- **SummaryReport** - dir containing collated results files for all tools, including:
+ - Individual sample flattened JSON reports
+ - **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated by the nf-prov plug-in
+- **manifest.json** - data provenance file generated by the nf-prov plug-in
## Run example data
diff --git a/.pages b/docs/.pages
similarity index 100%
rename from .pages
rename to docs/.pages
diff --git a/docs/images/mikrokondo_mermaid.svg b/docs/images/mikrokondo_mermaid.svg
new file mode 100644
index 00000000..1bce78d5
--- /dev/null
+++ b/docs/images/mikrokondo_mermaid.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 30a49434..1b2741aa 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,4 +15,4 @@ This workflow will detect what pathogen(s) is present and apply the applicable m
## Workflow Schematics (Subject to change)
-
+
diff --git a/docs/usage/useage.md b/docs/usage/useage.md
index 5063519f..02c465be 100644
--- a/docs/usage/useage.md
+++ b/docs/usage/useage.md
@@ -1,6 +1,20 @@
# Running MikroKondo
-### Samplesheet
+## Usage
+
+MikroKondo can be run like most other Nextflow pipelines. The most basic usage is as follows:
+`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`
+
+Many parameters can be set from the command line. For a full list of the parameters that can be changed, refer to the `nextflow.config` file in the repo.
+
+## Input
+
+This pipeline requires the following as input:
+
+### Sample files (gzip)
+This pipeline requires sample files to be gzipped (symlinks may be problematic).
+
+### Samplesheet (CSV)
Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) contains the samples names and allows a user to combine read-sets based on that name if provided. The sample-sheet can utilize the following header fields:
- sample
@@ -9,7 +23,6 @@ Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) con
- long_reads
- assembly
-**The sample sheet must be in csv format and sample files must be gzipped**
Example layouts for different sample-sheets include:
@@ -37,12 +50,8 @@ _Starting with assembly only_
|------|--------|
|sample_name|path_to_assembly|
-## Useage
-MikroKondo can be run like most other nextflow pipelines. The most basic usage is as follows:
-`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`
-
-Many parameters can be altered or accessed from the command line. For a full list of parameters to be altered please refer to the `nextflow.config` file in the repo.
+## Command line arguments
> **Note:** All the below settings can be permanently changed in the `nextflow.config` file within the `params` section. For example, to permanently set a nanopore chemistry and use Kraken for speciation:
```
@@ -50,8 +59,6 @@ Many parameters can be altered or accessed from the command line. For a full lis
--nanopore_chemistry "r1041_e82_400bps_hac_v4.2.0" // Note the quotes used here
```
-### Common command line arguments
-
#### Nf-core boiler plate options
- `--publish_dir_mode`: Method used to save pipeline results to output directory
@@ -118,3 +125,18 @@ Different container services can be specified from the command line when running
- `slurm_p true`: slurm execurtor will be used.
- `slurm_profile STRING`: a string to allow the user to specify which slurm partition to use.
+
+## Output
+
+All output files are written to the directory given by `--outdir`. More detailed descriptions of each tool's results can be found in both the [Workflow](/workflows/CleanAssemble/) and [Subworkflow](/subworkflows/assemble_reads/) sections of the docs. A brief description of the `outdir` structure follows:
+
+- **annotations** - dir containing all annotation tool output.
+- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
+- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
+- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present).
+- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
+- **SummaryReport** - dir containing collated results files for all tools, including:
+ - Individual sample flattened JSON reports
+ - **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
+- **bco.json** - data provenance file generated by the nf-prov plug-in
+- **manifest.json** - data provenance file generated by the nf-prov plug-in
\ No newline at end of file
diff --git a/docs/workflows/CleanAssemble.md b/docs/workflows/CleanAssemble.md
index defd6e6c..a4877694 100644
--- a/docs/workflows/CleanAssemble.md
+++ b/docs/workflows/CleanAssemble.md
@@ -11,7 +11,7 @@
## Steps
-1. **[QC reads](subworkflows/clean_reads)** subworkflow steps in brief are listed below, for further information see [clean_reads.nf](subworkflows/local/clean_reads.nf)
+1. **[QC reads](/subworkflows/clean_reads)** subworkflow steps are listed in brief below; for further information, see [clean_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/clean_reads.nf)
- Reads are checked for known sequencing contamination
- Quality metrics are calculated
- Reads are trimmed
@@ -19,9 +19,9 @@
- Read set subsampled to set level (OPTIONAL)
- Read set is assessed to be either an isolate or metagenomic sample (from presence of multiple taxa)
-2. **[Assemble reads](/subworkflows/assemble_reads)** using the `params.platform` flag, read sets will be diverted to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) workflow. Though the data is handled differently in eash subworklow, both generate a contigs file and a bandage image, with an option of initial polishing via Racon. See [assemble_reads.nf](subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.
+2. **[Assemble reads](/subworkflows/assemble_reads)** Using the `params.platform` flag, read sets are directed to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) subworkflow. Although the data is handled differently in each subworkflow, both generate a contigs file and a Bandage image, with an option of initial polishing via Racon. See the [assemble_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/hybrid_assembly.nf) subworkflow source files for more details.
-3. **[Polish assembles](/subworkflows/polish_assemblies)** (OPTIONAL) Polishing of contigs can be added [polish_assemblies.nf](subworkflows/local/polish_assemblies.nf). To make changes to the default workflow, see setting 'optional flags' page.
+3. **[Polish assemblies](/subworkflows/polish_assemblies)** (OPTIONAL) Polishing of contigs can be added via [polish_assemblies.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/polish_assemblies.nf). To make changes to the default workflow, see the 'optional flags' settings page.
## Input
- Next generation sequencing reads:
diff --git a/docs/workflows/PostAssembly.md b/docs/workflows/PostAssembly.md
index fd884c66..d518fdff 100644
--- a/docs/workflows/PostAssembly.md
+++ b/docs/workflows/PostAssembly.md
@@ -13,11 +13,11 @@ This workflow is triggered in two ways: 1. when assemblies are used for initial
- `subtype_genome.nf`
## Steps
-1. **Determine type** using the `metagenomic_samples` flag, this workflow will direct assemblies to the following two paths:
- a. Isolate: proceeds to step 2.
- b. Metagenomic: runs the following two modules before proceeding to step 2.
- i. [kraken2.nf](modules/local/kraken.nf) runs kraken 2 on contigs
- ii. [bin_kraken2.nf](modules/local/bin_kraken2.nf) bins contigs to respective genus level taxa
+1. **Determine type** Using the `metagenomic_samples` flag, this workflow directs assemblies down one of the following two paths:
+ 1. Isolate: proceeds to step 2.
+ 2. Metagenomic: runs the following two modules before proceeding to step 2.
+ 1. [kraken.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/kraken.nf) runs Kraken2 on contigs
+ 2. [bin_kraken2.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/bin_kraken2.nf) bins contigs into their respective genus-level taxa
2. **[QC assemblies](/subworkflows/qc_assembly)** (OPTIONAL) runs quast and assigns quality metrics to generated assemblies
3. **[Determine species](/subworkflows/determine_species)** (OPTIONAL) runs classifier tool (default: [Mash](https://github.com/marbl/Mash)) to determine sample or binned species
4. **[Subtype genome](/subworkflows/subtype_genome)** (OPTIONAL) species specific subtyping tools are launched using a generated MASH screen report.
diff --git a/utils/mikrokondo_mermaid.js b/utils/mikrokondo_mermaid.js
new file mode 100644
index 00000000..e226d211
--- /dev/null
+++ b/utils/mikrokondo_mermaid.js
@@ -0,0 +1,66 @@
+flowchart LR
+ CI(Check input):::lightGreen --> R((Reads));
+ CI --> A((Assembly));
+ R --> CA(CleanAssemble):::pink;
+ CA --> QC(QC reads / clean reads):::lightGreen;
+ QC --> QC1("Remove contaminants (Minimap2)"):::orange;
+ QC1 --> QC2(Fastp):::orange;
+ QC2 --> QC3(parse Fastp trimming):::orange;
+ QC3 --> QC4("estimate coverage (kat_hist)"):::orange;
+ QC4 --> QC5("Subsample (seqTK_sample)"):::orange;
+ QC5 --> QC6("Contamination check (Mash screen)"):::orange;
+ QC6 --> QC7("Separate reads (parse Mash)"):::orange;
+ QC --> SR((short read));
+ SR --> I((Isolate));
+ SR --> M((Metagenomic));
+ I --> AR(Assemble reads):::lightGreen;
+ AR --> AR1(Bandage image):::orange;
+ AR1 --> AR2((Illumina));
+ AR2 --> AR22(Spades assemble):::orange;
+ AR1 --> AR3((Pacbio/nanopore));
+ AR3 --> AR32(Flye assemble):::orange;
+ AR22 --> HAD3;
+ AR32 --> HAD3;
+ M -->|"set assembly flag to `metagenomic`"| AR;
+ QC --> H((Long read or hybrid));
+ H --> HA(Hybrid assembly):::lightGreen;
+ HA --> HAD((Default));
+ HAD --> HAD1(Flye assemble):::orange;
+ HAD1 --> HAD2(Bandage image):::orange;
+ HAD2 --> HAD3("Create contig index (Minimap2 index)"):::orange;
+ HAD3 --> HAD4("Generate SAM (Minimap2 map)"):::orange;
+ HAD4 --> HAD5(Racon polish):::orange;
+ HAD5 --> HAD6(Pilon iterate):::orange;
+ HA --> HAU((Unicycler));
+ HAU --> HAU1(Unicycler assemble):::orange;
+ HAU1 --> HAU2(Bandage image):::orange;
+ QC7 --> PA(Polish assemblies):::lightGreen;
+ HAD6 --> PA;
+ HAU2 --> PA;
+ PA --> PAI((Illumina));
+ PAI --> PAI1(Pilon iterate):::orange;
+ PA --> PAN((Pacbio/nanopore));
+ PAN --> PAN1(Medaka polish):::orange;
+ PAI1 --> PASS(Post assembly):::pink;
+ PAN1 --> PASS;
+ A --> PASS;
+ PASS --> I1((Isolate));
+ PASS --> M1((Metagenomic));
+ I1 --> QCA(QC assemblies):::lightGreen;
+
+
+ subgraph legend [Legend]
+ direction LR;
+ wk(Workflow):::pink --> sw(Subworkflow):::lightGreen;
+ sw --> m(Module):::orange;
+ d((Decision));
+ end
+
+
+
+
+
+
+ classDef lightGreen fill:#0ABC9B,stroke:#0ABC9B,stroke-width:2px,rx:10px,ry:10px;
+ classDef pink fill:#F681CB,stroke:#F681CB,stroke-width:2px,rx:10px,ry:10px;
+ classDef orange fill:#F2B581,stroke:#F2B581,stroke-width:2px,rx:10px,ry:10px;