Commit

continued docs updates: fixed nav, included mermaid workflow diagram and coding, wrote output section
ChristyPeterson authored and ChristyPeterson committed Mar 8, 2024
1 parent 8002e92 commit 9319a7b
Showing 8 changed files with 119 additions and 19 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -127,7 +127,18 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo

### Output/Results

Explanation of how to interpret and/or export data. Include an example table of output if applicable with columns explained.
All output files are written to the `outdir` specified by the user. More detailed tool results are described in the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. The `outdir` is structured as follows:

- **annotations** - directory containing all annotation tool output.
- **assembly** - directory containing all assembly-related tool output, including assembly quality, 7-gene MLST and taxon determination.
- **pipeline_info** - directory containing pipeline information, including the software versions used and execution reports.
- **ReadQuality** - directory containing all read-level tool output, including contamination, FASTQ, Mash, and subsampled read sets (when present).
- **subtyping** - directory containing all subtyping tool output (e.g. SISTR, ECTyper).
- **SummaryReport** - directory containing collated result files for all tools, including:
    - Individual flattened JSON reports for each sample
- **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
- **bco.json** - data provenance file generated by the nf-prov plugin
- **manifest.json** - data provenance file generated by the nf-prov plugin
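As a rough check after a run, the Python sketch below reports which of the top-level entries listed above exist in an output directory. It assumes those names sit directly under `outdir`, which may not hold for every release; `check_outdir` and `EXPECTED_ENTRIES` are illustrative names, not part of the pipeline.

```python
from pathlib import Path

# Top-level entries described in the list above. This list is an assumption
# for illustration; the actual layout may change between releases.
EXPECTED_ENTRIES = [
    "annotations",
    "assembly",
    "pipeline_info",
    "ReadQuality",
    "subtyping",
    "SummaryReport",
]


def check_outdir(outdir):
    """Map each expected entry name to True if it exists under outdir."""
    root = Path(outdir)
    return {name: (root / name).exists() for name in EXPECTED_ENTRIES}
```

For example, `check_outdir("results")` returns a dictionary such as `{"annotations": True, ...}`, so missing entries can be listed with a simple comprehension over the `False` values.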

## Run example data

File renamed without changes.
1 change: 1 addition & 0 deletions docs/images/mikrokondo_mermaid.svg
2 changes: 1 addition & 1 deletion docs/index.md
@@ -15,4 +15,4 @@ This workflow will detect what pathogen(s) is present and apply the applicable m

## Workflow Schematics (Subject to change)

![Pipeline](images/20230921_Mikrokondo-worflow2.png "Workflow")
![Pipeline](images/mikrokondo_mermaid.svg "Workflow")
40 changes: 31 additions & 9 deletions docs/usage/useage.md
@@ -1,6 +1,20 @@
# Running MikroKondo

### Samplesheet
## Usage

MikroKondo can be run like most other Nextflow pipelines. The most basic usage is:
`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`

Many parameters can be set from the command line. For the full list of configurable parameters, refer to the `nextflow.config` file in the repository.
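When launching the pipeline from a script, the basic command can be assembled as an argument list. This Python sketch assumes `main.nf` is run from the repository root, as in the command above; `build_command` is a hypothetical helper, and the `illumina` and `docker` values used with it are placeholders for whatever platform and container profile your setup supports.

```python
import shlex


def build_command(sample_sheet, outdir, platform, profile, extra_params=None):
    """Build the basic `nextflow run` invocation as an argument list.

    Pipeline parameters use a double dash (--input); Nextflow's own options,
    such as -profile, use a single dash. `extra_params` maps parameter names
    (without leading dashes) to values and is a hypothetical convenience.
    """
    cmd = [
        "nextflow", "run", "main.nf",
        "--input", str(sample_sheet),
        "--outdir", str(outdir),
        "--platform", platform,
        "-profile", profile,
    ]
    for name, value in (extra_params or {}).items():
        cmd += [f"--{name}", str(value)]
    return cmd


def as_shell_string(cmd):
    """Quote the argument list so it can be pasted into a shell."""
    return shlex.join(cmd)
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with paths that contain spaces.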

## Input

This pipeline requires the following as input:

### Sample files (gzip)
This pipeline requires sample files to be gzipped (symlinks may be problematic).
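A quick pre-flight check for this requirement might look like the Python sketch below. It tests for the two-byte gzip signature (`1f 8b`, from RFC 1952) and flags symlinks; `check_sample_file` is an illustrative helper, and it does not verify that the whole file is valid gzip.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def check_sample_file(path):
    """Return a list of problems found with one sample file (empty if none)."""
    p = Path(path)
    problems = []
    if p.is_symlink():
        problems.append("is a symlink (may be problematic)")
    if not p.is_file():
        # is_file() follows symlinks, so a broken link also lands here.
        problems.append("does not exist or is not a regular file")
        return problems
    with p.open("rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            problems.append("is not gzip-compressed")
    return problems
```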

### Samplesheet (CSV)
MikroKondo requires a sample sheet to run. This FOFN (file of file names) contains the sample names and allows a user to combine read sets that share a sample name. The sample sheet can use the following header fields:

- sample
@@ -9,7 +23,6 @@ Mikrokondo requires a sample sheet to be run. This FOFN (file of file names) con
- long_reads
- assembly

**The sample sheet must be in csv format and sample files must be gzipped**

Example layouts for different sample-sheets include:

@@ -37,21 +50,15 @@ _Starting with assembly only_
|------|--------|
|sample_name|path_to_assembly|
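Because rows that share a `sample` value can be combined, it can help to see how a sheet groups before running. This Python sketch assumes a `sample,assembly` header for the assembly-only layout (the header row is collapsed in this diff) and uses placeholder paths; `group_samplesheet` only checks for the `sample` column and is not the pipeline's own validation.

```python
import csv
import io
from collections import defaultdict


def group_samplesheet(text):
    """Parse CSV sample-sheet text and group its rows by the `sample` column.

    Rows sharing a sample name are collected together, mirroring how read
    sets with the same name can be combined. Only the `sample` column is
    required here; other columns are passed through unchecked.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "sample" not in reader.fieldnames:
        raise ValueError("sample sheet must have a 'sample' header field")
    groups = defaultdict(list)
    for row in reader:
        groups[row["sample"]].append(row)
    return dict(groups)


# Hypothetical assembly-only sheet; header names are assumed from the field
# list above and the paths are placeholders.
EXAMPLE = """sample,assembly
sample_A,/data/sample_A.fasta.gz
sample_B,/data/sample_B.fasta.gz
"""
```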

## Usage

MikroKondo can be run like most other Nextflow pipelines. The most basic usage is:
`nextflow run main.nf --input PATH_TO_SAMPLE_SHEET --outdir OUTPUT_DIR --platform SEQUENCING_PLATFORM -profile CONTAINER_TYPE`

Many parameters can be set from the command line. For the full list of configurable parameters, refer to the `nextflow.config` file in the repository.
## Command line arguments

> **Note:** All of the settings below can be changed permanently in the `params` section of the `nextflow.config` file. For example, to permanently set a nanopore chemistry and use Kraken for speciation:
```
params {
    run_kraken = true // Note the lack of quotes
    nanopore_chemistry = "r1041_e82_400bps_hac_v4.2.0" // Note the quotes used here
}
```
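The same parameters can be expressed either as command-line flags or in the `params` block. This Python sketch renders one dictionary both ways to show the quoting rule from the note: booleans and numbers are written bare and strings are quoted in the config file. The helper names are hypothetical.

```python
def to_cli_flags(params):
    """Render params as command-line flags, e.g. ['--run_kraken', 'true']."""
    flags = []
    for name, value in params.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        flags += [f"--{name}", text]
    return flags


def to_config_block(params):
    """Render params as a nextflow.config `params` block.

    Booleans and numbers are written bare; strings are double-quoted.
    """
    lines = ["params {"]
    for name, value in params.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"    {name} = {rendered}")
    lines.append("}")
    return "\n".join(lines)
```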

### Common command line arguments

#### nf-core boilerplate options

- `--publish_dir_mode`: Method used to save pipeline results to output directory
@@ -118,3 +125,18 @@ Different container services can be specified from the command line when running

- `slurm_p true`: the Slurm executor will be used.
- `slurm_profile STRING`: specifies which Slurm partition to use.

## Output

All output files are written to the `outdir` specified by the user. More detailed tool results are described in the [Workflow](/workflows/CleanAssemble/) and [Subworkflow](/subworkflows/assemble_reads/) sections of the docs. The `outdir` is structured as follows:

- **annotations** - directory containing all annotation tool output.
- **assembly** - directory containing all assembly-related tool output, including assembly quality, 7-gene MLST and taxon determination.
- **pipeline_info** - directory containing pipeline information, including the software versions used and execution reports.
- **ReadQuality** - directory containing all read-level tool output, including contamination, FASTQ, Mash, and subsampled read sets (when present).
- **subtyping** - directory containing all subtyping tool output (e.g. SISTR, ECTyper).
- **SummaryReport** - directory containing collated result files for all tools, including:
    - Individual flattened JSON reports for each sample
- **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
- **bco.json** - data provenance file generated by the nf-prov plugin
- **manifest.json** - data provenance file generated by the nf-prov plugin
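To work with the collated results programmatically, the `.tsv` report can be loaded with Python's `csv` module. The relative path `SummaryReport/final_report.tsv` is an assumption based on the layout above, and the column layout is not described here, so inspect the header before relying on specific fields.

```python
import csv
from pathlib import Path


def load_final_report(outdir, relative_path="SummaryReport/final_report.tsv"):
    """Load the tab-separated final report into a list of row dictionaries.

    The default relative path is an assumption; adjust it to match your run.
    """
    path = Path(outdir) / relative_path
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```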
6 changes: 3 additions & 3 deletions docs/workflows/CleanAssemble.md
@@ -11,17 +11,17 @@


## Steps
1. **[QC reads](subworkflows/clean_reads)**: the subworkflow steps are listed briefly below; for further information see [clean_reads.nf](subworkflows/local/clean_reads.nf)
1. **[QC reads](/subworkflows/clean_reads)**: the subworkflow steps are listed briefly below; for further information see [clean_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/clean_reads.nf)
- Reads are checked for known sequencing contamination
- Quality metrics are calculated
- Reads are trimmed
- Coverage is estimated
- Read set is subsampled to a set depth (OPTIONAL)
- Read set is classified as either an isolate or a metagenomic sample (based on the presence of multiple taxa)

2. **[Assemble reads](/subworkflows/assemble_reads)**: based on the `params.platform` flag, read sets are routed to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) subworkflow. Although the data are handled differently in each subworkflow, both generate a contigs file and a Bandage image, with an option of initial polishing via Racon. See the [assemble_reads.nf](subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.
2. **[Assemble reads](/subworkflows/assemble_reads)**: based on the `params.platform` flag, read sets are routed to either the assemble_reads (short reads) or hybrid_assembly (short and/or long reads) subworkflow. Although the data are handled differently in each subworkflow, both generate a contigs file and a Bandage image, with an option of initial polishing via Racon. See the [assemble_reads.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/assemble_reads.nf) and [hybrid_assembly.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/hybrid_assembly.nf) subworkflow pages for more details.

3. **[Polish assemblies](/subworkflows/polish_assemblies)** (OPTIONAL): polishing of contigs can be added via [polish_assemblies.nf](subworkflows/local/polish_assemblies.nf). To change the default workflow, see the 'optional flags' settings page.
3. **[Polish assemblies](/subworkflows/polish_assemblies)** (OPTIONAL): polishing of contigs can be added via [polish_assemblies.nf](https://github.com/phac-nml/mikrokondo/blob/main/subworkflows/local/polish_assemblies.nf). To change the default workflow, see the 'optional flags' settings page.
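The coverage and subsampling steps in QC reads come down to simple ratios. The Python sketch below uses a plain base-count estimate (total sequenced bases divided by expected genome size); this is a stand-in for illustration, since the pipeline itself estimates coverage from k-mers with `kat_hist`, and the target depth it subsamples to is not stated here. Both helper names are hypothetical.

```python
def estimate_coverage(total_bases, genome_size):
    """Naive depth estimate: sequenced bases divided by expected genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def subsample_fraction(current_depth, target_depth):
    """Fraction of reads to keep to reach target_depth (1.0 if already at or below it)."""
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)
```

For example, 500 Mbp of reads against a 5 Mbp genome gives `estimate_coverage(500_000_000, 5_000_000) == 100.0`, and subsampling to 50x keeps `subsample_fraction(100.0, 50) == 0.5` of the reads; a fraction like this is the kind of value `seqtk sample` accepts.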

## Input
- Next generation sequencing reads:
10 changes: 5 additions & 5 deletions docs/workflows/PostAssembly.md
@@ -13,11 +13,11 @@ This workflow is triggered in two ways: 1. when assemblies are used for initial
- `subtype_genome.nf`

## Steps
1. **Determine type**: using the `metagenomic_samples` flag, this workflow directs assemblies down one of two paths:
a. Isolate: proceeds to step 2.
b. Metagenomic: runs the following two modules before proceeding to step 2.
i. [kraken2.nf](modules/local/kraken.nf) runs kraken 2 on contigs
ii. [bin_kraken2.nf](modules/local/bin_kraken2.nf) bins contigs into their respective genus-level taxa
1. **Determine type**: using the `metagenomic_samples` flag, this workflow directs assemblies down one of two paths:
    1. Isolate: proceeds to step 2.
    2. Metagenomic: runs the following two modules before proceeding to step 2.
        1. [kraken.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/kraken.nf) runs kraken2 on contigs
        2. [bin_kraken2.nf](https://github.com/phac-nml/mikrokondo/blob/main/modules/local/bin_kraken2.nf) bins contigs into their respective genus-level taxa
2. **[QC assemblies](/subworkflows/qc_assembly)** (OPTIONAL) runs QUAST and assigns quality metrics to the generated assemblies
3. **[Determine species](/subworkflows/determine_species)** (OPTIONAL) runs a classifier tool (default: [Mash](https://github.com/marbl/Mash)) to determine the species of each sample or bin
4. **[Subtype genome](/subworkflows/subtype_genome)** (OPTIONAL) species-specific subtyping tools are launched based on the generated Mash screen report.
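The branching in these steps can be summarized as a small Python sketch that returns the ordered steps for one assembly. Here `metagenomic` stands in for the `metagenomic_samples` decision, while the `qc`, `speciation` and `subtyping` toggles are hypothetical names for the optional steps, not actual pipeline parameters. The sketch does not model the dependency of subtyping on the Mash screen report.

```python
def post_assembly_plan(metagenomic, qc=True, speciation=True, subtyping=True):
    """Return the ordered PostAssembly steps for one assembly (illustrative only)."""
    steps = []
    if metagenomic:
        # Metagenomic assemblies are classified and binned before step 2.
        steps += ["kraken2 on contigs", "bin contigs by genus"]
    if qc:
        steps.append("QC assemblies (QUAST)")
    if speciation:
        steps.append("determine species (default: Mash)")
    if subtyping:
        steps.append("subtype genome from Mash screen report")
    return steps
```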
66 changes: 66 additions & 0 deletions utils/mikrokondo_mermaid.js
@@ -0,0 +1,66 @@
flowchart LR
CI(Check input):::lightGreen --> R((Reads));
CI --> A((Assembly));
R --> CA(CleanAssemble):::pink;
CA --> QC(QC reads / clean reads):::lightGreen;
QC --> QC1("Remove contaminants<br>(Minimap2)"):::orange;
QC1 --> QC2(Fastp):::orange;
QC2 --> QC3(parse Fastp<br>trimming):::orange;
QC3 --> QC4("estimate coverage<br>(kat_hist)"):::orange;
QC4 --> QC5("Subsample<br>(seqTK_sample)"):::orange;
QC5 --> QC6("Contamination check<br>(Mash screen)"):::orange;
QC6 --> QC7("Separate reads<br>(parse Mash)"):::orange;
QC --> SR((short read));
SR --> I((Isolate));
SR --> M((Metagenomic));
I --> AR(Assemble reads):::lightGreen;
AR --> AR1(Bandage image):::orange;
AR1 --> AR2((Illumina));
AR2 --> AR22(Spades assemble):::orange;
AR1 --> AR3((Pacbio/nanopore));
AR3 --> AR32(Flye assemble):::orange;
AR22 --> HAD3;
AR32 --> HAD3;
M -->|"set assembly flag<br>to `metagenomic`"| AR;
QC --> H((Long read<br>or hybrid));
H --> HA(Hybrid assembly):::lightGreen;
HA --> HAD((Default));
HAD --> HAD1(Flye assemble):::orange;
HAD1 --> HAD2(Bandage image):::orange;
HAD2 --> HAD3("Create contig index<br>(Minimap2 index)"):::orange;
HAD3 --> HAD4("Generate SAM<br>(Minimap2 map)"):::orange;
HAD4 --> HAD5(Racon polish):::orange;
HAD5 --> HAD6(Pilon iterate):::orange;
HA --> HAU((Unicycler));
HAU --> HAU1(Unicycler assemble):::orange;
HAU1 --> HAU2(Bandage image):::orange;
QC7 --> PA(Polish assemblies):::lightGreen;
HAD6 --> PA;
HAU2 --> PA;
PA --> PAI((Illumina));
PAI --> PAI1(Pilon iterate):::orange;
PA --> PAN((Pacbio/nanopore));
PAN --> PAN1(Medaka polish):::orange;
PAI1 --> PASS(Post assembly):::pink;
PAN1 --> PASS;
A --> PASS;
PASS --> I1((Isolate));
PASS --> M1((Metagenomic));
I1 --> QCA(QC assemblies):::lightGreen;


subgraph legend [Legend]
direction LR;
wk(Workflow):::pink --> sw(Subworkflow):::lightGreen;
sw --> m(Module):::orange;
d((Decision));
end






classDef lightGreen fill:#0ABC9B,stroke:#0ABC9B,stroke-width:2px,rx:10px,ry:10px;
classDef pink fill:#F681CB,stroke:#F681CB,stroke-width:2px,rx:10px,ry:10px;
classDef orange fill:#F2B581,stroke:#F2B581,stroke-width:2px,rx:10px,ry:10px;
