Merge pull request #66 from phac-nml/directories/restructuring
Directories/restructuring
mattheww95 authored May 6, 2024
2 parents cfab17f + 947f21b commit 81e7ccc
Showing 38 changed files with 821 additions and 451 deletions.
36 changes: 25 additions & 11 deletions CHANGELOG.md
@@ -3,6 +3,31 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## `Unreleased`

### `Added`

- Updated documentation for params.

- Fixed param typos in schema, config and docs.

- Added parameter to skip length filtering of sequences

- Added locidex for allele calling

- Updated directory output structure and names

- Added tests for Kraken2 contig binning

### `Fixed`

- If you select to filter contigs by length, those contigs will now be used for subsequent analysis. This resolves issue [#55](https://github.com/phac-nml/mikrokondo/issues/55)

### `Dependencies`

### `Deprecated`


## v0.1.2 - [2024-05-02]

### Added
@@ -50,15 +75,4 @@ Initial release of phac-nml/mikrokondo. Mikrokondo currently supports: read trim

- Added integration testing using [nf-test](https://www.nf-test.com/).

### `Added`

- Updated documentation for params.

- Fixed param typos in schema, config and docs.


### `Fixed`

### `Dependencies`

### `Deprecated`
56 changes: 44 additions & 12 deletions README.md
@@ -8,6 +8,35 @@
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
<!-- [![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/mk-kondo/mikrokondo) -->

- [Introduction](#introduction)
  * [What is mikrokondo?](#what-is-mikrokondo-)
  * [Is mikrokondo right for me?](#is-mikrokondo-right-for-me-)
  * [Citation](#citation)
    + [Contact](#contact)
- [Installing mikrokondo](#installing-mikrokondo)
  * [Step 1: Installing Nextflow](#step-1--installing-nextflow)
  * [Step 2: Choose a Container Engine](#step-2--choose-a-container-engine)
    + [Docker or Singularity?](#docker-or-singularity-)
  * [Step 3: Install dependencies](#step-3--install-dependencies)
    + [Dependencies listed](#dependencies-listed)
  * [Step 4: Further resources to download](#step-4--further-resources-to-download)
    + [Configuration and settings:](#configuration-and-settings-)
- [Getting Started](#getting-started)
  * [Usage](#usage)
    + [Data Input/formats](#data-input-formats)
    + [Output/Results](#output-results)
  * [Run example data](#run-example-data)
  * [Testing](#testing)
    + [Install nf-test](#install-nf-test)
    + [Run tests](#run-tests)
  * [Troubleshooting and FAQs:](#troubleshooting-and-faqs-)
  * [References](#references)
  * [Legal and Compliance Information:](#legal-and-compliance-information-)
  * [Updates and Release Notes:](#updates-and-release-notes-)

<small><i><a href='http://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a></i></small>


# Introduction

## What is mikrokondo?
@@ -127,18 +156,21 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo

### Output/Results

All output files will be written into the `outdir` (specified by the user). More explicit tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure:

- **annotations** - dir containing all annotation tool output.
- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present)
- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
- **SummaryReport** - dir containing collated results files for all tools, including:
- Individual sample flatted json reports
- **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
- **bco.json** - data providence file generated from the nf-prov plug-in
- **manifest.json** - data providence file generated from the nf-prov plug-in
All output files will be written into the `outdir` (specified by the user). More explicit tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure (as a rule, the deeper a directory sits in the structure, the later in the workflow its tool was run):

- **Assembly** - contains all output files generated as a result of read assembly and tools using assembled contigs as input
  - **Annotation** - contains output files generated from tools applying annotation and/or gene characterization from assembled contigs
  - **Assembling** - contains output files generated as a part of the assembly process in nested order
  - **FinalAssembly** - this directory will always contain the final output contig files from the last step in the assembly process (will take into account any skip flags in the process)
  - **PostProcessing** - contains output files from intermediary tools that run after assembly but before annotation takes place in the workflow
  - **Quality** - contains all output files generated as a result of quality tools after assembly
  - **Subtyping** - contains all output files from workflow subtyping tools, based off assembled contigs
- **FinalReports** - contains assorted reports including aggregated and flat reports
- **pipeline_info** - includes tool versions and other pipeline specific information
- **Reads** - contains all output files generated as a result of read processing and tools using reads as input
  - **FinalReads** - this directory will contain the final output read files from the last step in read processing (taking into account any skip flags used in the run)
  - **Processing** - contains output files from tools run to process reads in nested order
  - **Quality** - contains all output files generated from read quality tools
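As a rough illustration of the new layout, the sketch below checks a finished run for the directories named in the list above. The nesting (`Annotation`, `Quality` and the others under `Assembly`; `FinalReads`, `Processing` and `Quality` under `Reads`) is an assumption inferred from the order of the list and from the paths in `conf/irida_next.config`. `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, not part of mikrokondo.

```python
from pathlib import Path

# Assumed layout of `outdir`; the nesting is inferred from the README list,
# not taken from the pipeline source.
EXPECTED_DIRS = (
    "Assembly/Annotation",
    "Assembly/Assembling",
    "Assembly/FinalAssembly",
    "Assembly/PostProcessing",
    "Assembly/Quality",
    "Assembly/Subtyping",
    "FinalReports",
    "pipeline_info",
    "Reads/FinalReads",
    "Reads/Processing",
    "Reads/Quality",
)


def missing_dirs(outdir):
    """Return the expected directories that do not exist under outdir."""
    root = Path(outdir)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dirs("results")` on a run's output directory returns an empty list when every directory is present. A non-empty result is not necessarily an error: the README notes that skip flags change which steps run, so some directories may presumably be absent.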

## Run example data

3 changes: 2 additions & 1 deletion bin/kraken2_bin.py
@@ -13,6 +13,7 @@
from collections import defaultdict
import os
import sys
import re


kraken2_classifiers = frozenset(["U", "R", "D", "K", "P", "C", "O", "F", "G", "S"])
@@ -355,7 +356,7 @@ def write_fastas(self, sequences):
"""
for k, v in sequences.items():
with open(
f"{k.strip().replace(' ', '_').replace('(', '_').replace(')', '_').replace('.', '_')}_binned.fasta",
"{}.binned.fasta".format(re.sub(r'[^A-Za-z0-9\-_]', '_', k)),
"w",
encoding="utf8",
) as out_file:
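The hunk above swaps the chained `.replace()` calls for a single `re.sub` over an allow-list of characters. A minimal sketch of the new naming rule, pulled out into a standalone function for illustration (`binned_fasta_name` is a hypothetical name, and the taxon strings are made up):

```python
import re


def binned_fasta_name(taxon):
    """Build the output file name the way the new line in write_fastas does.

    Every character outside A-Z, a-z, 0-9, '-' and '_' becomes '_',
    then the '.binned.fasta' suffix is appended.
    """
    return "{}.binned.fasta".format(re.sub(r"[^A-Za-z0-9\-_]", "_", taxon))


if __name__ == "__main__":
    print(binned_fasta_name("Escherichia coli (strain K-12)"))
    # prints "Escherichia_coli__strain_K-12_.binned.fasta"
```

Two behavioural differences from the old expression are worth noting. The old code only replaced spaces, parentheses and periods, so a character such as `/` in a taxon name would have passed through into the path; the allow-list catches it. The new line also drops the `.strip()` call, so leading or trailing whitespace now turns into underscores instead of being removed, and the suffix changes from `_binned.fasta` to `.binned.fasta`.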
38 changes: 19 additions & 19 deletions conf/irida_next.config
@@ -13,26 +13,26 @@ iridanext {
files {
idkey = "sample"
global = [
"**/SummaryReport/final_report.json",
"**/SummaryReport/final_report.tsv"
"**/FinalReports/Aggregated/Json/final_report.json",
"**/FinalReports/Aggregated/Tables/final_report.tsv"
]
samples = [
"**/assembly/length_filtered_contigs/*_filtered.fasta.gz",
"**/assembly/quality/quast/*/*.pdf",
"**/assembly/7GeneMLST/*.json",
"**/assembly/taxon_determination/mash/*.taxa.screen",
"**/subtyping/ectyper/*/output.tsv",
"**/subtyping/sistr/*.json",
"**/subtyping/lissero/*.tsv",
"**/annotations/abricate/*.txt",
"**/annotations/mobrecon/*/mobtyper_results.txt",
"**/annotations/bakta/*.gbff",
"**/annotations/bakta/*.txt",
"**/StarAMR/*/summary.tsv",
"**/StarAMR/*/detailed_summary.tsv",
"**/StarAMR/*/results.xlsx",
"**/locidex/Report/*.profile.mlst.json.gz",
"**/SummaryReport/*_flat_sample.json.gz",
"**/Assembly/FinalAssembly/*/*.filtered.assembly.fasta.gz",
"**/Assembly/Quality/QUAST/*/*.pdf",
"**/Assembly/Subtyping/SevenGeneMLST/*7.mlst.subtyping.json",
"**/Assembly/Speciation/MashScreen/*.taxa.screen",
"**/Assembly/Subtyping/ECTyper/*/*output*.tsv",
"**/Assembly/Subtyping/SISTR/*.json",
"**/Assembly/Subtyping/Lissero/*.tsv",
"**/Assembly/Subtyping/Locidex/Report/*.json.gz",
"**/Assembly/Annotation/Abricate/*abricate.annotation.txt",
"**/Assembly/Annotation/Mobsuite/Recon/*/*mobtyper_results*.txt",
"**/Assembly/Annotation/Bakta/*.gbff",
"**/Assembly/Annotation/Bakta/*.txt",
"**/Assembly/Annotation/StarAMR/*/*summary*.tsv",
"**/Assembly/Annotation/StarAMR/*/*detailed_summary*.tsv",
"**/Assembly/Annotation/StarAMR/*/*results*.xlsx",
"**/FinalReports/FlattenedReports/*.flat_sample.json.gz"
]
}
metadata {
@@ -97,7 +97,7 @@ iridanext {
"FastP.command"
]
json {
path = "**/SummaryReport/final_report_flattened.json"
path = "**/FinalReports/Sample/Json/final_report_flattened.json"
}
}
}
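To see why the glob patterns had to change along with the directory names, the sketch below matches the old and new `global` patterns against a directory tree. It uses Python's `pathlib` globbing as a stand-in: the pattern matching actually performed by the `iridanext` configuration is not shown in this diff and may treat `**` differently. `matched` and the example run directory are hypothetical.

```python
import tempfile
from pathlib import Path

# Patterns copied from the `global` list in conf/irida_next.config.
NEW_GLOBAL = [
    "**/FinalReports/Aggregated/Json/final_report.json",
    "**/FinalReports/Aggregated/Tables/final_report.tsv",
]
OLD_GLOBAL = [
    "**/SummaryReport/final_report.json",
    "**/SummaryReport/final_report.tsv",
]


def matched(root, patterns):
    """Return sorted POSIX-style paths, relative to root, matching any pattern."""
    root = Path(root)
    hits = set()
    for pattern in patterns:
        hits.update(p.relative_to(root).as_posix() for p in root.glob(pattern))
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake a run that wrote its reports using the new layout.
        for rel in (
            "run1/FinalReports/Aggregated/Json/final_report.json",
            "run1/FinalReports/Aggregated/Tables/final_report.tsv",
        ):
            target = Path(tmp) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        print("new patterns:", matched(tmp, NEW_GLOBAL))
        print("old patterns:", matched(tmp, OLD_GLOBAL))
```

Under these assumptions the new patterns find both report files and the old `SummaryReport` patterns find nothing, which is the failure this part of the commit avoids.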