From 5e0bbb59f80caaddc7c5778a8248e66dc8219324 Mon Sep 17 00:00:00 2001
From: mbdabrowska1
Date: Mon, 11 Mar 2024 13:17:03 +0000
Subject: [PATCH] Format README

---
 README.md | 77 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/README.md b/README.md
index 283ac65..a27862b 100644
--- a/README.md
+++ b/README.md
@@ -16,44 +16,45 @@ #
 ![NanopathPipeline](docs/images/16S_Pipeline.png#gh-light-mode-only)
 ![NanopathPipeline](docs/images/16S_Pipeline_darkmode.png#gh-dark-mode-only)
 
-1. Initialize the data:
-   If a fastq directory is provided:
-     Concatenate fastq files using CAT_FASTQS.
-
-2. Validate input:
-   Use the INPUT_CHECK subworkflow to read samplesheet, validate, and stage input files.
-   Branch reads based on their status (discontinued or samples).
-
-3. Perform Quality Control:
-   Run ([`FASTP`](https://github.com/OpenGene/fastp)) for quality control, filtering, and preprocessing.
-   Filter out samples with no reads left after FASTP.
-   Run ([`FASTQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) on the processed reads.
-
-4. Classfy and Cluster:
-   If specified, remove unclassified reads using ([`KRAKEN2`](https://github.com/DerrickWood/kraken2)).
-   Subset reads based on specified parameters (default 100k reads to keep memory requirements reasonable).
-   Perform k-mer frequency analysis with KMER_FREQS.
-   Perform read clustering with READ_CLUSTERING using ([`HDBSCAN`](https://github.com/scikit-learn-contrib/hdbscan)) and ([`UMAP`](https://umap-learn.readthedocs.io/en/latest/)).
-
-5. Split Clusters and Correct Errors:
-   Split clusters.
-   Perform error correction using ([`CANU`](https://github.com/marbl/canu)).
-
-6. Select and Polish Draft:
-   Select draft reads using ([`FASTANI`](https://github.com/ParBLiSS/FastANI)).
-   Polish drafts using ([`RACON`](https://github.com/isovic/racon)).
-   Generate final consensus using ([`MEDAKA`](https://github.com/nanoporetech/medaka)).
-
-7. Classify Taxonomically:
-   Based on chosen tool, classify consensus sequences with ([`BLAST`](https://www.ncbi.nlm.nih.gov/books/NBK279690/)), ([`SEQMATCH`](https://github.com/rdpstaff/SequenceMatch)), ([`KRAKEN`](https://github.com/DerrickWood/kraken2)) or all of them.
-   Join classification results using JOIN_RESULTS.
-
-8. Estimate Abundace:
-   Estimate abundance per sample per detected species.
-
-9. Generate Reports:
-   If report generation is chosen:
-     Generate HTML reports.
+1. **Initialize the data:**
+   - If a fastq directory is provided:
+     - Concatenate fastq files using CAT_FASTQS.
+
+2. **Validate input:**
+   - Use the INPUT_CHECK subworkflow to read samplesheet, validate, and stage input files.
+   - Branch reads based on their status (discontinued or samples).
+
+3. **Perform Quality Control:**
+   - Run ([`FASTP`](https://github.com/OpenGene/fastp)) for quality control, filtering, and preprocessing.
+   - Filter out samples with no reads left after FASTP.
+   - Run ([`FASTQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) on the processed reads.
+
+4. **Classfy and Cluster:**
+   - If specified, remove unclassified reads using ([`KRAKEN2`](https://github.com/DerrickWood/kraken2)).
+   - Subset reads based on specified parameters (default 100k reads to keep memory requirements reasonable).
+   - Perform k-mer frequency analysis with KMER_FREQS.
+   - Perform read clustering with READ_CLUSTERING using ([`HDBSCAN`](https://github.com/scikit-learn-contrib/hdbscan)) and ([`UMAP`](https://umap-learn.readthedocs.io/en/latest/)).
+
+5. **Split Clusters and Correct Errors:**
+   - Split clusters.
+   - Perform error correction using ([`CANU`](https://github.com/marbl/canu)).
+
+6. **Select and Polish Draft:**
+   - Select draft reads using ([`FASTANI`](https://github.com/ParBLiSS/FastANI)).
+   - Polish drafts using ([`RACON`](https://github.com/isovic/racon)).
+   - Generate final consensus using ([`MEDAKA`](https://github.com/nanoporetech/medaka)).
+
+7. **Classify Taxonomically:**
+   - Based on chosen tool, classify consensus sequences with ([`BLAST`](https://www.ncbi.nlm.nih.gov/books/NBK279690/)), ([`SEQMATCH`](https://github.com/rdpstaff/SequenceMatch)), ([`KRAKEN`](https://github.com/DerrickWood/kraken2)) or all of them.
+   - Join classification results using JOIN_RESULTS.
+
+8. **Estimate Abundace:**
+   - Estimate abundance per sample per detected species.
+
+9. **Generate Reports:**
+   - If report generation is chosen:
+     - Generate HTML reports.
+
 ## Usage
 
 > **Note**