diff --git a/docs/assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv b/docs/assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv
new file mode 100644
index 000000000..56da7c5e0
--- /dev/null
+++ b/docs/assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv
@@ -0,0 +1,6 @@
+taxon num_reads_raw1 num_reads_raw2 num_reads_clean1 num_reads_clean2 kraken_human kraken_human_dehosted meanbaseq_trim assembly_mean_coverage number_N number_Degenerate assembly_length_unambiguous_min assembly_length_unambiguous_max percent_reference_coverage vadr_num_alerts
+sars-cov-2 100000 100000 100000 100000 20 20 30 100 5000 1 25000 30000 83 0
+HIV 100000 100000 100000 100000 20 20 30 100
+WNV 100000 100000 100000 100000 20 20 30 100
+MPXV 100000 100000 100000 100000 20 20 30 100
+flu 100000 100000 100000 100000 20 20 30 100
diff --git a/docs/workflows/genomic_characterization/theiacov.md b/docs/workflows/genomic_characterization/theiacov.md
index da4a64a0a..04f64931f 100644
--- a/docs/workflows/genomic_characterization/theiacov.md
+++ b/docs/workflows/genomic_characterization/theiacov.md
@@ -55,15 +55,15 @@ Additionally, the **TheiaCoV_FASTA_Batch** workflow is available to process seve
 
 ### Supported Organisms
 
-These workflows currently support the following organisms:
+These workflows currently support the following organisms.
 The first option in the list (bolded) is what our workflows use as the _standardized_ organism name:
-- **SARS-CoV-2** (`"sars-cov-2"`, `"SARS-CoV-2"`) - ==_default organism input_==
-- **Monkeypox virus** (`"MPXV"`, `"mpox"`, `"monkeypox"`, `"Monkeypox virus"`, `"Mpox"`)
-- **Human Immunodeficiency Virus** (`"HIV"`)
-- **West Nile Virus** (`"WNV"`, `"wnv"`, `"West Nile virus"`)
-- **Influenza** (`"flu"`, `"influenza"`, `"Flu"`, `"Influenza"`)
-- **RSV-A** (`"rsv_a"`, `"rsv-a"`, `"RSV-A"`, `"RSV_A"`)
-- **RSV-B** (`"rsv_b"`, `"rsv-b"`, `"RSV-B"`, `"RSV_B"`)
+- **SARS-CoV-2** (**`"sars-cov-2"`**, `"SARS-CoV-2"`) - ==_default organism input_==
+- **Monkeypox virus** (**`"MPXV"`**, `"mpox"`, `"monkeypox"`, `"Monkeypox virus"`, `"Mpox"`)
+- **Human Immunodeficiency Virus** (**`"HIV"`**)
+- **West Nile Virus** (**`"WNV"`**, `"wnv"`, `"West Nile virus"`)
+- **Influenza** (**`"flu"`**, `"influenza"`, `"Flu"`, `"Influenza"`)
+- **RSV-A** (**`"rsv_a"`**, `"rsv-a"`, `"RSV-A"`, `"RSV_A"`)
+- **RSV-B** (**`"rsv_b"`**, `"rsv-b"`, `"RSV-B"`, `"RSV_B"`)
 
 The compatibility of each workflow with each pathogen is shown below:
 
@@ -800,6 +800,30 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT
 | Software Documentation | [NCBI Scrub]()<br>[Artic pipeline](https://artic.readthedocs.io/en/latest/?badge=latest)<br>[Kraken2](https://github.com/DerrickWood/kraken2/wiki) |
 | Original Publication(s) | [STAT: a fast, scalable, MinHash-based *k*-mer tool to assess Sequence Read Archive next-generation sequence submissions](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02490-0)<br>[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) |
 
+??? task "`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
+
+    The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` TSV file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
+
+    The `qc_check` task applies quality thresholds according to the specified organism, which should match the _standardized_ `organism` input in the TheiaCoV workflows.
+
+    ??? toggle "Formatting the _qc_check_table.tsv_"
+
+        - The first column of the qc_check_table lists the `organism` that the task will assess and the header of this column must be "**taxon**".
+        - Each subsequent column indicates a QC metric and lists a threshold for each organism that will be checked. **The column names must exactly match expected values, so we highly recommend copying and pasting the header from the template file below as a starting place.**
+
+    ??? toggle "Template _qc_check_table.tsv_ files"
+
+        - TheiaCoV_Illumina_PE: [TheiaCoV_Illumina_PE_qc_check_template.tsv](../../assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv)
+
+        !!! warning "Example Purposes Only"
+            The QC threshold values shown in the file above are for example purposes only and should not be presumed to be sufficient for every dataset.
+
+    !!! techdetails "`qc_check` Technical Details"
+
+        | | Links |
+        | --- | --- |
+        | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) |
+
 #### Assembly tasks
 
 !!! tip ""
diff --git a/docs/workflows/genomic_characterization/theiaeuk.md b/docs/workflows/genomic_characterization/theiaeuk.md
index bdfeb5d81..270080ac2 100644
--- a/docs/workflows/genomic_characterization/theiaeuk.md
+++ b/docs/workflows/genomic_characterization/theiaeuk.md
@@ -407,7 +407,7 @@ All input reads are processed through "core tasks" in the TheiaEuk workflows. Th
 | Software Documentation | https://busco.ezlab.org/ |
 | Orginal publication | [BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs](https://academic.oup.com/bioinformatics/article/31/19/3210/211866) |
 
-??? task "`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
+??? task "`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
 
     The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
 
diff --git a/docs/workflows/genomic_characterization/theiaprok.md b/docs/workflows/genomic_characterization/theiaprok.md
index 41e11c51b..25e57e4d2 100644
--- a/docs/workflows/genomic_characterization/theiaprok.md
+++ b/docs/workflows/genomic_characterization/theiaprok.md
@@ -1077,7 +1077,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al
 | Software Documentation | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ |
 | Original Publication(s) | [In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068535/) |
 
-??? task "**`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)**"
+??? task "**`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)**"
 
     The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
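The `qc_check` behaviour described in this diff (look up the row whose `taxon` matches the organism, compare each filled-in threshold, then report `QC_PASS`, `QC_NA`, or `QC_ALERT`) can be sketched in Python as follows. This is a hypothetical illustration, not the code in `task_qc_check.wdl`: the function name, the split between upper-bound and lower-bound metrics (guessed from the template column names), and the exact `QC_ALERT` string format are all assumptions.

```python
import csv
import io

# ASSUMPTION: these columns are treated as upper bounds (value <= threshold);
# the split is inferred from the column names, not from task_qc_check.wdl.
MAXIMUM_METRICS = {
    "kraken_human", "kraken_human_dehosted", "number_N", "number_Degenerate",
    "assembly_length_unambiguous_max", "vadr_num_alerts",
}
# Every other metric column is treated as a lower bound (value >= threshold).


def qc_check(qc_table_tsv: str, organism: str, metrics: dict) -> str:
    """Hypothetical sketch of the qc_check comparison logic."""
    rows = csv.DictReader(io.StringIO(qc_table_tsv), delimiter="\t")
    thresholds = next((row for row in rows if row["taxon"] == organism), None)
    if thresholds is None:
        return "QC_NA"  # no thresholds defined for this organism

    failed = []
    for metric, threshold in thresholds.items():
        # Skip the taxon column, overflow fields, and blank cells
        # (e.g. the short HIV/WNV/MPXV/flu rows in the template).
        if metric in (None, "taxon") or threshold in (None, ""):
            continue
        if metric not in metrics:
            return "QC_NA"  # a required metric was not generated
        value, limit = float(metrics[metric]), float(threshold)
        ok = value <= limit if metric in MAXIMUM_METRICS else value >= limit
        if not ok:
            failed.append(metric)

    # ASSUMPTION: the real task's alert string may be formatted differently.
    return "QC_PASS" if not failed else "QC_ALERT: " + ", ".join(failed)


if __name__ == "__main__":
    template = "taxon\tassembly_mean_coverage\tnumber_N\nsars-cov-2\t100\t5000\n"
    print(qc_check(template, "sars-cov-2", {"assembly_mean_coverage": 42, "number_N": 10}))
    # prints "QC_ALERT: assembly_mean_coverage"
```

Blank cells are skipped in this sketch, so an organism such as `HIV`, whose template row stops after `assembly_mean_coverage`, is only checked against the metrics it actually defines.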