add qc check documentation to theiacov (#701)
sage-wright authored Dec 31, 2024
1 parent ffbbe81 commit 36c2748
Showing 4 changed files with 40 additions and 10 deletions.
6 changes: 6 additions & 0 deletions docs/assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv
@@ -0,0 +1,6 @@
+taxon num_reads_raw1 num_reads_raw2 num_reads_clean1 num_reads_clean2 kraken_human kraken_human_dehosted meanbaseq_trim assembly_mean_coverage number_N number_Degenerate assembly_length_unambiguous_min assembly_length_unambiguous_max percent_reference_coverage vadr_num_alerts
+sars-cov-2 100000 100000 100000 100000 20 20 30 100 5000 1 25000 30000 83 0
+HIV 100000 100000 100000 100000 20 20 30 100
+WNV 100000 100000 100000 100000 20 20 30 100
+MPXV 100000 100000 100000 100000 20 20 30 100
+flu 100000 100000 100000 100000 20 20 30 100
40 changes: 32 additions & 8 deletions docs/workflows/genomic_characterization/theiacov.md
@@ -55,15 +55,15 @@ Additionally, the **TheiaCoV_FASTA_Batch** workflow is available to process seve

### Supported Organisms

-These workflows currently support the following organisms:
+These workflows currently support the following organisms. The first option in the list (bolded) is what our workflows use as the _standardized_ organism name:

-- **SARS-CoV-2** (`"sars-cov-2"`, `"SARS-CoV-2"`) - ==_default organism input_==
-- **Monkeypox virus** (`"MPXV"`, `"mpox"`, `"monkeypox"`, `"Monkeypox virus"`, `"Mpox"`)
-- **Human Immunodeficiency Virus** (`"HIV"`)
-- **West Nile Virus** (`"WNV"`, `"wnv"`, `"West Nile virus"`)
-- **Influenza** (`"flu"`, `"influenza"`, `"Flu"`, `"Influenza"`)
-- **RSV-A** (`"rsv_a"`, `"rsv-a"`, `"RSV-A"`, `"RSV_A"`)
-- **RSV-B** (`"rsv_b"`, `"rsv-b"`, `"RSV-B"`, `"RSV_B"`)
+- **SARS-CoV-2** (**`"sars-cov-2"`**, `"SARS-CoV-2"`) - ==_default organism input_==
+- **Monkeypox virus** (**`"MPXV"`**, `"mpox"`, `"monkeypox"`, `"Monkeypox virus"`, `"Mpox"`)
+- **Human Immunodeficiency Virus** (**`"HIV"`**)
+- **West Nile Virus** (**`"WNV"`**, `"wnv"`, `"West Nile virus"`)
+- **Influenza** (**`"flu"`**, `"influenza"`, `"Flu"`, `"Influenza"`)
+- **RSV-A** (**`"rsv_a"`**, `"rsv-a"`, `"RSV-A"`, `"RSV_A"`)
+- **RSV-B** (**`"rsv_b"`**, `"rsv-b"`, `"RSV-B"`, `"RSV_B"`)
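
Reviewer note: one way to picture how the accepted spellings collapse to a single standardized name is a plain lookup table. The sketch below is illustrative only. `ORGANISM_ALIASES` and `standardize_organism` are hypothetical names invented for this note, and the commit does not show how the workflows themselves perform this normalization.

```python
# Hypothetical lookup built from the organism list in this hunk.
# Keys are the standardized (bolded) names; values are every accepted spelling.
ORGANISM_ALIASES = {
    "sars-cov-2": ["sars-cov-2", "SARS-CoV-2"],
    "MPXV": ["MPXV", "mpox", "monkeypox", "Monkeypox virus", "Mpox"],
    "HIV": ["HIV"],
    "WNV": ["WNV", "wnv", "West Nile virus"],
    "flu": ["flu", "influenza", "Flu", "Influenza"],
    "rsv_a": ["rsv_a", "rsv-a", "RSV-A", "RSV_A"],
    "rsv_b": ["rsv_b", "rsv-b", "RSV-B", "RSV_B"],
}

# Invert the table once so each spelling maps directly to its standardized name.
_ALIAS_TO_STANDARD = {
    alias: standard
    for standard, aliases in ORGANISM_ALIASES.items()
    for alias in aliases
}


def standardize_organism(name: str = "sars-cov-2") -> str:
    """Return the standardized name for an accepted spelling (exact match only)."""
    try:
        return _ALIAS_TO_STANDARD[name]
    except KeyError:
        raise ValueError(f"unsupported organism: {name!r}") from None
```

The standardized value returned here is the string that would need to appear in the `taxon` column of a `qc_check_table`.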

The compatibility of each workflow with each pathogen is shown below:

@@ -800,6 +800,30 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT
| Software Documentation | [NCBI Scrub](<https://github.com/ncbi/sra-human-scrubber/blob/master/README.md>)<br>[Artic pipeline](https://artic.readthedocs.io/en/latest/?badge=latest)<br>[Kraken2](https://github.com/DerrickWood/kraken2/wiki) |
| Original Publication(s) | [STAT: a fast, scalable, MinHash-based *k*-mer tool to assess Sequence Read Archive next-generation sequence submissions](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02490-0)<br>[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) |

+??? task "`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
+
+The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` TSV file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
+
+The `qc_check` task applies quality thresholds according to the specified organism, which should match the _standardized_ `organism` input in the TheiaCoV workflows.
+
+??? toggle "Formatting the _qc_check_table.tsv_"
+
+- The first column of the qc_check_table lists the `organism` that the task will assess and the header of this column must be "**taxon**".
+- Each subsequent column indicates a QC metric and lists a threshold for each organism that will be checked. **The column names must exactly match expected values, so we highly recommend copying and pasting the header from the template file below as a starting point.**
+
+??? toggle "Template _qc_check_table.tsv_ files"
+- TheiaCoV_Illumina_PE: [TheiaCoV_Illumina_PE_qc_check_template.tsv](../../assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv)
+
+!!! warning "Example Purposes Only"
+The QC threshold values shown in the file above are for example purposes only and should not be presumed to be sufficient for every dataset.
+
+!!! techdetails "`qc_check` Technical Details"
+
+| | Links |
+| --- | --- |
+| Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) |
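
Reviewer note: the `QC_PASS` / `QC_ALERT` / `QC_NA` behaviour described in this hunk can be sketched in a few lines of Python. This is not the code in `task_qc_check.wdl`. The function names, the choice of which columns act as upper bounds, the handling of `_min`/`_max` suffixes, the treatment of a missing metric as `QC_NA`, and the exact alert wording are all assumptions made for illustration.

```python
import csv
import io

# Assumption: these columns are upper bounds; every other column is a lower bound.
UPPER_BOUND_COLUMNS = {
    "kraken_human", "kraken_human_dehosted", "number_N",
    "number_Degenerate", "assembly_length_unambiguous_max", "vadr_num_alerts",
}


def load_thresholds(tsv_text, organism):
    """Return {column: threshold} for one taxon row, or None if the row is absent."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if not reader.fieldnames or reader.fieldnames[0] != "taxon":
        raise ValueError('the first column header must be "taxon"')
    for row in reader:
        if row["taxon"] == organism:
            # Short rows leave trailing cells unset; those metrics are not checked.
            return {
                col: float(val)
                for col, val in row.items()
                if col not in (None, "taxon") and val not in (None, "")
            }
    return None


def metric_name(column):
    """Map a threshold column to the metric it constrains (strip _min/_max)."""
    for suffix in ("_min", "_max"):
        if column.endswith(suffix):
            return column[: -len(suffix)]
    return column


def qc_check(metrics, thresholds):
    """Compare measured metrics with thresholds and return a status string."""
    if thresholds is None:
        return "QC_NA"  # no row for this organism, so the check cannot proceed
    failures = []
    for column, limit in thresholds.items():
        name = metric_name(column)
        if name not in metrics:
            return "QC_NA"  # assumption: a missing metric also blocks the check
        value = metrics[name]
        if column in UPPER_BOUND_COLUMNS:
            if value > limit:
                failures.append(f"{name} ({value}) is above {limit}")
        elif value < limit:
            failures.append(f"{name} ({value}) is below {limit}")
    return "QC_PASS" if not failures else "QC_ALERT: " + "; ".join(failures)
```

With a table like the template in this commit, a row that fills only the first few columns (for example `HIV`) would be checked against those columns alone, since empty cells are skipped in this sketch.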

#### Assembly tasks

!!! tip ""
2 changes: 1 addition & 1 deletion docs/workflows/genomic_characterization/theiaeuk.md
@@ -407,7 +407,7 @@ All input reads are processed through "core tasks" in the TheiaEuk workflows. Th
| Software Documentation | https://busco.ezlab.org/ |
| Original publication | [BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs](https://academic.oup.com/bioinformatics/article/31/19/3210/211866) |

-??? task "`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
+??? task "`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)"

The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.

2 changes: 1 addition & 1 deletion docs/workflows/genomic_characterization/theiaprok.md
@@ -1077,7 +1077,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al
| Software Documentation | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ |
| Original Publication(s) | [In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068535/) |

-??? task "**`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)**"
+??? task "**`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)**"

The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
