Commit

Merge branch 'main' into fc-dorado-workflow-standalone-dev
fraser-combe authored Jan 13, 2025
2 parents 0b4bcd3 + dd7d021 commit 03c3b91
Showing 23 changed files with 412 additions and 167 deletions.
6 changes: 6 additions & 0 deletions docs/assets/files/TheiaCoV_Illumina_PE_qc_check_template.tsv
@@ -0,0 +1,6 @@
taxon num_reads_raw1 num_reads_raw2 num_reads_clean1 num_reads_clean2 kraken_human kraken_human_dehosted meanbaseq_trim assembly_mean_coverage number_N number_Degenerate assembly_length_unambiguous_min assembly_length_unambiguous_max percent_reference_coverage vadr_num_alerts
sars-cov-2 100000 100000 100000 100000 20 20 30 100 5000 1 25000 30000 83 0
HIV 100000 100000 100000 100000 20 20 30 100
WNV 100000 100000 100000 100000 20 20 30 100
MPXV 100000 100000 100000 100000 20 20 30 100
flu 100000 100000 100000 100000 20 20 30 100
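A minimal sketch of how a template like this could be read, assuming the file is tab-separated (as the `.tsv` extension suggests) and that a blank cell means no threshold is set for that metric. The function name `load_thresholds` is hypothetical and is not part of the workflow; the sample rows are abridged from the template above.

```python
import csv
import io

# Two rows abridged from the template; columns are assumed to be tab-separated.
TEMPLATE = (
    "taxon\tnum_reads_raw1\tmeanbaseq_trim\tassembly_mean_coverage\tnumber_N\n"
    "sars-cov-2\t100000\t30\t100\t5000\n"
    "HIV\t100000\t30\t100\t\n"
)


def load_thresholds(text):
    """Map each taxon to its non-empty thresholds, parsed as floats.

    Hypothetical helper: blank or missing cells are skipped, on the
    assumption that they mean "no threshold for this metric".
    """
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        taxon = row.pop("taxon")
        table[taxon] = {
            name: float(value)
            for name, value in row.items()
            if value and value.strip()
        }
    return table


thresholds = load_thresholds(TEMPLATE)
```

Under these assumptions, the `HIV` entry carries no `number_N` threshold, while `sars-cov-2` carries all four.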
2 changes: 2 additions & 0 deletions docs/stylesheets/extra.css
@@ -173,6 +173,7 @@
table {
overflow-y: scroll;
max-height: 500px;
max-width: 100vw;
display: block;
}
th {
@@ -183,6 +184,7 @@ th {
}
td {
word-break: break-all;
overflow-wrap: anywhere;
}
/* Base styles for the search box */
div.searchable-table input.table-search-input {
139 changes: 78 additions & 61 deletions docs/workflows/genomic_characterization/theiacov.md

Large diffs are not rendered by default.

96 changes: 66 additions & 30 deletions docs/workflows/genomic_characterization/theiaeuk.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Mycotics](../../workflows_overview/workflows_kingdom.md/#mycotics) | PHB v2.3.0 | Yes | Sample-level |
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Mycotics](../../workflows_overview/workflows_kingdom.md/#mycotics) | PHB vX.X.X | Yes | Sample-level |

## TheiaEuk Workflows

@@ -407,7 +407,7 @@ All input reads are processed through "core tasks" in the TheiaEuk workflows. Th
| Software Documentation | https://busco.ezlab.org/ |
| Original publication | [BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs](https://academic.oup.com/bioinformatics/article/31/19/3210/211866) |

??? task "`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)"
??? task "`qc_check`: Check QC Metrics Against User-Defined Thresholds (optional)"

The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed.
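The pass/alert logic described above can be sketched as follows. This is an illustration only, not the task's actual implementation: the function name, the `("min" | "max", limit)` threshold format, the alert message wording, and the choice to return `QC_NA` when a metric is missing are all assumptions made for the example.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical threshold spec: metric name -> ("min" | "max", limit).
Threshold = Tuple[str, float]


def qc_check(
    metrics: Mapping[str, Optional[float]],
    thresholds: Mapping[str, Threshold],
) -> str:
    """Return QC_PASS, QC_NA, or QC_ALERT followed by the failing metrics."""
    if not thresholds:
        # Assumed: with no thresholds available, the check cannot proceed.
        return "QC_NA"
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # Assumed: a missing metric also means the check cannot proceed.
            return "QC_NA"
        if kind == "min" and value < limit:
            failures.append(f"{name} ({value}) below minimum ({limit})")
        elif kind == "max" and value > limit:
            failures.append(f"{name} ({value}) above maximum ({limit})")
    if failures:
        return "QC_ALERT: " + "; ".join(failures)
    return "QC_PASS"


# Example thresholds and metrics (made-up values for illustration).
limits = {"n50_value": ("min", 10000), "number_contigs": ("max", 500)}
passing = qc_check({"n50_value": 50000, "number_contigs": 120}, limits)
failing = qc_check({"n50_value": 5000, "number_contigs": 120}, limits)
```

Collecting every failure before returning, rather than stopping at the first, keeps the alert string informative when several metrics fall outside their limits.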

@@ -598,64 +598,100 @@ All input reads are processed through "core tasks" in the TheiaEuk workflows. Th

| **Variable** | **Type** | **Description** |
|---|---|---|
| assembly_fasta | File | _De novo_ genome assembly in FASTA format |
| assembly_length | Int | Length of assembly (total number of nucleotides) as determined by QUAST |
| bbduk_docker| String | BBDuk docker image used |
| busco_database | String | BUSCO database used |
| busco_docker | String | BUSCO docker image used |
| busco_report | File | A plain text summary of the results in BUSCO notation |
| busco_results | String | BUSCO results (see above for explanation of BUSCO notation) |
| busco_version | String | BUSCO software version used |
| cg_pipeline_docker | String | Docker file used for running CG-Pipeline on cleaned reads |
| cg_pipeline_report | File | TSV file of read metrics from raw reads, including average read length, number of reads, and estimated genome coverage |
| est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
| est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
| cladetyper_annotated_reference | String | The annotated reference file for the identified clade, "None" if no clade was identified |
| cladetyper_clade | String | The clade assigned to the input assembly |
| cladetyper_docker_image | String | The Docker container used for the task |
| cladetyper_gambit_version | String | The version of GAMBIT used for the analysis |
| combined_mean_q_clean | Float | Mean quality score for the combined clean reads |
| combined_mean_q_raw | Float | Mean quality score for the combined raw reads |
| combined_mean_readlength_clean | Float | Mean read length for the combined clean reads |
| combined_mean_readlength_raw | Float | Mean read length for the combined raw reads |
| contigs_fastg | File | Assembly graph if megahit used for genome assembly |
| contigs_gfa | File | Assembly graph if spades used for genome assembly |
| contigs_lastgraph | File | Assembly graph if velvet used for genome assembly |
| est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
| est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
| fastp_html_report | File | The HTML report made with fastp |
| fastp_version | String | Version of fastp software used |
| fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length |
| fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length |
| fastq_scan_num_reads_clean_pairs | String | Number of read pairs after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_clean1 | Int | Number of forward reads after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_clean2 | Int | Number of reverse reads after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_raw_pairs | String | Number of input read pairs calculated by fastq_scan |
| fastq_scan_num_reads_raw1 | Int | Number of input forward reads calculated by fastq_scan |
| fastq_scan_num_reads_raw2 | Int | Number of input reverse reads calculated by fastq_scan |
| fastq_scan_num_reads_raw_pairs | String | Number of input read pairs calculated by fastq_scan |
| fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length |
| fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length |
| r1_mean_q_clean | Float | Mean quality score of clean forward reads |
| r1_mean_q_raw | Float | Mean quality score of raw forward reads |
| r2_mean_q_clean | Float | Mean quality score of clean reverse reads |
| r2_mean_q_raw | Float | Mean quality score of raw reverse reads |
| fastq_scan_version | String | Version of fastq-scan software used |
| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser |
| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
| fastqc_docker | String | Docker container used with fastqc |
| fastqc_num_reads_clean1 | Int | Number of forward reads after cleaning by fastqc |
| fastqc_num_reads_clean2 | Int | Number of reverse reads after cleaning by fastqc |
| fastqc_num_reads_clean_pairs | String | Number of read pairs after cleaning by fastqc |
| fastqc_num_reads_raw1 | Int | Number of input forward reads by fastqc |
| fastqc_num_reads_raw2 | Int | Number of input reverse reads by fastqc |
| fastqc_num_reads_raw_pairs | String | Number of input read pairs by fastqc |
| fastqc_raw1_html | File | Graphical visualization of raw forward read quality from fastqc to open in an internet browser |
| fastqc_raw2_html | File | Graphical visualization of raw reverse read quality from fastqc to open in an internet browser |
| fastqc_version | String | Version of fastqc software used |
| gambit_closest_genomes | File | CSV file listing genomes in the GAMBIT database that are most similar to the query assembly |
| gambit_db_version | String | Version of the GAMBIT database used |
| gambit_docker | String | GAMBIT docker file used |
| gambit_predicted_taxon | String | Taxon predicted by GAMBIT |
| gambit_predicted_taxon_rank | String | Taxon rank of GAMBIT taxon prediction |
| gambit_report | File | GAMBIT report in a machine-readable format |
| gambit_version | String | Version of GAMBIT software used |
| assembly_length | Int | Length of assembly (total contig length) as determined by QUAST |
| n50_value | Int | N50 of assembly calculated by QUAST |
| number_contigs | Int | Total number of contigs in assembly |
| qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
| qc_standard | File | The user-provided file that contains the QC thresholds used for the QC check |
| quast_gc_percent | Float | The GC percent of your sample |
| quast_report | File | TSV report from QUAST |
| quast_version | String | Software version of QUAST used |
| r1_mean_q_raw | Float | Mean quality score of raw forward reads |
| r1_mean_readlength_raw | Float | Mean read length of raw forward reads |
| r2_mean_q_raw | Float | Mean quality score of raw reverse reads |
| r2_mean_readlength_clean | Float | Mean read length of clean reverse reads |
| rasusa_version | String | Version of rasusa used |
| read1_subsampled | File | Subsampled read1 file |
| read2_subsampled | File | Subsampled read2 file |
| bbduk_docker | String | BBDuk docker image used |
| fastp_version | String | Version of fastp software used |
| read1_clean | File | Clean forward reads file |
| read1_subsampled | File | Subsampled read1 file |
| read2_clean | File | Clean reverse reads file |
| num_reads_clean_pairs | String | Number of read pairs after cleaning |
| num_reads_clean1 | Int | Number of forward reads after cleaning |
| num_reads_clean2 | Int | Number of reverse reads after cleaning |
| num_reads_raw_pairs | String | Number of input read pairs |
| num_reads_raw1 | Int | Number of input forward reads |
| num_reads_raw2 | Int | Number of input reverse reads |
| trimmomatic_version | String | Version of trimmomatic used |
| clean_read_screen | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason for failure |
| raw_read_screen | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason for failure |
| assembly_fasta | File | <https://github.com/tseemann/shovill#contigsfa> |
| contigs_fastg | File | Assembly graph if megahit used for genome assembly |
| contigs_gfa | File | Assembly graph if spades used for genome assembly |
| contigs_lastgraph | File | Assembly graph if velvet used for genome assembly |
| read2_subsampled | File | Subsampled read2 file |
| read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason for failure |
| read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason for failure |
| seq_platform | String | Sequencing platform input by the user |
| shovill_pe_version | String | Shovill version used |
| theiaeuk_snippy_variants_bam | File | BAM file produced by the snippy module |
| theiaeuk_illumina_pe_analysis_date | String | Date of TheiaEuk PE workflow execution |
| theiaeuk_illumina_pe_version | String | TheiaEuk PE workflow version used |
| theiaeuk_snippy_variants_bai | String | BAI file produced by the snippy module |
| theiaeuk_snippy_variants_bam | String | BAM file produced by the snippy module |
| theiaeuk_snippy_variants_coverage_tsv | String | TSV file containing coverage information for each base in the reference genome |
| theiaeuk_snippy_variants_gene_query_results | File | File containing all lines from variants file matching gene query terms |
| theiaeuk_snippy_variants_hits | String | String of all variant file entries matching gene query term |
| theiaeuk_snippy_variants_num_reads_aligned | String | Number of reads aligned by snippy |
| theiaeuk_snippy_variants_num_variants | Int | Number of variants detected by snippy |
| theiaeuk_snippy_variants_outdir_tarball | File | Tar compressed file containing full snippy output directory |
| theiaeuk_snippy_variants_percent_ref_coverage | String | Percent of reference genome covered by snippy |
| theiaeuk_snippy_variants_query | String | The gene query term(s) used to search variant |
| theiaeuk_snippy_variants_query_check | String | Whether the gene query terms were present in the annotated reference genome file |
| theiaeuk_snippy_variants_reference_genome | File | The reference genome used in the alignment and variant calling |
| theiaeuk_snippy_variants_results | File | The variants file produced by snippy |
| theiaeuk_snippy_variants_summary | File | A file summarizing the variants detected by snippy |
| theiaeuk_snippy_variants_version | String | The version of the snippy_variants module being used |
| seq_platform | String | Sequencing platform input by the user |
| theiaeuk_illumina_pe_analysis_date | String | Date of TheiaProk workflow execution |
| theiaeuk_illumina_pe_version | String | TheiaProk workflow version used |
| trimmomatic_docker | String | Docker image used for trimmomatic |
| trimmomatic_version | String | Version of trimmomatic used |

</div>