21 Apr 15:26

cimendes

0e380f9


This minor release adds a new species-specific genomic characterization module for Vibrio spp.

New task

Vibrio spp.

One new task has been implemented for Vibrio spp. genomic characterization: Vibrio Characterization through SRST2 and a custom database.
Information for this new task is included in the latest TheiaProk documentation.

What's Changed

Incorporate vibrio characterisation with srst2 into TheiaProk workflows by @cimendes in #216
Amendment to Vibrio subworkflow by @emmadoughty in #228
update version task to "PHBG v1.3.0" by @kapsakcj in #229
Updated task description by @emmadoughty in #230

Full Changelog: v1.2.0...v.1.3.0

Contributors

kapsakcj, cimendes, and emmadoughty


28 Mar 14:11

cimendes

v1.2.0

384c1a0

v1.2.0

This minor release implements several enhancements and improvements to the species-specific genomic characterization modules.

New tasks

Staphylococcus aureus

Three new tasks have been implemented for S. aureus genomic characterization: spatyper, staphopia-sccmec, and agrvate.

Neisseria spp.

Neisseria gonorrhoeae: a new task, ngmaster, has been implemented.
Neisseria meningitidis: a new task, meningotype, has been implemented.

Mycobacterium tuberculosis

The tbprofiler task now has an additional set of outputs that can be accessed by setting the tbprofiler_additional_outputs option to true.

What's Changed

mlst: new String output "ts_mlst_allelic_profile" by @kapsakcj in #209
Add neisseria subwf: ngmaster and meningotype by @kapsakcj in #211
New output columns from TBProfiler Task by @cimendes in #217
Adds Staph aureus subwf by @kapsakcj in #213
Update README.md by @kevinlibuit in #220
Add comma by @sage-wright in #222
add missing tbprofiler optional outputs to export_taxon_tables inputs by @cimendes in #224
Update version to v1.2.0 by @sage-wright in #223

Full Changelog: v1.1.1...v1.2.0

Contributors

kapsakcj, cimendes, and 2 other contributors


31 Jan 19:44

sage-wright

v1.1.1

42659de

v1.1.1

This patch release implements several enhancements and improvements to the phylogenetic workflows.

For the kSNP3, Mashtree, and Core_Gene_SNP workflows, several changes have been implemented.

A new task, reorder_matrix, was created that performs the following:

Midpoint-roots the phylogenetic tree to improve its appearance; the final trees from these workflows are now midpoint-rooted.
Orders the SNP matrix, which was previously unordered, to match the order of the terminal ends in the midpoint-rooted phylogenetic tree.
Automatically applies Phandango coloring (:c1) to all column headers in the matrices; these matrix files are .csv files for easy transfer/upload to Phandango.
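The matrix reordering step can be illustrated with a minimal sketch. It assumes the tip order of the midpoint-rooted tree is already available as a list, and the function names and the "name" header cell are hypothetical; this is not the reorder_matrix task's actual code.

```python
import csv
import io


def reorder_matrix(matrix, tip_order):
    """Reorder a square distance matrix (dict of dicts) to follow tip_order."""
    missing = [name for name in tip_order if name not in matrix]
    if missing:
        raise KeyError(f"samples missing from matrix: {missing}")
    return [[matrix[row][col] for col in tip_order] for row in tip_order]


def to_phandango_csv(rows, tip_order):
    """Write the reordered matrix as CSV, tagging each column header with ':c1'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The "name" header for the first column is an assumption of this sketch.
    writer.writerow(["name"] + [f"{name}:c1" for name in tip_order])
    for name, row in zip(tip_order, rows):
        writer.writerow([name] + row)
    return buffer.getvalue()


# Toy SNP matrix and a tip order as it might be read from a rooted tree.
snp_matrix = {
    "A": {"A": 0, "B": 5, "C": 2},
    "B": {"A": 5, "B": 0, "C": 7},
    "C": {"A": 2, "B": 7, "C": 0},
}
tip_order = ["C", "A", "B"]

reordered = reorder_matrix(snp_matrix, tip_order)
csv_text = to_phandango_csv(reordered, tip_order)
print(csv_text)
```

Rows and columns are reordered together so the matrix stays symmetric and lines up with the tree when both are loaded side by side.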

A new task, summarize_data, was created that performs the following:

Digests a comma-separated list of column names
Parses through those column contents
Outputs a .csv file that indicates presence (TRUE)/absence (empty cell) for each item in those columns.
A Boolean option, phandango_coloring, will color all items from the same column in the same format; rows are ordered according to the terminal ends of the midpoint-rooted tree for easy transfer/upload to Phandango.
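The presence/absence summary described above might look roughly like the following sketch. The function name, the "sample" header, and the use of a ":cN" suffix per source column are assumptions made for illustration; row ordering by tree tips is omitted.

```python
def summarize_columns(records, column_names, phandango_coloring=False):
    """Build a presence/absence table for items found in the named columns.

    records: {sample_name: {column_name: "item1, item2, ..."}}
    column_names: comma-separated string of column names to digest
    """
    columns = [c.strip() for c in column_names.split(",") if c.strip()]
    headers = []  # (source column, item, header label)
    for col_index, column in enumerate(columns, start=1):
        items = sorted({
            item.strip()
            for rec in records.values()
            for item in rec.get(column, "").split(",")
            if item.strip()
        })
        # Assumption: items from the same source column share one color tag.
        suffix = f":c{col_index}" if phandango_coloring else ""
        headers.extend((column, item, f"{item}{suffix}") for item in items)

    table = [["sample"] + [label for _, _, label in headers]]
    for sample, rec in records.items():
        row = [sample]
        for column, item, _ in headers:
            present = {v.strip() for v in rec.get(column, "").split(",")}
            row.append("TRUE" if item in present else "")
        table.append(row)
    return table


records = {
    "s1": {"amr_genes": "blaTEM, tetA", "plasmids": "IncF"},
    "s2": {"amr_genes": "tetA", "plasmids": ""},
}
table = summarize_columns(records, "amr_genes,plasmids", phandango_coloring=True)
for row in table:
    print(",".join(row))
```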

These two tasks have been added to all three phylogenetic workflows in the PHBG repository.

Other modifications

ShigEiFinder

A new optional task was created that allows ShigEiFinder to be run with read files as inputs instead of assemblies. 10 new output columns were created that are identical to those of the task that uses assemblies as input, except that they have the _reads suffix to differentiate between them. To use, set the new optional input variable call_shigeifinder_reads_input to true. This task is not run by default.

AMRFinderPlus

A typo has been corrected in the AMRFinderPlus task; the previous "pnemoniae" has now been corrected to "pneumoniae".

New TheiaProk QC

Several new columns are now output that report the following:

r1, r2, and combined raw mean quality scores and read lengths
combined clean mean quality scores and read lengths (no individual r1 and r2 to avoid excessive column creep)

Clean mean quality scores and read lengths are now able to be checked in the qc_check task as well.
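As an illustration of what these metrics represent, the sketch below computes a mean read length and a per-base mean quality from FASTQ text. It assumes four-line records with Phred+33 quality encoding; the function name is hypothetical, and the workflow's own tooling may calculate these values differently.

```python
def read_stats(fastq_text):
    """Return (mean read length, mean per-base Phred quality) for FASTQ text."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0 or not lines:
        raise ValueError("expected four lines per FASTQ record")
    sequences = lines[1::4]
    qualities = lines[3::4]
    total_bases = sum(len(seq) for seq in sequences)
    mean_length = total_bases / len(sequences)
    # Phred+33: quality score = ASCII code of the character minus 33.
    quality_sum = sum(ord(char) - 33 for qual in qualities for char in qual)
    mean_quality = quality_sum / total_bases
    return mean_length, mean_quality


example_fastq = "\n".join([
    "@read1", "ACGT", "+", "IIII",  # four bases at Q40
    "@read2", "AC", "+", "!!",      # two bases at Q0
])
mean_length, mean_quality = read_stats(example_fastq)
print(mean_length, round(mean_quality, 2))
```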

What's Changed

Update task_amrfinderplus.wdl by @kevinlibuit in #204
add optional task for shigeifinder w/ reads as input; update default docker for both shigeifinder & shigatyper by @kapsakcj in #202
Fja readlength dev by @frankambrosio3 in #201
reorder snp matrix by @sage-wright in #198

Full Changelog: v1.1.0...v1.1.1

Contributors

kapsakcj, kevinlibuit, and 2 other contributors


29 Dec 21:14

michellescribner

v1.1.0

870ae7f

v1.1.0

PHBG v1.1.0 Release Notes

This minor release introduces multiple modules to the TheiaProk workflow series as well as a new workflow for performing core gene phylogenetic analysis (Core_Gene_SNP).

Updates to the TheiaProk Workflow Series

Taxon-specific modules added:

Acinetobacter baumannii: Kaptive (detection of surface polysaccharide loci for A. baumannii) & AcinetobacterPlasmid Typing (plasmid typing of A. baumannii using abricate with the custom A. baumannii plasmid typing database)
Pseudomonas aeruginosa: Pasty (tool to identify the serogroup of P. aeruginosa isolates)
Shigella spp.: ShigaTyper (tool designed to determine Shigella serotype), ShigEiFinder (tool that differentiates Shigella/EIEC using cluster-specific genes and identifies the serotype using O-antigen/H-antigen genes), SonneiTyper (tool that identifies input genomes as S. sonnei and assigns those identified as S. sonnei to hierarchical genotypes based on detection of single nucleotide variants)
Streptococcus pneumoniae: GPS unified workflow, comprising PopPUNK with Global Pneumococcal Sequencing (GPS) database v6 (for GPS Cluster assignment), SeroBA (tool for S. pneumoniae serotyping), and PBPTyper (tool for in silico Penicillin Binding Protein (PBP) typing)

QC and read processing modules added:

Option to quantify secondary genus abundance using MIDAS
Option to utilize fastp rather than trimmomatic for read processing
Option to utilize bakta rather than prokka for genome annotation
Option to perform a QC check--i.e. determine QC Pass or QC Alert based on user-defined thresholds for multiple QC metrics
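The QC check option can be pictured as a comparison of each metric against a user-supplied range. The sketch below is illustrative only: the function name, threshold values, and message format are assumptions, and only the metric names est_coverage_clean and assembly_length come from these notes.

```python
def qc_check(metrics, thresholds):
    """Compare metrics to (minimum, maximum) thresholds; None means unbounded."""
    alerts = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name} missing")
        elif low is not None and value < low:
            alerts.append(f"{name} {value} below minimum {low}")
        elif high is not None and value > high:
            alerts.append(f"{name} {value} above maximum {high}")
    return "QC Pass" if not alerts else "QC Alert: " + "; ".join(alerts)


# Hypothetical user-defined thresholds.
thresholds = {
    "est_coverage_clean": (20, None),
    "assembly_length": (4_500_000, 5_500_000),
}

passing = qc_check(
    {"est_coverage_clean": 55.0, "assembly_length": 5_000_000}, thresholds
)
failing = qc_check(
    {"est_coverage_clean": 12.0, "assembly_length": 5_000_000}, thresholds
)
print(passing)
print(failing)
```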

Column output updates:

genome_length renamed to assembly_length
est_coverage renamed to est_coverage_raw (est_coverage_clean column output added)
- Note: Assembly length calculated by quast is used to calculate estimated coverage rather than the estimated genome length produced from the mash sketch
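In simplified form, the coverage estimate noted above is the total number of sequenced bases divided by the assembly length. The sketch below shows only that arithmetic; the function name is hypothetical and the workflow's actual calculation may include further steps.

```python
def estimated_coverage(read_lengths, assembly_length):
    """Estimate coverage as total sequenced bases over assembly length."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return sum(read_lengths) / assembly_length


# 1,000 reads of 150 bp against a 5,000 bp toy assembly.
coverage = estimated_coverage([150] * 1000, 5000)
print(coverage)
```

Using the raw read lengths here corresponds to est_coverage_raw; passing the lengths of the trimmed reads instead would correspond to est_coverage_clean.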

Core Gene SNP Workflow

The Core_Gene_SNP workflow is a flexible workflow intended for core gene alignment and phylogenetic analysis of a set of samples. The workflow takes in gene sequence data in GFF3 format from a set of samples. It first produces a pangenome summary using Pirate, which clusters genes within the sample set into orthologous gene families. By default, the workflow also instructs Pirate to produce both core genome and pangenome alignments.

The workflow subsequently triggers the generation of a SNP distance matrix and a phylogenetic tree using the core genome alignment via snp-dists and iqtree, respectively. Optionally, the workflow will also run this analysis using the pangenome alignment.
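To make the SNP distance matrix concrete, here is a small sketch that counts pairwise differences in an alignment. It is not the snp-dists implementation; the choice to count only positions where both sequences carry A, C, G, or T is an assumption of this sketch.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count differing positions where both sequences have an unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def distance_matrix(alignment):
    """Build a symmetric matrix of pairwise SNP distances as a dict of dicts."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        distance = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = distance
    return matrix


alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACNTTCGA",
}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, row)
```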

Other Modifications

AMRFinderPlus task modifications:
- Default docker image updated to v3.10.26; the database version is now output
- Drug class outputs brought to Terra data table
kSNP3 task/workflow modifications
- tree Newick file output extensions changed to .nwk
Gambit docker task modified to utilize GAMBIT v0.5.0
TS_MLST task modified to utilize MLST v2.23.0

New Documentation

Detailed documentation has been created for all workflows in the PHBG v1.1.0 repository.

What's Changed

amrfinderplus task updates by @kapsakcj in #137
Add Streptococcus pneumoniae subworkflow by @kapsakcj in #141
Adds subworkflow for A. baumannii, includes Kaptive task (K & O typing) by @erikwolfsohn in #138
Kleborate updates by @kapsakcj in #148
kSNP3 task edit: changed file suffix from .tree to .nwk by @kapsakcj in #146
Adds drug class output to TheiaProk by @michellescribner in #145
update gambit task to v0.5.0 docker image by @michellescribner in #151
Spneumo subworkflow enhancements: docker & GPS db version outputs and upgrade default pbptyper docker by @kapsakcj in #149
Add midas as optional TheiaProk task by @michellescribner in #159
Add option to hide point mutations from AMRFinderPlus output & update default amrfinderplus docker image by @michellescribner in #158
Fix gambit parsing for next_taxon_rank is None by @michellescribner in #161
add task for Abaum plasmid typing to TheiaProk_Illumina_PE and SE by @kapsakcj in #160
Add option to kSNP3 to create maximum likelihood and neighbor joining trees by @michellescribner in #166
update default mlst docker image to staphb/mlst:2.23.0 & fix CI env by @kapsakcj in #163
Modify midas parsing by @michellescribner in #172
Adds shigella subworkflow by @kapsakcj in #162
Adds bakta task by @michellescribner in #170
Add fastp task, modify read trimming parameters, and modify estimated coverage calculations by @michellescribner in #169
Fja tbprofiler update by @frankambrosio3 in #174
Add Core_Gene_SNP workflow by @michellescribner in #178
adds p. aeruginosa subworkflow and pasty for serogrouping by @jrotieno in #179
update pasty_docker default; add pasty_comment string output for PE and SE wfs by @kapsakcj in #181
Revert default read trimming parameters to v1.0 by @michellescribner in #184
Eld docs dev by @emmadoughty in #180
Fixed printf to convert sci notation to integers by @frankambrosio3 in #177
Add qc_check task to TheiaProk by @michellescribner in #182
Generate gene_presence_absence.csv with pirate task by @HNHalstead in #185
MLST novel alleles by @emmadoughty in #186
Export Taxon Table Fix and others by @sage-wright in #188
fix file extension awareness cg_pipeline by @michellescribner in #189

New Contributors

@jrotieno made their first contribution in #179
@emmadoughty made their first contribution in #180
@HNHalstead made their first contribution in #185

Full Changelog: v1.0.0...1.1.0

Contributors

kapsakcj, erikwolfsohn, and 6 other contributors


12 Aug 17:10

sage-wright

v1.0.0

488e95d

v1.0.0

PHBG v1.0.0 Release Notes

This major release introduces a stable and validated version of the TheiaProk workflow series.

This release also offers two new workflows (TheiaProk_Illumina_SE and RASUSA) and multiple organism-agnostic modules described in more detail below.

About TheiaProk

The TheiaProk workflows are for assembly and characterization of prokaryotic genomes, principally bacteria. All input reads go through steps in the core workflow for read trimming and assembly, quality assessment, species identification, and resistance gene identification. Sub-workflows further characterize some genomes, with activation of these processes dependent on the taxa identified.

Currently, TheiaProk has two forms: for Illumina paired-end sequencing data (TheiaProk_Illumina_PE), and for Illumina single-end sequencing data (TheiaProk_Illumina_SE). Future plans include development of workflows for alternative sequence data types, like Oxford Nanopore.

The following information describes the changes since the v0.6.0 version.

New modules to the TheiaProk workflows

The following modules are new additions to the core sample characterization performed on all organisms after genome assembly. While most of these are run by default, several modules can be enabled through the usage of a Boolean input parameter. More information about each tool can be found by clicking on the associated links.

Gene Typing
- PlasmidFinder - identifies plasmid replicon genes in total or partial sequenced isolates of bacteria (default)
- Prokka - annotates bacterial genomes quickly and produces standards-compliant output files (default)
- ResFinder - identifies acquired antimicrobial resistance genes in total or partial sequenced isolates of bacteria (optional; set call_resfinder to true to enable)
Quality Control
- BUSCO - “provide[s] a quantitative assessment of the completeness in terms of expected gene content” (default)
- Mummer ANI - calculates Average Nucleotide Identity (ANI) using MUMmer and an ANI calculation script from Lee Katz (optional; set call_ani to true to enable)

SKESA as default assembler

Through extensive validation and analysis, we have made the decision to switch our default parameter from SPAdes to SKESA. We have observed that the more conservative assemblies generated with SKESA led to greater concordance with known epidemiological relationships downstream while maintaining an ability to accurately characterize pathogen genomic data with respect to taxon prediction, serotyping, and AMR gene detection.

The SPAdes assembler can still be used through usage of an input variable (for TheiaProk_Illumina_PE, set shovill_pe.assembler to “spades”; for TheiaProk_Illumina_SE, set shovill_se.assembler to “spades”).
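When a workflow is run from an inputs JSON file, the override above would typically be written as a fully qualified key. The sketch below builds such a file; the workflow-level prefix theiaprok_illumina_pe is an assumption about how the workflow is named in its WDL, and the helper function is hypothetical.

```python
import json


def assembler_override(workflow_name, call_name, assembler):
    """Return an inputs entry of the form '<workflow>.<call>.assembler'."""
    return {f"{workflow_name}.{call_name}.assembler": assembler}


inputs = {}
# Assumed workflow name; the call name shovill_pe comes from the notes above.
inputs.update(assembler_override("theiaprok_illumina_pe", "shovill_pe", "spades"))

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

For single-end data the same pattern would apply with the shovill_se call name.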

New workflows

TheiaProk_Illumina_SE - this workflow is equivalent to TheiaProk_Illumina_PE but is intended for Illumina single-end sequencing data; all modules are the same, except when appropriate, single-end-specific versions and parameters are used.
RASUSA - a workflow that will randomly subsample reads to a specified coverage using RASUSA.
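The idea of subsampling to a target coverage can be sketched as follows: shuffle the reads, then keep them until the retained bases reach coverage multiplied by genome size. This is a simplified illustration with hypothetical names, not RASUSA's own code.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Randomly keep reads until roughly coverage x genome_size bases are kept.

    Returns the sorted indices of the kept reads and their total base count.
    """
    target_bases = genome_size * coverage
    indices = list(range(len(read_lengths)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    kept, total = [], 0
    for index in indices:
        if total >= target_bases:
            break
        kept.append(index)
        total += read_lengths[index]
    return sorted(kept), total


# 1,000 reads of 100 bp, toy genome of 1,000 bp, target coverage of 20x.
kept, total = subsample_to_coverage([100] * 1000, genome_size=1000, coverage=20)
print(len(kept), total)
```

If the input holds fewer bases than the target, every read is kept and the achieved coverage is simply lower than requested.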

Other changes

GitHub Actions for automated testing and continuous integration were added to the PHBG repository!
The export_taxon_tables task now can handle extra large fastq files.
Shovill parameters have been exposed so advanced users can select their own assemblers and customize assembly parameters to their heart’s content.
kSNP3 distance matrices were previously completely unordered. These SNP matrices are now more ordered than they were before. These semi-ordered SNP matrices appear most often when multiple outbreak groups are included in a kSNP3 analysis. Future releases will include the addition of fully-ordered SNP matrices.
The kSNP3 workflow now produces SNP distance matrices and phylogenetic trees generated using both pangenome and core genome analyses.

Log of PRs

Fja export tt highmem dev by @frankambrosio3 in #119
exposing shovill parameters by @sage-wright in #121
Order SNP-Dists Matrix by @kevinlibuit in #125
Add GitHub Actions to PHBG by @rpetit3 in #123
Add RASUSA Task and Workflow File by @kevinlibuit in #122
add TheiaProk_Illumina_SE workflow by @sage-wright in #124
typo fix by @frankambrosio3 in #128
adds ANI to theiaprok_illumina_pe wf by @kapsakcj in #126
Add BUSCO by @sage-wright in #127
Adds resfinder to theiaprok pe and se by @michellescribner in #130
Adds prokka, plasmidfinder, ksnp3 core, wf_pangenome by @michellescribner in #129
Change default assembler to skesa by @sage-wright in #132
remove compression from alignment files by @michellescribner in #134

Full Changelog: v0.6.0...v1.0.0

Follow Theiagen on Twitter!

Contributors

rpetit3, kapsakcj, and 4 other contributors


30 Jun 15:42

michellescribner

v0.6.0

cb0b9c2

v0.6.0

What's Changed

Gambit output parsing correction by @sage-wright in #80
Narrow TBProfiler to MTB only by @kevinlibuit in #91
Add genotyphi to TheiaProk_Illumina_PE workflow by @kapsakcj in #98
Remove fasta extension restriction by @kevinlibuit in #105
Adds legsta to TheiaProk_Illumina_PE by @michellescribner in #106
Legsta fix SBT output value for samples with no SBT predicted by @michellescribner in #110
Add disk_size attribute to kSNP3 task by @kevinlibuit in #100
Enclose Terra billing project and workspace arguments in double quotes by @michellescribner in #109

Full Changelog: v0.5.0...v0.6.0

Contributors

kapsakcj, kevinlibuit, and 2 other contributors


12 May 18:22

kapsakcj

v0.5.0

43f3c88

v0.5.0

What's Changed

Add 2 kraken2 workflows (Single End & Paired End) by @rpetit3 in #70
New NCBI-AMRFinderPlus workflow and integration in TheiaProk_Illumina_PE wf by @kapsakcj in #65
- TheiaProk_Illumina_PE workflow - replaced abricate task with NCBI-AMRFinderPlus for AMR gene detection
- Fixed integer math in read_screen task by @sage-wright , also in #65
mlst task updated to 2.22.0 (default docker image updated to staphb/mlst:2.22.0)
updated gambit_query workflow with updated task (gambit v0.4.0) by @kevinlibuit
export_taxon_tables feature now includes NCBI-AMRFinderPlus outputs

Full Changelog: v0.4.0...v0.5.0

Contributors

rpetit3, kapsakcj, and 2 other contributors


08 Apr 20:54

kevinlibuit

v0.4.0

53d351a

v0.4.0

This release adds MLST profiling to the TheiaProk_Illumina_PE workflow.

MLST profiling is performed using @tseemann's mlst workflow

Additional updates to TheiaProk_Illumina_PE:

Data screening task added to avoid workflow failures caused by low-quality input read data
QC metrics adjusted for WGS bacterial data
Capture of n50 from Quast report (Thanks, @erikwolfsohn!)
Minimum percent length and coverage parameters exposed in the Abricate task
Replacing the Quast assembly length with the Mash estimated genome size for the cg-pipeline read coverage calculations
Allow for additional fields of metadata to be exported to taxon tables: collection_date, originating_lab, city, county, zip

Contributors

tseemann and erikwolfsohn


10 Mar 00:02

kevinlibuit

v0.3.0

a0b4c33

v0.3.0

This release renames the Apollo_Illumina_PE workflow to TheiaProk_Illumina_PE and restructures the PHBG task directory.

The TheiaProk workflow was developed to replace Apollo workflows for bacterial genomic characterization. TheiaProk is based off of @rpetit3's Bactopia and its Merlin subworkflow and differs from the original Apollo workflows in its organism-typing subworkflow merlin_magic.

This subworkflow triggers organism typing based on gambit taxon assignments for each sample, e.g. serotyping via SeroTypeFinder will be performed for samples with an Escherichia gambit taxon assignment.

TheiaProk organism typing will be performed for the following organisms using the listed bioinformatics software:

Escherichia spp.: serotypefinder & ectyper
Listeria spp.: lissero
Salmonella spp.: sistr & seqsero2
Klebsiella spp.: kleborate
Mycobacterium spp.: tbprofiler
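The taxon-triggered typing described above amounts to a lookup from the assigned genus to a set of tools. The sketch below mirrors the list in Python; the function name and the assumption that the genus is the first word of the gambit taxon string are illustrative, and the real subworkflow is written in WDL.

```python
# Genus-to-tool mapping taken from the list above.
TYPING_TOOLS = {
    "Escherichia": ["serotypefinder", "ectyper"],
    "Listeria": ["lissero"],
    "Salmonella": ["sistr", "seqsero2"],
    "Klebsiella": ["kleborate"],
    "Mycobacterium": ["tbprofiler"],
}


def tools_for_taxon(gambit_taxon):
    """Return the typing tools triggered for a taxon assignment, if any."""
    genus = gambit_taxon.split()[0] if gambit_taxon.strip() else ""
    return TYPING_TOOLS.get(genus, [])


print(tools_for_taxon("Escherichia coli"))
print(tools_for_taxon("Salmonella enterica"))
print(tools_for_taxon("Vibrio cholerae"))
```

Taxa without an entry return an empty list, which corresponds to running only the core workflow for that sample.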

TheiaProk_Illumina_PE will also perform AMR gene detection using abricate against the NCBI AMRFinderPlus database.

Additionally, the PHBG directory structure was reformatted for ease of use and readability.

Contributors

rpetit3


27 Jul 15:08

kevinlibuit

v0.2

ec4d057

v0.2

Release to add the Kleborate and SerotypeFinder workflows

Available as tasks within the task_taxon_id.wdl file as well as stand-alone single-task workflows; both available on Terra via DockStore

Other Changes:

Version and analysis date captured for every workflow
Shovill task modified to include optional minimum contig length (default set to 200bp); this default setting is utilized in the Apollo_Illumina_PE workflow
White space inconsistencies addressed
Apollo_Illumina_PE output name changes:
- predicted_genus → gambit_genus
- predicted_species → gambit_species
Validation files directory created for local testing


Releases: theiagen/public_health_bacterial_genomics
