PHBG v1.1.0 Release Notes

This minor release introduces multiple modules to the TheiaProk workflow series as well as a new workflow for performing core gene phylogenetic analysis (Core_Gene_SNP).

Updates to the TheiaProk Workflow Series

Taxon-specific modules added:

Acinetobacter baumannii: Kaptive (detection of surface polysaccharide loci for A. baumannii) & AcinetobacterPlasmid Typing (plasmid typing of A. baumannii using abricate with the custom A. baumannii plasmid typing database)
Pseudomonas aeruginosa: Pasty (tool to identify the serogroup of P. aeruginosa isolates)
Shigella spp.: ShigaTyper (tool designed to determine Shigella serotype), ShigEiFinder (tool used to differentiate Shigella/EIEC using cluster-specific genes and to identify the serotype using O-antigen/H-antigen genes), SonneiTyper (tool to identify input genomes as S. sonnei and assign those identified as S. sonnei to hierarchical genotypes based on detection of single nucleotide variants)
Streptococcus pneumoniae: PopPUNK with Global Pneumococcal Sequencing (GPS) database v6 for GPS Cluster assignment, SeroBA (tool for S. pneumoniae serotyping), PBPTyper (tool for in silico Penicillin Binding Protein (PBP) typing)

QC and read processing modules added:

Option to quantify secondary genus abundance using MIDAS
Option to utilize fastp rather than trimmomatic for read processing
Option to utilize bakta rather than prokka for genome annotation
Option to perform a QC check, i.e. determine QC Pass or QC Alert based on user-defined thresholds for multiple QC metrics
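
As a rough illustration of the QC check option, the sketch below compares a sample's metrics against user-defined thresholds and reports QC Pass or QC Alert. It is not the qc_check task itself: the `qc_check` function, the `THRESHOLDS` table, the threshold values, and the `number_contigs` metric name are all hypothetical and chosen only for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical thresholds: metric name -> (minimum, maximum).
# None means that side of the range is unbounded.
THRESHOLDS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "est_coverage_clean": (40.0, None),
    "assembly_length": (4_000_000, 6_000_000),
    "number_contigs": (None, 500),
}


def qc_check(metrics: Dict[str, float],
             thresholds: Dict[str, Tuple[Optional[float], Optional[float]]] = THRESHOLDS) -> str:
    """Return "QC Pass", or "QC Alert: ..." listing every metric out of range."""
    failures = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name} missing")
        elif low is not None and value < low:
            failures.append(f"{name} below {low}")
        elif high is not None and value > high:
            failures.append(f"{name} above {high}")
    if failures:
        return "QC Alert: " + "; ".join(failures)
    return "QC Pass"


good = {"est_coverage_clean": 80.0, "assembly_length": 5_000_000, "number_contigs": 120}
low_cov = {"est_coverage_clean": 20.0, "assembly_length": 5_000_000, "number_contigs": 120}
print(qc_check(good))     # QC Pass
print(qc_check(low_cov))  # QC Alert: est_coverage_clean below 40.0
```

Collecting every failing metric, rather than stopping at the first, keeps the alert message useful when several thresholds are missed at once.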

Column output updates:

genome_length renamed to assembly_length
est_coverage renamed to est_coverage_raw (est_coverage_clean column output added)
- Note: Assembly length calculated by quast is used to calculate estimated coverage rather than the estimated genome length produced from the mash sketch
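
The coverage change in the note above amounts to dividing the number of sequenced bases by the quast assembly length. The sketch below shows that arithmetic only. The `estimated_coverage` helper and the example base counts are hypothetical, and it assumes that the raw and clean columns differ only in whether the base count comes from the raw or the cleaned reads.

```python
def estimated_coverage(total_bases: int, assembly_length: int) -> float:
    """Estimated coverage = total sequenced bases / assembly length (bp)."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be a positive number of bases")
    return total_bases / assembly_length


# Hypothetical sample: a 5 Mbp assembly with 500 Mbp of raw reads,
# of which 450 Mbp remain after read cleaning.
assembly_length = 5_000_000
est_coverage_raw = estimated_coverage(500_000_000, assembly_length)
est_coverage_clean = estimated_coverage(450_000_000, assembly_length)
print(est_coverage_raw, est_coverage_clean)  # 100.0 90.0
```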

Core Gene SNP Workflow

The Core_Gene_SNP workflow is a flexible workflow intended for core gene alignment and phylogenetic analysis of a set of samples. The workflow takes in gene sequence data in GFF3 format from a set of samples. It first produces a pangenome summary using Pirate, which clusters genes within the sample set into orthologous gene families. By default, the workflow also instructs Pirate to produce both core genome and pangenome alignments.

The workflow subsequently triggers the generation of a SNP distance matrix and a phylogenetic tree using the core genome alignment via snp-dists and iqtree, respectively. Optionally, the workflow will also run this analysis using the pangenome alignment.
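
To make the SNP distance matrix step concrete, here is a minimal sketch of a pairwise SNP count over an alignment. It is only in the spirit of what snp-dists reports, not its implementation. The `snp_distance` and `distance_matrix` functions and the toy alignment are hypothetical, and the sketch assumes that only positions where both sequences carry A, C, G, or T are counted, so gaps and ambiguous bases are skipped.

```python
from itertools import combinations
from typing import Dict

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Build a symmetric sample-by-sample SNP distance matrix."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = snp_distance(alignment[n], alignment[m])
        matrix[n][m] = d
        matrix[m][n] = d
    return matrix


# Toy core genome alignment (hypothetical sequences).
alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AC-TTA"}
matrix = distance_matrix(alignment)
print(matrix["s1"]["s2"], matrix["s1"]["s3"], matrix["s2"]["s3"])  # 1 2 1
```

The same function would apply unchanged to the pangenome alignment when that optional analysis is enabled.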

Other Modifications

AMRFinderPlus task modifications:
- Default docker image updated to v3.10.26; database version added as an output
- Drug class outputs brought to Terra data table
kSNP3 task/workflow modifications:
- Newick tree file output extensions changed to .nwk
Gambit docker task modified to utilize GAMBIT v0.5.0
TS_MLST task modified to utilize MLST v2.23.0

New Documentation

Detailed documentation has been created for all workflows in the PHBG v1.1.0 repository.

What's Changed

amrfinderplus task updates by @kapsakcj in #137
Add Streptococcus pneumoniae subworkflow by @kapsakcj in #141
Adds subworkflow for A. baumannii, includes Kaptive task (K & O typing) by @erikwolfsohn in #138
Kleborate updates by @kapsakcj in #148
kSNP3 task edit: changed file suffix from .tree to .nwk by @kapsakcj in #146
Adds drug class output to TheiaProk by @michellescribner in #145
update gambit task to v0.5.0 docker image by @michellescribner in #151
Spneumo subworkflow enhancements: docker & GPS db version outputs and upgrade default pbptyper docker by @kapsakcj in #149
Add midas as optional TheiaProk task by @michellescribner in #159
Add option to hide point mutations from AMRFinderPlus output & update default amrfinderplus docker image by @michellescribner in #158
Fix gambit parsing for next_taxon_rank is None by @michellescribner in #161
add task for Abaum plasmid typing to TheiaProk_Illumina_PE and SE by @kapsakcj in #160
Add option to kSNP3 to create maximum likelihood and neighbor joining trees by @michellescribner in #166
update default mlst docker image to staphb/mlst:2.23.0 & fix CI env by @kapsakcj in #163
Modify midas parsing by @michellescribner in #172
Adds shigella subworkflow by @kapsakcj in #162
Adds bakta task by @michellescribner in #170
Add fastp task, modify read trimming parameters, and modify estimated coverage calculations by @michellescribner in #169
Fja tbprofiler update by @frankambrosio3 in #174
Add Core_Gene_SNP workflow by @michellescribner in #178
adds p. aeruginosa subworkflow and pasty for serogrouping by @jrotieno in #179
update pasty_docker default; add pasty_comment string output for PE and SE wfs by @kapsakcj in #181
Revert default read trimming parameters to v1.0 by @michellescribner in #184
Eld docs dev by @emmadoughty in #180
Fixed printf to convert sci notation to integers by @frankambrosio3 in #177
Add qc_check task to TheiaProk by @michellescribner in #182
Generate gene_presence_absence.csv with pirate task by @HNHalstead in #185
MLST novel alleles by @emmadoughty in #186
Export Taxon Table Fix and others by @sage-wright in #188
fix file extension awareness cg_pipeline by @michellescribner in #189

New Contributors

@jrotieno made their first contribution in #179
@emmadoughty made their first contribution in #180
@HNHalstead made their first contribution in #185

Full Changelog: v1.0.0...1.1.0
