v1.1.0

@michellescribner michellescribner released this 29 Dec 21:14
· 36 commits to main since this release
870ae7f

PHBG v1.1.0 Release Notes

This minor release introduces multiple modules to the TheiaProk workflow series as well as a new workflow for performing core gene phylogenetic analysis (Core_Gene_SNP).

Updates to the TheiaProk Workflow Series

Taxon-specific modules added:

  • Acinetobacter baumannii: Kaptive (detection of surface polysaccharide loci for A. baumannii) & AcinetobacterPlasmid Typing (plasmid typing of A. baumannii using abricate with the custom A. baumannii plasmid typing database)
  • Pseudomonas aeruginosa: Pasty (tool to identify the serogroup of P. aeruginosa isolates)
  • Shigella spp.: ShigaTyper (tool designed to determine Shigella serotype), ShigEiFinder (tool that differentiates Shigella from EIEC using cluster-specific genes and identifies the serotype using O-antigen/H-antigen genes), SonneiTyper (tool that identifies input genomes as S. sonnei and assigns those genomes to hierarchical genotypes based on detection of single nucleotide variants)
  • Streptococcus pneumoniae: GPS unified workflow, consisting of PopPUNK with Global Pneumococcal Sequencing (GPS) database v6 for GPS Cluster assignment, SeroBA (tool for S. pneumoniae serotyping), and PBPTyper (tool for in silico Penicillin Binding Protein (PBP) typing)

QC and read processing modules added:

  • Option to quantify secondary genus abundance using MIDAS
  • Option to utilize fastp rather than trimmomatic for read processing
  • Option to utilize bakta rather than prokka for genome annotation
  • Option to perform a QC check, i.e., to report QC Pass or QC Alert based on user-defined thresholds for multiple QC metrics
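
The QC check option can be pictured as a comparison of each metric against a user-supplied range. The sketch below is illustrative only and is not the workflow's own code: the function name `qc_check`, the threshold format (a minimum and maximum per metric), and the example threshold values are all assumptions. Only the column names `assembly_length` and `est_coverage_clean` and the two result labels come from these notes.

```python
from typing import Dict, Tuple


def qc_check(
    metrics: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]],
) -> str:
    """Return "QC Pass" if every thresholded metric is within range, else "QC Alert".

    thresholds maps a metric name to an inclusive (minimum, maximum) range.
    A metric that has a threshold but no reported value counts as an alert.
    """
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            return "QC Alert"
    return "QC Pass"


# Hypothetical thresholds chosen purely for illustration.
example_thresholds = {
    "assembly_length": (4_500_000, 5_500_000),
    "est_coverage_clean": (40.0, float("inf")),
}

good_sample = {"assembly_length": 5_000_000, "est_coverage_clean": 85.2}
low_coverage_sample = {"assembly_length": 5_000_000, "est_coverage_clean": 12.0}

print(qc_check(good_sample, example_thresholds))
print(qc_check(low_coverage_sample, example_thresholds))
```

Metrics without a threshold are ignored in this sketch, so a user only needs to list the checks they care about.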

Column output updates:

  • genome_length renamed to assembly_length
  • est_coverage renamed to est_coverage_raw (est_coverage_clean column output added)
    • Note: Assembly length calculated by quast is used to calculate estimated coverage rather than the estimated genome length produced from the mash sketch
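
To make the coverage note concrete, the sketch below assumes the usual definition of estimated coverage: total sequenced bases divided by the length of the genome, here the quast assembly length. The function name and the example numbers are hypothetical, and the reading of `est_coverage_raw` and `est_coverage_clean` as coverage before and after read processing is an assumption rather than something stated in these notes.

```python
def estimated_coverage(total_bases: int, assembly_length: int) -> float:
    """Estimate sequencing depth as total read bases divided by assembly length."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be a positive number of bases")
    return total_bases / assembly_length


# Hypothetical inputs: bases counted before and after read processing.
raw_bases = 500_000_000
clean_bases = 450_000_000
assembly_length = 5_000_000  # as reported by quast in this sketch

est_coverage_raw = estimated_coverage(raw_bases, assembly_length)
est_coverage_clean = estimated_coverage(clean_bases, assembly_length)
print(est_coverage_raw, est_coverage_clean)
```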

Core Gene SNP Workflow

The Core_Gene_SNP workflow is a flexible workflow intended for core gene alignment and phylogenetic analysis of a set of samples. The workflow takes in gene sequence data in GFF3 format from a set of samples. It first produces a pangenome summary using Pirate, which clusters genes within the sample set into orthologous gene families. By default, the workflow also instructs Pirate to produce both core genome and pangenome alignments.

The workflow subsequently triggers the generation of a SNP distance matrix and a phylogenetic tree using the core genome alignment via snp-dists and iqtree, respectively. Optionally, the workflow will also run this analysis using the pangenome alignment.
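
As an illustration of what a SNP distance matrix represents, the sketch below counts pairwise differences between aligned sequences. It is not the snp-dists implementation: the function names and the toy alignment are hypothetical, and the choice to count only positions where both sequences carry A, C, G, or T (skipping gaps and ambiguous bases) is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Build a symmetric matrix of pairwise SNP distances keyed by sample name."""
    names = list(alignment)
    matrix = {row: {col: 0 for col in names} for row in names}
    for first, second in combinations(names, 2):
        distance = snp_distance(alignment[first], alignment[second])
        matrix[first][second] = distance
        matrix[second][first] = distance
    return matrix


# Toy core genome alignment with made-up sample names.
toy_alignment = {
    "sample1": "ACGTACGT",
    "sample2": "ACGTACGA",
    "sample3": "ACNTTCGA",
}
toy_matrix = distance_matrix(toy_alignment)
for name, row in toy_matrix.items():
    print(name, row)
```

A matrix like this summarizes how closely related the samples are, which complements the phylogenetic tree built from the same alignment.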

Other Modifications

  • AMRFinderPlus task modifications:
    • Default Docker image updated to v3.10.26; database version added as an output
    • Drug class outputs brought to Terra data table
  • kSNP3 task/workflow modifications:
    • tree Newick file output extensions changed to .nwk
  • Gambit docker task modified to utilize GAMBIT v0.5.0
  • TS_MLST task modified to utilize MLST v2.23.0

New Documentation

Detailed documentation has been created for all workflows in the PHBG v1.1.0 repository.


Full Changelog: v1.0.0...1.1.0