v1.1.0
PHBG v1.1.0 Release Notes
This minor release introduces multiple modules to the TheiaProk workflow series as well as a new workflow for performing core gene phylogenetic analysis (Core_Gene_SNP).
Updates to the TheiaProk Workflow Series
Taxon-specific modules added:
- Acinetobacter baumannii: Kaptive (detection of surface polysaccharide loci for A. baumannii) & Acinetobacter Plasmid Typing (plasmid typing of A. baumannii using abricate with the custom A. baumannii plasmid typing database)
- Pseudomonas aeruginosa: Pasty (tool to identify the serogroup of P. aeruginosa isolates)
- Shigella spp.: ShigaTyper (tool designed to determine Shigella serotype), ShigEiFinder (tool used to differentiate Shigella from EIEC using cluster-specific genes and to identify the serotype using O-antigen/H-antigen genes), SonneiTyper (tool to identify input genomes as S. sonnei and assign those identified as S. sonnei to hierarchical genotypes based on detection of single nucleotide variants)
- Streptococcus pneumoniae: GPS unified workflow (PBPTyper (tool for in silico Penicillin Binding Protein (PBP) typing), SeroBA (tool for S. pneumoniae serotyping), and PopPUNK with Global Pneumococcal Sequencing (GPS) database v6 for GPS Cluster assignment)
QC and read processing modules added:
- Option to quantify secondary genus abundance using MIDAS
- Option to utilize fastp rather than trimmomatic for read processing
- Option to utilize bakta rather than prokka for genome annotation
- Option to perform a QC check, i.e., determine QC Pass or QC Alert based on user-defined thresholds for multiple QC metrics
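The QC check option can be pictured as a comparison of each metric against a user-supplied range. The sketch below is illustrative only and is not the qc_check task itself: the function name, the `QC_PASS`/`QC_ALERT` strings, the (min, max) threshold layout, and the numeric thresholds in the usage example are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical threshold layout: metric name -> (minimum, maximum); None means "no bound".
Thresholds = Dict[str, Tuple[Optional[float], Optional[float]]]


def qc_check(metrics: Dict[str, float], thresholds: Thresholds) -> str:
    """Return 'QC_PASS' if every metric is within its range, else a 'QC_ALERT: ...' message."""
    failures = []
    for name, (minimum, maximum) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric with a threshold but no value is reported rather than silently skipped.
            failures.append(f"{name} missing")
            continue
        if minimum is not None and value < minimum:
            failures.append(f"{name} {value} below {minimum}")
        if maximum is not None and value > maximum:
            failures.append(f"{name} {value} above {maximum}")
    return "QC_PASS" if not failures else "QC_ALERT: " + "; ".join(failures)


if __name__ == "__main__":
    # Threshold values here are invented for illustration, not recommended defaults.
    example_thresholds: Thresholds = {
        "assembly_length": (4_000_000, 6_000_000),
        "est_coverage_clean": (40.0, None),
    }
    print(qc_check({"assembly_length": 5_000_000, "est_coverage_clean": 20.0}, example_thresholds))
```

Reporting every failed metric in one message, rather than stopping at the first, keeps a single table cell informative when several thresholds are missed.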
Column output updates:
- genome_length renamed to assembly_length
- est_coverage renamed to est_coverage_raw (est_coverage_clean column output added)
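Downstream scripts that read tables exported before this release may still expect the old column names. A minimal sketch of a header migration follows, assuming a tab-separated export; the function name and the sample table are hypothetical, and only the two renames listed above are applied.

```python
import csv
import io

# Column renames introduced in v1.1.0 (old name -> new name).
RENAMED_COLUMNS = {
    "genome_length": "assembly_length",
    "est_coverage": "est_coverage_raw",
}


def rename_columns(tsv_text: str) -> str:
    """Rewrite the header row of a TSV table using the v1.1.0 column names."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    # Only the header row changes; data rows are written back untouched.
    rows[0] = [RENAMED_COLUMNS.get(column, column) for column in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    old_table = "sample\tgenome_length\test_coverage\nA\t5000000\t80\n"
    print(rename_columns(old_table))
```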
Core Gene SNP Workflow
The Core_Gene_SNP workflow is a flexible workflow intended for core gene alignment and phylogenetic analysis of a set of samples. The workflow takes in gene sequence data in GFF3 format from a set of samples. It first produces a pangenome summary using Pirate, which clusters genes within the sample set into orthologous gene families. By default, the workflow also instructs Pirate to produce both core genome and pangenome alignments.
The workflow subsequently triggers the generation of a SNP distance matrix and a phylogenetic tree using the core genome alignment via snp-dists and iqtree, respectively. Optionally, the workflow will also run this analysis using the pangenome alignment.
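To illustrate what a SNP distance matrix built from a core genome alignment contains, here is a small sketch that counts pairwise differences between aligned sequences. It is not the snp-dists implementation; it assumes that only positions where both sequences carry A, C, G, or T are compared, so gaps and ambiguous bases are skipped. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Dict

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, ignoring gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def snp_distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Build a symmetric pairwise SNP distance matrix keyed by sample name."""
    names = list(alignment)
    matrix = {row: {col: 0 for col in names} for row in names}
    for a, b in combinations(names, 2):
        distance = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = distance
        matrix[b][a] = distance
    return matrix


if __name__ == "__main__":
    toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "A-GTNCGA"}
    for name, row in snp_distance_matrix(toy_alignment).items():
        print(name, row)
```

In the workflow itself this step is handled by snp-dists, and the tree is inferred separately by iqtree; the sketch only shows the shape of the matrix.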
Other Modifications
- AMRFinderPlus task modifications:
  - Default docker image updated to v3.10.26 and output database version
  - Drug class outputs brought to Terra data table
- kSNP3 task/workflow modifications:
  - Tree Newick file output extensions changed to .nwk
- Gambit docker task modified to utilize GAMBIT v0.5.0
- TS_MLST task modified to utilize MLST v2.23.0
New Documentation
Detailed documentation has been created for all workflows in the PHBG v1.1.0 repository.
What's Changed
- amrfinderplus task updates by @kapsakcj in #137
- Add Streptococcus pneumoniae subworkflow by @kapsakcj in #141
- Adds subworkflow for A. baumannii, includes Kaptive task (K & O typing) by @erikwolfsohn in #138
- Kleborate updates by @kapsakcj in #148
- kSNP3 task edit: changed file suffix from .tree to .nwk by @kapsakcj in #146
- Adds drug class output to TheiaProk by @michellescribner in #145
- update gambit task to v0.5.0 docker image by @michellescribner in #151
- Spneumo subworkflow enhancements: docker & GPS db version outputs and upgrade default pbptyper docker by @kapsakcj in #149
- Add midas as optional TheiaProk task by @michellescribner in #159
- Add option to hide point mutations from AMRFinderPlus output & update default amrfinderplus docker image by @michellescribner in #158
- Fix gambit parsing for next_taxon_rank is None by @michellescribner in #161
- add task for Abaum plasmid typing to TheiaProk_Illumina_PE and SE by @kapsakcj in #160
- Add option to kSNP3 to create maximum likelihood and neighbor joining trees by @michellescribner in #166
- update default mlst docker image to staphb/mlst:2.23.0 & fix CI env by @kapsakcj in #163
- Modify midas parsing by @michellescribner in #172
- Adds shigella subworkflow by @kapsakcj in #162
- Adds bakta task by @michellescribner in #170
- Add fastp task, modify read trimming parameters, and modify estimated coverage calculations by @michellescribner in #169
- Fja tbprofiler update by @frankambrosio3 in #174
- Add Core_Gene_SNP workflow by @michellescribner in #178
- adds p. aeruginosa subworkflow and pasty for serogrouping by @jrotieno in #179
- update pasty_docker default; add pasty_comment string output for PE and SE wfs by @kapsakcj in #181
- Revert default read trimming parameters to v1.0 by @michellescribner in #184
- Eld docs dev by @emmadoughty in #180
- Fixed printf to convert sci notation to integers by @frankambrosio3 in #177
- Add qc_check task to TheiaProk by @michellescribner in #182
- Generate gene_presence_absence.csv with pirate task by @HNHalstead in #185
- MLST novel alleles by @emmadoughty in #186
- Export Taxon Table Fix and others by @sage-wright in #188
- fix file extension awareness cg_pipeline by @michellescribner in #189
New Contributors
- @jrotieno made their first contribution in #179
- @emmadoughty made their first contribution in #180
- @HNHalstead made their first contribution in #185
Full Changelog: v1.0.0...1.1.0