Raw sequencing reads were subjected to quality control using Sunbeam (v4.3.7), which performed adapter trimming and host genome decontamination. Genome assembly was carried out using the following tools:
- SPAdes (v3.15.5) for isolate genomes.
- Anvi’o (v8) pipeline with MEGAHIT (v1.2.9) for 102 metagenomic samples.
Assembly quality was assessed using CheckM, applying the following criteria:
- ≥95% CheckM completeness
- ≤5% CheckM contamination
- Lineage classification as "Staphylococcus (UID301)"
Species-level contamination was evaluated with Mash. Assemblies were filtered to retain only those between 2.55 Mb and 3.15 Mb in size. Of the 1,670 assembled genomes, 1,446 passed the quality control criteria and were included in downstream analyses.
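The CheckM and size criteria above can be expressed as a simple filter. The following is a minimal sketch, not the pipeline's actual code: the `Assembly` record, its field names, and `passes_qc` are hypothetical, 1 Mb is assumed to mean 1,000,000 bases, and the Mash species-level check is not modeled.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria listed above.
MIN_COMPLETENESS = 95.0                       # percent
MAX_CONTAMINATION = 5.0                       # percent
EXPECTED_LINEAGE = "Staphylococcus (UID301)"
MIN_SIZE_BP = 2_550_000                       # 2.55 Mb, assuming 1 Mb = 1e6 bases
MAX_SIZE_BP = 3_150_000                       # 3.15 Mb


@dataclass
class Assembly:
    """Hypothetical per-assembly summary; field names are illustrative."""
    name: str
    completeness: float    # CheckM completeness (percent)
    contamination: float   # CheckM contamination (percent)
    lineage: str           # CheckM marker lineage
    size_bp: int           # total assembly length in bases


def passes_qc(a: Assembly) -> bool:
    """Return True if the assembly meets every CheckM and size criterion."""
    return (
        a.completeness >= MIN_COMPLETENESS
        and a.contamination <= MAX_CONTAMINATION
        and a.lineage == EXPECTED_LINEAGE
        and MIN_SIZE_BP <= a.size_bp <= MAX_SIZE_BP
    )


def filter_assemblies(assemblies):
    """Keep only the assemblies that pass all criteria."""
    return [a for a in assemblies if passes_qc(a)]
```

In practice the fields would be populated from the CheckM output table and an assembly-length summary before filtering.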
Clonality was determined using a multi-step approach:
- Hierarchical Clustering: Genomes were stratified into distinct groups.
- Single-Linkage Clustering (SLC): Iterative SLC identified closely related genomes across a range of SNP thresholds (from the minimum pairwise distance in the group up to 500 SNPs).
- Phylogenetic Correction: Strain compositions were validated and corrected using reference-based maximum likelihood phylogenies. The compositions were expanded to include the smallest monophyletic clade with robust bootstrap support (≥70%).
- SNP Threshold Finalization: Final thresholds were determined where cluster composition and number plateaued.
This method ensured robust identification and validation of clonal relationships across the dataset.
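To make the single-linkage step concrete, the sketch below clusters genomes at a given SNP threshold with a union-find structure and then counts clones and singletons across a range of thresholds. It is an illustrative sketch under stated assumptions rather than the repository's implementation: the function names, the `frozenset`-keyed distance dictionary, and the choice to link pairs at distances less than or equal to the threshold are all assumptions, and the phylogenetic correction step is not modeled.

```python
from itertools import combinations


def single_linkage_clusters(genomes, snp_distance, threshold):
    """Group genomes so that any pair within `threshold` SNPs shares a cluster.

    `snp_distance` maps frozenset({a, b}) to the pairwise SNP distance
    (a hypothetical input format chosen for this sketch).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster root, halving paths on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if snp_distance[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


def sweep_thresholds(genomes, snp_distance, thresholds):
    """Return (threshold, n_clones, n_singletons) for each tested threshold."""
    rows = []
    for t in thresholds:
        clusters = single_linkage_clusters(genomes, snp_distance, t)
        n_clones = sum(1 for c in clusters if len(c) > 1)
        n_singletons = sum(1 for c in clusters if len(c) == 1)
        rows.append((t, n_clones, n_singletons))
    return rows
```

Scanning the rows returned by `sweep_thresholds` for a stretch where the counts stop changing is one simple way to look for the plateau described above.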
These plots visualize the following metrics at each tested SNP threshold:
- Number of clones and singletons.
- Number of discrepant genomes observed when SNP-only strain composition was mapped to the phylogeny.
- Bootstrap support for the strain compositions.
This approach ensures a comprehensive view of clonality across varying thresholds.
This plot provides a temporal analysis of transmission clusters based on patient location and treatment team assignments:
- Patient Timelines: NICU section assignments are represented as colored rectangles. Admission dates are marked by triangles, and discharge dates by squares.
- Treatment Team Assignments: Shown as colored rectangles over time, corresponding to each patient’s timeline.
In both plots, sampling events are represented by dots:
- Red dots: Invasive isolates.
- Blue dots: Colonizing isolates.
These visualizations help illustrate the temporal and spatial dynamics within transmission clusters.
Note: The input data are not provided because they contain sensitive patient information that cannot be shared for privacy and confidentiality reasons.
This script generates NICU floorplan visualizations (one static and two animated floorplans for each cluster) to analyze the role of patient movements in transmission dynamics. The plots highlight how proximity in time and space drives transmission events within the NICU and help pinpoint areas requiring targeted interventions.
Note: The input data are not provided because they contain sensitive patient information that cannot be shared for privacy and confidentiality reasons.
This script generates visualizations to summarize transmission clusters and their characteristics:
- Red: Invasive clusters.
- Blue: Colonizing clusters.
- Blue: Colonization only.
- Red: Infection only.
- Yellow: Both colonization and infection.
- Individual patient IDs are shown as numbers.
- Environmental isolates are marked by grey diamonds.
Two bars below each cluster box indicate:
- The specific NICU section where the cluster was detected.
- The assigned treatment team at the time of detection.