In silico PCR command line tool for typing Escherichia coli ST131
ST131Typer is a simple Bash shell script that utilizes the SeqKit function `seqkit amplicon` to detect Escherichia coli sequence type (ST) 131 and characterize its key subclones from preassembled partial or complete genome assemblies.
This tool is the in silico version of the in vitro multiplex PCR assays developed by Johnston et al. (in prep.). The lab assay resolves 15 distinctive molecular subsets within ST131: 3 within clade A (i.e., the H41 subclone), 5 within clade B (i.e., the H22 subclone), and 7 within clade C (i.e., the H30 subclone). The clade C subsets span subclones C0 (i.e., H30S; 2 subsets), C1 and C1-M27 (i.e., H30R1; 2 subsets), and C2 (i.e., H30Rx; 3 subsets).
Specifically, this tool uses primer sequences to target distinctive nucleotide polymorphisms or sequences in:
- mdh36
- gyrB47
- trpA72
- sbmA
- plsB
- nupC
- rmuC
- kefC
- ybbW
- O16 and O25b rfb variants
- five key fimH alleles (fimH22, fimH27, fimH30, fimH35, and fimH41)
- two fliC alleles (H4 and H5)
- (subclone-specific) fluoroquinolone resistance-associated parC allele
- (subclone-specific) prophage marker
If the target polymorphism or sequence is found in a WGS assembly, the resulting target sequence (the sequence that lies between and includes the primer pair), or "amplicon", is output.
The accuracy of ST131Typer depends on the quality of the input assemblies. Less complete assemblies increase the likelihood that a target sequence is fragmented or incomplete and is therefore missed by ST131Typer. Be aware that ST131Typer will not consider a target present if:
- The target spans two assembly contigs/scaffolds (i.e. the forward and reverse primers are on different contigs/scaffolds).
- Only one primer of a pair is identified.
- The primer sequences do not match the target with 100% identity.
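To make these three rules concrete, here is a minimal, self-contained Bash sketch of exact-match in silico PCR on a single contig. It is a toy illustration under stated assumptions, not ST131Typer's code, and it does not call SeqKit: the function names `revcomp` and `toy_amplicon_length` are hypothetical, and it searches only the forward strand, whereas `seqkit amplicon` also handles the reverse strand. It reports `NF` when either primer is absent or mismatched, or when the two primer sites are not on the same contig string.

```bash
#!/usr/bin/env bash
# Hypothetical illustration only: exact-match in silico PCR on ONE contig,
# forward strand only. Not taken from ST131Typer or SeqKit.

# Reverse-complement an uppercase DNA string (non-ACGT characters become N).
revcomp() {
  local seq=$1 out="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    case ${seq:i:1} in
      A) out+=T ;;
      C) out+=G ;;
      G) out+=C ;;
      T) out+=A ;;
      *) out+=N ;;
    esac
  done
  printf '%s\n' "$out"
}

# Print the amplicon length (both primers included) or "NF" if either primer
# site is missing. Only exact (100%) matches count, and both sites must lie
# on the single contig string passed in.
toy_amplicon_length() {
  local contig=$1 fwd=$2 rev=$3
  local site prefix start rest

  # The reverse primer anneals to the opposite strand, so on the forward
  # strand we look for its reverse complement.
  site=$(revcomp "$rev")

  # Forward primer: exact substring match required.
  [[ $contig == *"$fwd"* ]] || { echo NF; return; }
  prefix=${contig%%"$fwd"*}   # text before the first forward-primer hit
  start=${#prefix}

  # Reverse-primer site must appear downstream of the forward primer.
  rest=${contig:start + ${#fwd}}
  [[ $rest == *"$site"* ]] || { echo NF; return; }
  prefix=${rest%%"$site"*}    # gap between the two primer sites

  echo $(( ${#fwd} + ${#prefix} + ${#site} ))
}
```

For example, `toy_amplicon_length GGGATGCAAAAACTTTAGGG ATGC TAAAG` prints `14`, but splitting that same sequence into two contigs (`GGGATGCAAAA` and `ACTTTAGGG`) and calling the function on each returns `NF` for both, mirroring the "target spans two contigs" case.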
BLAST-based tools, such as those on the Center for Genomic Epidemiology (CGE) website, use curated databases of complete allele sequences and do not require a 100% match between the input assembly and the allele sequences. Because these tools simply report the top BLAST hit(s), they are less affected than ST131Typer by incomplete genome assemblies or by allelic variation in primer regions. The benefit of using ST131Typer over publicly available BLAST-based tools is that it provides typing resolution beyond serotype and fimH type.
- SeqKit (>= v0.14.0)
- Go to SeqKit's Installation Page for install options.
Clone the latest version directly from GitHub:
git clone https://github.com/JohnsonSingerLab/ST131Typer.git
Change permissions to make `ST131Typer.sh` executable:
cd ST131Typer
chmod +x ST131Typer.sh
Ensure the desired ST131Typer version is installed:
./ST131Typer.sh -v
Check that the dependency, SeqKit, is installed in your path:
./ST131Typer.sh -c
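The `-c` option checks that SeqKit is in your path. If you also want to confirm that the installed SeqKit meets the >= v0.14.0 requirement, the following is a minimal Bash sketch. The function names `version_ge` and `check_seqkit` are hypothetical (they are not ST131Typer options), version strings are assumed to be plain dotted numbers, and it assumes `seqkit version` prints a line of the form `seqkit vX.Y.Z`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of ST131Typer.

# Return success if dotted version $1 >= $2 (a leading "v" is ignored).
version_ge() {
  local -a a b
  local i
  IFS=. read -r -a a <<< "${1#v}"
  IFS=. read -r -a b <<< "${2#v}"
  for (( i = 0; i < 3; i++ )); do
    # 10# forces base-10 so components like "08" are not read as octal
    (( 10#${a[i]:-0} > 10#${b[i]:-0} )) && return 0
    (( 10#${a[i]:-0} < 10#${b[i]:-0} )) && return 1
  done
  return 0  # versions are equal
}

# Check that seqkit is on PATH and is at least v0.14.0.
# Assumes the first line of `seqkit version` looks like "seqkit v2.3.1".
check_seqkit() {
  local ver
  if ! command -v seqkit >/dev/null 2>&1; then
    echo "seqkit not found in PATH" >&2
    return 1
  fi
  ver=$(seqkit version | awk 'NR == 1 { print $2 }')
  if ! version_ge "$ver" v0.14.0; then
    echo "seqkit $ver is older than v0.14.0" >&2
    return 1
  fi
  echo "OK: seqkit $ver"
}
```

Comparing numeric components one by one avoids relying on `sort -V`, which is not available in every `sort` implementation.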
- An assembly in FASTA format, or a directory containing multiple assemblies in FASTA format (assemblies may consist of multiple contigs)
- A directory to write the results to
% ./ST131Typer.sh
Usage: ST131Typer.sh [OPTIONS] -i [FASTA or DIR] -o [DIR]
-h print this message
-v print the version
-c check SeqKit is in path
-i fasta contigs file or directory containing multiple files
-o output directory
-r prints citation
Single FASTA file:
% ./ST131Typer.sh -i data/assemblies/BS448.fasta -o example_output
Directory containing multiple FASTA files:
% ./ST131Typer.sh -i data/assemblies -o example_output
Assembly FASTA files of strains representing 14 of the 15 ST131 PCR profile types described in Johnston et al. (in prep.) were downloaded from EnteroBase's Escherichia/Shigella Database and can be found in the `data/assemblies` directory.
PCR Type | Representative Strain | Clade | O:H Type | fimH Allele | Description |
---|---|---|---|---|---|
1 | MVAST020 | A | O16:H5 | H41 | H41 / A (O16:H5 variant) |
2 | BS488 | A | O25b:H4 | H41 | H41 / A (O25b:H4 variant) |
3 | M670745 | A | O25b:H5 | H41 | H41 / A (O25b:H5 variant) |
4 | H17 | B0 | O25b:H4 | H27 | H27 / B0 |
5 | JJ1897 | B0 | O25b:H4 | H22 | H22 / B0 |
6 | JJ1969 | B1 | O25b:H4 | H22 | H22 / B1 |
7 | ZH071 | B1 | O25b:H4 | H94 | H94 (non-H22) / B1 |
8 | G199 | B1 | O25b:H4 | H30 | H30 / B1 |
9 | CD306 | C0 | O25b:H4 | H30 | H30S / C0 |
10 | BS448 | C0 | O25b:H4 | H30 | H30S / C0 (parC E84V and FQ-R) |
11 | JJ2193 | C1 | O25b:H4 | H30 | H30R1 non-C1-M27 / C1 |
12 | U024 | C1-M27 | O25b:H4 | H30 | H30R1 C1-M27 / C1-M27 |
13 | JJ1886 | C2 | O25b:H4 | H30 | H30Rx / C2 |
14 | U004 | C2 | O25b:H4 | H35 | H30Rx (fimH35) / C2 |
15 | Not Available | C2 | ONT:H4 | H30 | H30Rx / C2 (non-O-typable) |
Within the user-defined output directory, there will be two items:
- `summary.txt`: A simple tab-separated summary of the results
- `seqkit_outputs`: A directory containing the output(s) of `seqkit amplicon` in BED format, one file for each input assembly file
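Each BED file can be reduced to per-target amplicon lengths with standard tools. A minimal sketch, assuming the layout that `seqkit amplicon --bed` writes (the six standard BED columns, with the primer-pair name in column 4, plus the amplicon sequence as a seventh column); `bed_lengths` is a hypothetical helper, and the path in the usage comment is a placeholder. Because BED start coordinates are 0-based and end coordinates are exclusive, `end - start` is the amplicon length.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ST131Typer.
# Print "<target name><TAB><length>" for every amplicon in a BED file
# (or stdin if no file is given). BED intervals are 0-based and half-open,
# so the amplicon length is simply end - start.
bed_lengths() {
  awk -F'\t' 'BEGIN { OFS = "\t" } NF >= 4 { print $4, $3 - $2 }' "$@"
}

# Usage (placeholder path):
# bed_lengths path/to/seqkit_output.bed
```

A BED line `contig1	100	375	mdh36	0	+	...` would therefore be reported as `mdh36	275`.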
The `summary.txt` file is a tab-separated results summary with one line for each input assembly file.
Column | Description | Possible Values |
---|---|---|
Sample | Name of assembly file | |
PCR_Profile_Type | Numerical ST131 PCR profile type from Johnston et al. (in prep.) | 1-15, "Unknown", or "non-ST131" |
Clade | ST131 clade designation | A, B0, B1, C0, C1, C1-M27, or C2 |
O_type | O antigen | O16 or O25b |
H_type | H antigen | H4 or H5 |
fimH | fimH allele | fimH22, fimH27, fimH30, fimH35, or fimH41 |
Description | Description of the ST131 subclone/clade associated with the ST131 PCR profile type | |
mdh36 | Length (bp) of mdh36 target identified | ~275 * |
gyrB47 | Length (bp) of gyrB47 target identified | ~138 * |
trpA72 | Length (bp) of trpA72 target identified | ~487 * |
rfb_O16 | Length (bp) of rfb O16 allele target identified | ~732 * |
rfb_O25b | Length (bp) of rfb O25b allele target identified | ~557 * |
fliC_H4 | Length (bp) of fliC H4 allele target identified | ~199 * |
fliC_H5 | Length (bp) of fliC H5 allele target identified | ~614 * |
fimH22 | Length (bp) of fimH22 allele complex target identified | ~279 * |
fimH27 | Length (bp) of fimH27 allele complex target identified | ~405 * |
fimH30 | Length (bp) of fimH30 allele complex target identified | ~350 * |
fimH35 | Length (bp) of fimH35 allele complex target identified | ~500 * |
fimH41 | Length (bp) of fimH41 allele complex target identified | ~92 * |
plsB | Length (bp) of plsB target identified | ~628 * |
nupC | Length (bp) of nupC target identified | ~498 * |
kefC | Length (bp) of kefC target identified | ~238 * |
rmuC | Length (bp) of rmuC target identified | ~353 * |
prophage | Length (bp) of the prophage marker target identified | ~822 * |
sbmA | Length (bp) of sbmA target identified | ~64 * |
ybbW | Length (bp) of ybbW target identified | ~194 * |
parC_E84V | Length (bp) of parC E84V target identified | ~107 * |
* Within the target columns, there are five possible entries:
Character | Meaning |
---|---|
[NUM] | Length (bp) of target (sequence between and including the primer pair) |
`*[NUM]*` | Length (bp) of target falls outside +/- 5% of the expected length |
NA | Not applicable to the typing of the sample (e.g., non-ST131 strain; clade B- and C-associated primer pairs are not relevant for typing of clade A isolates) |
NF | Not found; the primer pair was not identified, or its two primers were not identified on the same contig |
NT | Non-typable; O-type, H-type, fimH-type or clade could not be determined. Could be due to missing target or the presence of multiple conflicting targets. |
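The +/- 5% rule for flagging lengths can be written with integer arithmetic only. The sketch below is hypothetical (it is not taken from ST131Typer's source): `format_target_length` is an illustrative name, lengths exactly on the 5% boundary are assumed to count as within range, and non-numeric entries such as `NF` are passed through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ST131Typer.
# Print the observed length, wrapped in asterisks if it falls outside
# +/- 5% of the expected length. Integer math avoids floating point:
# compare observed*100 against expected*95 and expected*105
# (boundary values count as in range). Non-numeric values pass through.
format_target_length() {
  local observed=$1 expected=$2
  if [[ ! $observed =~ ^[0-9]+$ ]]; then
    printf '%s\n' "$observed"
    return
  fi
  if (( observed * 100 < expected * 95 || observed * 100 > expected * 105 )); then
    printf '*%s*\n' "$observed"
  else
    printf '%s\n' "$observed"
  fi
}
```

For example, with the ~275 bp expected for mdh36, `format_target_length 262 275` prints `262` (within 5%), while `format_target_length 300 275` prints `*300*`.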
./ST131Typer.sh -r
If you use ST131Typer in your work, please cite:
Johnston, BD, Gordon, DH, Burn, S, Johnson, TJ, Weber, BP, Miller, EA, and Johnson, JR. Novel Multiplex PCR Method for High-Resolution Subclonal Assignment and Characterization of Escherichia coli Sequence Type 131 Isolates.