VAST is for analyzing sequenced variants of the vlsE gene found in Borrelia species, as well as similar antigenic variation systems in other pathogens.
The vls (VMP-like sequence) antigenic variation system is composed of a vlsE expression locus and a set of approximately 15 unexpressed, partial-length vls cassettes.
Unidirectional, segmental recombination of the unexpressed cassettes into the expression locus is responsible for the massive repertoire of vlsE proteins that protect Borrelia from clearance by adaptive immunity.
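To make "unidirectional, segmental recombination" concrete, here is a toy model of such events. It assumes each silent cassette has already been aligned to the expression locus without gaps, so the same index refers to the same position. `segmental_conversion` and the sequences are hypothetical and far shorter than real vls sequences. The sketch only shows that a contiguous segment is copied from a cassette into the locus while the cassette itself stays unchanged, and that repeated events with different donors build mosaic variants.

```python
def segmental_conversion(expressed: str, cassette: str, start: int, end: int) -> str:
    """Return a new expression-locus sequence whose positions [start, end)
    are copied from a silent cassette.

    Toy model (hypothetical helper): assumes the cassette is pre-aligned to
    the locus without gaps, so equal indices mean equal positions.
    """
    if len(expressed) != len(cassette):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not 0 <= start < end <= len(expressed):
        raise ValueError("invalid segment bounds")
    # Unidirectional: only the expression locus changes; the donor cassette
    # is never modified.
    return expressed[:start] + cassette[start:end] + expressed[end:]


# Made-up 12-nt sequences for illustration only.
locus = "ACGTACGTACGT"
cassette_1 = "ACGTTCGAACGT"  # differs from locus at indices 4 and 7
cassette_2 = "ACCTACGTACTT"  # differs from locus at indices 2 and 10

# First event copies only indices 3-5 from cassette_1 (index 7 is not transferred).
step1 = segmental_conversion(locus, cassette_1, 3, 6)
# Second event copies indices 9-11 from cassette_2 (index 2 is not transferred).
step2 = segmental_conversion(step1, cassette_2, 9, 12)

print(step1)  # ACGTTCGTACGT
print(step2)  # ACGTTCGTACTT
```

The final sequence carries one variant from each cassette, a mosaic that matches neither donor exactly.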
VAST was developed to quantify switching behaviour from NGS or Sanger sequencing of full-length or partial vlsE sequences. It is written entirely in Python and, because it is designed for large datasets (on the order of 100 000 full-length variants), it supports high-performance computing (HPC) environments.
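As a rough illustration of what quantifying switching per sample can look like, this sketch counts, for each read, the positions that differ from the reference and averages those counts per sample label. It is a hypothetical proxy, not the metric VAST computes. It assumes reads are already aligned to the reference without gaps, ignores indels, and cannot tell switching apart from sequencing error. The function names, sequences, and sample labels (`day0`, `day28`) are made up.

```python
from collections import defaultdict
from statistics import mean


def count_differences(read: str, reference: str) -> int:
    """Number of positions at which a pre-aligned read differs from the reference."""
    if len(read) != len(reference):
        raise ValueError("read must be pre-aligned to the reference length")
    return sum(1 for base, ref_base in zip(read, reference) if base != ref_base)


def mean_differences_by_sample(reads, reference):
    """Average difference count per sample label (hypothetical proxy metric).

    `reads` is an iterable of (sample_label, aligned_read) pairs.
    """
    per_sample = defaultdict(list)
    for label, read in reads:
        per_sample[label].append(count_differences(read, reference))
    return {label: mean(counts) for label, counts in per_sample.items()}


# Made-up reads; labels stand in for time points.
reference = "ACGTACGTACGT"
reads = [
    ("day0", "ACGTACGTACGT"),
    ("day0", "ACGTTCGTACGT"),
    ("day28", "ACGTTCGTACTT"),
    ("day28", "ACCTTCGAACTT"),
]
summary = mean_differences_by_sample(reads, reference)
print(summary)
```

Here the later time point accumulates more differences from the reference, which is the kind of trend a real switching analysis would try to measure more carefully.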
A typical VAST analysis proceeds as follows:
- Import the reference (unswitched) vlsE sequence and the sequences of the reference silent cassettes. Multiple reference vls systems can be imported into the database, so that switching in different Borrelia strains can be compared.
- Align the cassette sequences to the reference and to each other.
- Import the reads, labelling bins of reads with sample labels (e.g., to distinguish replicates, time points, or strains).
- Align the reads to the reference and cassettes. Because the cassettes are aligned first, a variant in a read maps to the reference in the same way as the same variant in a silent cassette.
- Choose an analysis type and how to group the data for it.
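Aligning the cassettes first pays off once the reference, the cassettes, and a read share one coordinate system. At that point, each position where the read differs from the reference can be matched against the cassettes that carry the same base there, giving a list of candidate donors. This is a hypothetical illustration, not VAST's algorithm. `candidate_donors` and the cassette names are made up, and the sketch assumes gapless, equal-length alignments, whereas real reads and vls cassettes require gapped alignment.

```python
def candidate_donors(read, reference, cassettes):
    """Map each position where `read` differs from `reference` to the cassettes
    carrying the same base at that position.

    Hypothetical helper: assumes read, reference, and every cassette are
    aligned to one gapless coordinate system of equal length.
    """
    for name, seq in cassettes.items():
        if len(seq) != len(reference):
            raise ValueError("cassette %s is not aligned to the reference" % name)
    if len(read) != len(reference):
        raise ValueError("read is not aligned to the reference")

    donors = {}
    for pos, (base, ref_base) in enumerate(zip(read, reference)):
        if base == ref_base:
            continue  # unchanged position: no switching evidence here
        donors[pos] = [name for name, seq in cassettes.items() if seq[pos] == base]
    return donors


# Made-up sequences for illustration only.
reference = "ACGTACGTACGT"
cassettes = {
    "cassette_1": "ACGTTCGAACGT",
    "cassette_2": "ACCTACGTACTT",
    "cassette_3": "ACGTTCGTACTT",
}
read = "ACGTTCGTACTT"
print(candidate_donors(read, reference, cassettes))
# {4: ['cassette_1', 'cassette_3'], 10: ['cassette_2', 'cassette_3']}
```

A variant shared by several cassettes yields several candidates, so a real analysis has to resolve that ambiguity, for example by considering neighbouring variants together.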
VAST requires Python 3.4+ and the following packages:
- NumPy
- SciPy
- pandas
- matplotlib
- Biopython
- pysam (optional; without it, no BAM-format output is produced)
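Before running VAST, you may want to check that the listed packages can be imported. The helper below is a hypothetical convenience script and is not part of VAST. It uses the standard-library `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. Biopython is imported as `Bio`. String formatting avoids f-strings so the script itself still parses on Python 3.4 and 3.5.

```python
import importlib.util
import sys

# Import name -> package name as listed in the requirements.
REQUIRED = {
    "numpy": "NumPy",
    "scipy": "SciPy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "Bio": "Biopython",
}
OPTIONAL = {"pysam": "pysam"}


def missing(modules):
    """Return the package names whose import name cannot be found."""
    return [label for module, label in modules.items()
            if importlib.util.find_spec(module) is None]


def report():
    """Print a short dependency report; return True if the requirements look met."""
    ok = sys.version_info >= (3, 4)
    if not ok:
        print("VAST requires Python 3.4 or later")
    absent = missing(REQUIRED)
    if absent:
        print("Missing required packages: " + ", ".join(absent))
    if missing(OPTIONAL):
        print("pysam not found: BAM-format outputs will not be produced")
    return ok and not absent


if __name__ == "__main__":
    report()
```

Only top-level names are checked, so this confirms a package is installed, not that its version is compatible.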
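- Clone the repository:

      git clone https://github.com/verheytb/vast

- (Linux only) Create a symbolic link to vast.py in a directory on your PATH so that VAST can be run from anywhere. Use an absolute path as the link target, because a relative target is resolved relative to the directory containing the link (writing to /usr/local/bin may require sudo):

      ln -s "$(pwd)/vast/vast.py" /usr/local/bin/vast

- Run VAST:

      vast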
Documentation is found on the wiki.