Adam Price edited this page Nov 9, 2016 · 12 revisions

Simulome

October 21, 2016
Version: 1.0.1
Author: Adam Price
Maintainer: Adam Price
Contact: [email protected]
Copyright: Adam Price, 2016
License: MIT

Simulome is a powerful, easy-to-use tool for creating pseudo-genomes and mutated variants for prokaryotes. It can create genomes based on any bacterial species while controlling for a variety of factors. It also provides a range of options that can be combined to create mutated variants of the simulated genome, which allows for controlled testing of specific genomic conditions. Simulome can be used with reads generated by next-generation sequencing (NGS) platforms or, alternatively, with NGS read simulation packages.

Simulome takes an existing genome and its corresponding annotation information and samples a subset of the genes to use as a simulated genome. Sampling is performed based on read length, and genes are selected to approximate a normal distribution of read lengths. An initial simulation is created by combining these sampled genes with non-duplicating intergenic regions, whose properties can be specified by the user. Once the initial genome is simulated, a variant genome can be simulated to meet the desired specifications. Three run modes are available and can be used in any combination to produce variants of the simulated genome containing SNPs, indels, and/or duplicate regions. Additional optional arguments allow direct control over selection criteria and genomic structure. Each resulting simulation is provided as a FASTA nucleotide file and a GTF/GFF3 annotation file.
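The sampling idea described above can be illustrated with a minimal sketch. This is not Simulome's actual implementation: the function name `sample_genes_by_length`, the dictionary input format, and the nearest-length selection rule are all assumptions made for illustration. It draws target lengths from a normal distribution fitted to the available lengths and, for each target, takes the unused gene whose length is closest.

```python
import random


def sample_genes_by_length(gene_lengths, n_genes, seed=None):
    """Pick n_genes gene IDs whose lengths roughly follow a normal distribution.

    gene_lengths: dict mapping gene ID -> length in bp (hypothetical format).
    This is an illustrative sketch, not the method used inside Simulome.
    """
    if n_genes > len(gene_lengths):
        raise ValueError("requested more genes than are available")

    rng = random.Random(seed)
    lengths = list(gene_lengths.values())

    # Fit a normal distribution to the observed lengths.
    mean = sum(lengths) / float(len(lengths))
    variance = sum((x - mean) ** 2 for x in lengths) / float(len(lengths))
    sd = variance ** 0.5

    remaining = dict(gene_lengths)
    chosen = []
    for _ in range(n_genes):
        # Draw a target length, then take the unused gene closest to it.
        target = rng.gauss(mean, sd)
        best = min(remaining, key=lambda g: (abs(remaining[g] - target), g))
        chosen.append(best)
        del remaining[best]
    return chosen


# Toy input: 50 made-up genes with lengths from 300 to 790 bp.
toy_genes = dict(("gene%d" % i, 300 + 10 * i) for i in range(50))
picked = sample_genes_by_length(toy_genes, 10, seed=1)
```

Because each gene is removed once chosen, no gene is selected twice, and passing the same `seed` reproduces the same selection.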

Simulome can be used in combination with read simulators such as ART to create completely controlled simulations.

For example, the plot above shows how SNPs influence the alignment of reads of various lengths. The data were simulated using Simulome and ART and show how read alignment performs against a correct (Native) or mutated (Heterologous) genome.

Dependencies

Simulome was developed in a Linux/Unix environment and requires the following software to function properly.
• Python 2.7.2
• Biopython 1.6.1+
• BLAST 2.3.0+

Usage:
python simulome.py -f <genome.fasta> -a <genome.gff> -o <destination> <RUN MODE> <OPTIONAL ARGUMENTS>

For detailed usage instructions, please see the Simulome manual.

Examples

  • Simulate a genome based on E. coli containing 100 genes, and output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 100

  • Simulate a genome based on E. coli containing 500 genes, and a variant of the simulated genome in which each gene contains 10 SNPs. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 500 --snp TRUE -s 10

  • Simulate a genome based on E. coli containing 500 genes, and a variant of the simulated genome in which each gene contains 10 SNPs that are concentrated in 50 base pair windows. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 500 --snp TRUE -s 10 -w 50

  • Simulate a genome based on E. coli containing 100 genes, and a variant of the simulated genome in which each gene contains an insertion event of length 100. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 100 --indel 1 -n 100

  • Simulate a genome based on E. coli containing 100 genes, and a variant of the simulated genome in which each gene contains an insertion event of length 100 and two deletion events of length 25. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 100 --indel 3 -n 100 -m 25 -d 2

  • Simulate a genome based on E. coli containing 100 genes, and a variant in which 10% of the genome is duplicated. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 100 --duplicate TRUE -c 10

  • Simulate a genome based on E. coli containing 100 genes, with a variant genome in which each gene contains 5 SNPs, an insertion of length 500, a deletion of length 100, 10% genome duplication, and random intergenic region lengths. Output the files to a folder called ecoli_simulation/.

    python simulome.py -f ecoli_genome.fasta -a ecoli_anno.gtf -o ecoli_simulation -g 100 --snp TRUE -s 5 --indel 3 -n 500 -m 100 --duplicate TRUE -c 10
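When several variants are needed, for instance one simulation per SNP count, the command lines above can be generated programmatically. The following is a minimal sketch that uses only the flags shown in the examples (-f, -a, -o, -g, --snp, -s, -w). The helper name `build_simulome_command` and the output folder names `ecoli_snp_N` are hypothetical and chosen for illustration.

```python
def build_simulome_command(fasta, annotation, outdir, genes,
                           snps=None, window=None):
    """Assemble a Simulome command line as a list of arguments.

    Only flags that appear in the examples above are used; this helper
    is illustrative and is not part of Simulome itself.
    """
    cmd = ["python", "simulome.py",
           "-f", fasta, "-a", annotation, "-o", outdir, "-g", str(genes)]
    if snps is not None:
        # Enable SNP mode with the requested number of SNPs per gene.
        cmd += ["--snp", "TRUE", "-s", str(snps)]
        if window is not None:
            # Concentrate the SNPs in windows of this many base pairs.
            cmd += ["-w", str(window)]
    return cmd


# One command per SNP count, each writing to its own (hypothetical) folder.
commands = [
    build_simulome_command("ecoli_genome.fasta", "ecoli_anno.gtf",
                           "ecoli_snp_%d" % n, 500, snps=n)
    for n in (1, 5, 10)
]

for cmd in commands:
    print(" ".join(cmd))
```

The sketch only prints the commands. To execute them, each list could be passed to `subprocess.check_call(cmd)` from the directory containing simulome.py.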
