3.7. Outputting the Simulation Results

Getting the Simulation Results: `<samplingSchedule>`

The sampling schedule lists one or more samplers, each of which extracts a particular type of information from the simulation and writes it to a file.

An example of a sampling schedule which defines three samplers:

<samplingSchedule>

    <!-- Sample 100 sequences from the population, every 1000 generations,
         in NEXUS format. -->
    <sampler>
        <atFrequency>1000</atFrequency>
        <fileName>alignment_%r.nex</fileName>
        <alignment>
            <sampleSize>100</sampleSize>
            <format>NEXUS</format>
            <label>seq_%g_%s</label>
        </alignment>
    </sampler>

    <!-- Sample the frequency of all 20 states at two amino acid sites,
         every 1000 generations. -->
    <sampler>
        <atFrequency>1000</atFrequency>
        <fileName>aa_frequencies.csv</fileName>
        <alleleFrequency>
            <feature>protein AB</feature>
            <sites>1,2</sites>
        </alleleFrequency>
    </sampler>

    <!-- Sample population statistics, every generation -->
    <sampler>
        <atFrequency>1</atFrequency>
        <fileName>stats.csv</fileName>
        <statistics />
    </sampler>
</samplingSchedule>

The following properties are common to every sampler:

<atFrequency> : runs the sampler repeatedly, once every given number of generations
<atGeneration> : runs the sampler once, at the given generation
<fileName> : the file to which the sampler writes its results. The special string '%r' is replaced with the current replicate, so that replicates do not all write to the same file, each one overwriting the results of the previous run.
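
As a minimal sketch of these common properties, the sampler below runs once at a fixed generation instead of periodically, and uses '%r' in the file name so that each replicate writes to its own file. The generation number, file name, and sample size are illustrative values chosen for this example.

```xml
<sampler>
    <!-- Run once, at generation 5000 (illustrative value) -->
    <atGeneration>5000</atGeneration>
    <!-- '%r' is replaced with the current replicate -->
    <fileName>final_alignment_%r.fasta</fileName>
    <alignment>
        <sampleSize>50</sampleSize>
        <format>FASTA</format>
        <label>seq_%r_%s</label>
    </alignment>
</sampler>
```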

Sampling alignments: `<alignment>`

The alignment sampler samples whole genome alignments from the population at a given generation. It has the following properties:

<sampleSize> : the number of genomes to sample
<format> : the format in which the alignments are stored:
- NEXUS: NEXUS format
- FASTA: FASTA format
- XML: a custom XML format
<label> : the label for each sequence. The special strings '%r', '%g', and '%s' are replaced with the replicate index, the generation number, and the index of the sequence within the sample, respectively.
<consensus> (true or false): whether a consensus sequence should be synthesized and stored, rather than the full alignment (false by default).
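
The properties above can be combined as in the following sketch, which stores a consensus sequence every 1000 generations instead of the full alignment. The file name and label are illustrative, and how the sample size influences the consensus is an assumption not covered here.

```xml
<sampler>
    <atFrequency>1000</atFrequency>
    <fileName>consensus_%r.fasta</fileName>
    <alignment>
        <sampleSize>100</sampleSize>
        <format>FASTA</format>
        <!-- '%g' keeps the label distinct for each sampled generation -->
        <label>consensus_%g</label>
        <!-- Store a consensus sequence rather than the full alignment -->
        <consensus>true</consensus>
    </alignment>
</sampler>
```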

Sampling statistics: `<statistics>`

The statistics sampler dumps some common population genetic statistics:

mean_diversity: mean nucleotide sequence diversity (estimated from a sample of 10 random sequences)
max_diversity: maximum nucleotide sequence diversity (estimated from a sample of 10 random sequences)
min_fitness: fitness of individual with lowest fitness
mean_fitness: mean fitness of population
max_fitness: fitness of individual with highest fitness
max_frequency: frequency of most common genome in the population
mean_distance: mean sequence distance of population from initial population (ignoring mutation saturation, thus an overestimate)

The statistics sampler does not require any configuration.

Sampling allele frequencies: `<alleleFrequency>`

This sampler outputs the frequency of each possible state at each given site in a nucleotide or amino acid feature.

By default, the feature is the genome feature (nucleotides of the entire genome), and the sites are all sites in the feature.

This may be overridden by:

<feature> : the name of one of the features defined in the <genome> description. If omitted, genome is assumed.
<sites> : a comma-separated list of single sites or site ranges within the feature. If the feature is an amino acid feature, these are amino acid sites; if the feature is a nucleotide feature, these are nucleotide sites.
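
A sketch of a sampler that relies on the default feature and restricts the sites might look as follows. The hyphenated range notation ('10-20') is an assumption made for this example, since the exact range syntax is not spelled out above; the file name and sampling frequency are likewise illustrative.

```xml
<sampler>
    <atFrequency>100</atFrequency>
    <fileName>allele_frequencies_%r.csv</fileName>
    <alleleFrequency>
        <!-- No <feature> element: the genome feature is assumed -->
        <!-- Two single sites and one range; the '10-20' notation is assumed -->
        <sites>1,5,10-20</sites>
    </alleleFrequency>
</sampler>
```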

Sampling genealogies: `<tree>`

In addition to sampling sequences, the genealogies that gave rise to those samples may also be sampled. For example, the configuration below produces a NEXUS format ancestral tree for 10 random viruses selected from the population every 100 generations.

As with the <alignment> sampler, a random subset of viruses is selected from the population each time the sampler is run. A complete picture of the branching process can be obtained by setting the tree sample size equal to the population size. The resulting trees may be viewed with FigTree or other tree visualization software.

    <sampler>
        <atFrequency>100</atFrequency>
        <fileName>santa_out.trees</fileName>
        <tree>
            <sampleSize>10</sampleSize>
            <format>NEXUS</format>
            <label>sequence_%s</label>
        </tree>
    </sampler>

<sampleSize> : number of leaves in the sampled trees.
<format> : format of the genealogy trees to be produced:
- NEXUS: NEXUS format
- NEWICK: NEWICK format
<label> : labels associated with the leaves of each tree.

As with <label> elements in alignment samplers, the strings '%r', '%g', and '%s' are replaced with the replicate index, the generation number, and the index of the sequence within the sample, respectively. The label must produce names that are unique across all sampled taxa yet consistent across samples. For example, a value of `name` would not work because all taxa would have the same name. A value of `sequence_%s_%g` also would not work because the taxon names would change across samples. A value of `sequence_%s` satisfies both requirements and works well. An incorrect value results in a Java exception thrown from deep within the jebl.jar library.
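
The labelling rules can be illustrated with a sketch of a tree sampler that writes NEWICK output. The file name, sampling frequency, and sample size are illustrative values; the rejected label values are shown only in comments.

```xml
<sampler>
    <atFrequency>500</atFrequency>
    <fileName>genealogy_%r.trees</fileName>
    <tree>
        <sampleSize>25</sampleSize>
        <format>NEWICK</format>
        <!-- Would not work: 'name' gives every taxon the same name -->
        <!-- Would not work: 'sequence_%s_%g' changes taxon names across samples -->
        <!-- Unique within a sample and consistent across samples -->
        <label>sequence_%s</label>
    </tree>
</sampler>
```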
