
3.1. Genome Description

ktheyss edited this page Jan 28, 2018 · 3 revisions

Element <genome> in the Config File

The genome description specifies the length of the genome and its organization into features. A feature corresponds to an open reading frame, and specifies either a nucleotide sequence or a translated amino acid sequence.

Organizing the genome into features makes it possible to later define selection modes that act on different parts of the genome.

In addition, to seed the initial population (if so configured in the <population> definition) or to configure purifying selection that reflects observed states (if so configured in a <purifyingFitness> definition), a single sequence or a sequence alignment must be specified in the genome definition.

An example of a genome definition:

<genome>
    <length>21</length>

    <!-- protein from a forward ORF that spans the entire genome -->
    <feature>
       <name>ABC protein</name>
       <type>aminoAcid</type>
       <coordinates>1-21</coordinates>
    </feature>

    <!-- protein from a backward ORF spanning sites 11 to 19 -->
    <feature>
       <name>DE protein</name>
       <type>aminoAcid</type>
       <coordinates>19-11</coordinates>
    </feature>

    <sequences>
>seq1
CCTCAGGTCACTCTTTGGCAA
>seq2
CCTCGGGTCACTCCTTGGCGA
    </sequences>
</genome>

Genome length: <length>

The genome length, as a number of nucleotides.

Genome feature: <feature>

A genome feature has three properties:

  • <name> A unique feature name.

  • <type> Must be 'nucleotide' or 'aminoAcid'. This determines whether a fitness factor acts on nucleotides or on amino acids. For 'aminoAcid' features, the length of the feature must be a multiple of 3. 'aminoAcid' features implicitly get a fitness criterion that assigns -infinity to any stop codon (TAA, TAG, or TGA) generated by a mutation, regardless of any other fitness criteria that are defined.

  • <coordinates> Defines how the feature is created from nucleotides in the genome. The format is a comma-separated list of fragments. Each fragment is defined by a single nucleotide site, or a range (begin-end). A range where begin is larger than end is read in the opposite direction.
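As an illustration of the coordinate format described above, the following sketch combines a single site, forward ranges, and a reversed range. The feature names and the genome length of 30 are hypothetical and chosen only so that the arithmetic is easy to follow; the <sequences> element is left out for brevity.

```xml
<genome>
    <length>30</length>

    <!-- nucleotide feature built from one range and one single site (7 nucleotides) -->
    <feature>
        <name>regulatory region</name>
        <type>nucleotide</type>
        <coordinates>1-6,30</coordinates>
    </feature>

    <!-- amino acid feature joined from two fragments:
         9 + 12 = 21 nucleotides, a multiple of 3 -->
    <feature>
        <name>spliced protein</name>
        <type>aminoAcid</type>
        <coordinates>7-15,18-29</coordinates>
    </feature>

    <!-- amino acid feature read in the opposite direction
         (begin larger than end): 12 nucleotides -->
    <feature>
        <name>reverse protein</name>
        <type>aminoAcid</type>
        <coordinates>29-18</coordinates>
    </feature>
</genome>
```

Each aminoAcid feature here has a total length that is a multiple of 3, as the <type> rule requires.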

By default, a nucleotide feature named genome is created, which represents the entire genome.

Sequence or sequence alignment: <sequences>

One or multiple full-genome sequences may be given, either in FASTA or plain format. In the plain format, sequences are separated by a newline. Instead of being listed inline, the sequences can be read from a file named in the file attribute:

<genome>
    <length>609</length>
    <sequences file='input_fasta.fa'/>
    <feature>
        <name>CDS</name>
        <type>aminoAcid</type>
        <coordinates>1-609</coordinates>
    </feature>
</genome>
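For comparison with the file-based form, a minimal sketch of the plain format given inline might look as follows. It assumes that each line inside <sequences> is read as one full-genome sequence of exactly <length> nucleotides; how surrounding whitespace is handled is an assumption, not something stated on this page.

```xml
<genome>
    <length>21</length>

    <!-- plain format: no FASTA headers, one 21-nucleotide sequence per line -->
    <sequences>
CCTCAGGTCACTCTTTGGCAA
CCTCGGGTCACTCCTTGGCGA
    </sequences>
</genome>
```

Since no <feature> is defined in this sketch, only the default nucleotide feature covering the entire genome would be available.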