3.1. Genome Description
The genome description specifies the length and organization of the genome in different features. A feature corresponds to an open reading frame, and specifies either a nucleotide or translated amino acid sequence.
The organization of the genome in features allows for a later definition of selection modes that act on different parts of the genome.
In addition, a single sequence or sequence alignment must be specified in the genome definition to seed the initial population (if so configured in the <population> definition), or to configure a purifying selection to reflect observed states (if so configured in a <purifyingFitness> definition).
An example of a genome definition:
<genome>
    <length>21</length>
    <!-- protein from a forward ORF that spans the entire genome -->
    <feature>
        <name>ABC protein</name>
        <type>aminoAcid</type>
        <coordinates>1-21</coordinates>
    </feature>
    <!-- protein from a backward ORF spanning sites 11 to 19 -->
    <feature>
        <name>DE protein</name>
        <type>aminoAcid</type>
        <coordinates>19-11</coordinates>
    </feature>
    <sequences>
>seq1
CCTCAGGTCACTCTTTGGCAAC
>seq2
CCTCGGGTCACTCCTTGGCGAC
    </sequences>
</genome>
<length> gives the genome length, as a number of nucleotides.
A genome feature has three properties:
- <name>: A unique feature name.
- <type>: Must be 'nucleotide' or 'aminoAcid'. This determines whether a fitness factor acts on nucleotides or on amino acids. For 'aminoAcid', the length of the feature must be a multiple of 3. 'aminoAcid' features implicitly get a fitness criterion that assigns -infinity to any stop codon (TAA, TAG, or TGA) generated by a mutation, regardless of any other fitness criteria that are defined.
- <coordinates>: Defines how the feature is assembled from nucleotides in the genome. The format is a comma-separated list of fragments. Each fragment is either a single nucleotide site or a range (begin-end). A range where begin is larger than end is read in the opposite direction.
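The coordinate rules above can be sketched in a few lines of Python. This is an illustration only, not the simulator's own code: the function names `expand_coordinates`, `check_feature`, and `find_stop_codons` are hypothetical, sites are assumed to be 1-based, and a backward range is only read in reverse order (the text above does not say whether bases are also complemented, so the sketch does not complement them).

```python
# Illustrative sketch; all names here are hypothetical, not part of the simulator.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand_coordinates(coordinates):
    """Expand a string such as '1-3,7,19-11' into a list of 1-based sites."""
    sites = []
    for fragment in coordinates.split(","):
        fragment = fragment.strip()
        if "-" in fragment:
            begin, end = (int(part) for part in fragment.split("-"))
            # begin > end means the range is read in the opposite direction
            step = 1 if begin <= end else -1
            sites.extend(range(begin, end + step, step))
        else:
            sites.append(int(fragment))
    return sites


def check_feature(feature_type, coordinates, genome_length):
    """Validate a feature against the rules stated above and return its sites."""
    if feature_type not in ("nucleotide", "aminoAcid"):
        raise ValueError("type must be 'nucleotide' or 'aminoAcid'")
    sites = expand_coordinates(coordinates)
    if any(site < 1 or site > genome_length for site in sites):
        raise ValueError("site outside the genome")
    if feature_type == "aminoAcid" and len(sites) % 3 != 0:
        raise ValueError("aminoAcid feature length must be a multiple of 3")
    return sites


def find_stop_codons(genome, sites):
    """Return the stop codons found when reading the feature codon by codon."""
    nucleotides = "".join(genome[site - 1] for site in sites)
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides) - 2, 3)]
    return [codon for codon in codons if codon in STOP_CODONS]
```

For example, `expand_coordinates("19-11")` yields the nine sites 19 down to 11, which satisfies the multiple-of-3 requirement for an 'aminoAcid' feature.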
By default, a nucleotide feature named 'genome' is created, which represents the entire genome.
<sequences> holds one or multiple full-genome sequences, in either FASTA or plain format. In the plain format, sequences are separated by a newline. To read the sequences from an external file instead of listing them inline, reference the file with the file attribute:
<genome>
    <length>609</length>
    <sequences file='input_fasta.fa'/>
    <feature>
        <name>CDS</name>
        <type>aminoAcid</type>
        <coordinates>1-609</coordinates>
    </feature>
</genome>
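Before running a simulation it can help to check an input file against the declared genome length. The following Python sketch distinguishes the two input formats described above and reports sequences whose length differs from <length>. The helper names `parse_sequences` and `mismatched_lengths` are hypothetical, and two points are assumptions rather than documented behavior: that a FASTA record may wrap over several lines, and that every full-genome sequence is expected to match the genome length exactly.

```python
# Illustrative sketch; helper names are hypothetical, not part of the simulator.
def parse_sequences(text):
    """Parse sequences given in FASTA format or as plain newline-separated text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not any(line.startswith(">") for line in lines):
        # Plain format: one sequence per line
        return lines
    sequences = []
    for line in lines:
        if line.startswith(">"):
            # A header line starts a new record
            sequences.append("")
        elif sequences:
            # Assumption: wrapped FASTA lines belong to the current record
            sequences[-1] += line
    return sequences


def mismatched_lengths(sequences, genome_length):
    """Return the indices of sequences whose length differs from genome_length."""
    return [i for i, seq in enumerate(sequences) if len(seq) != genome_length]
```

A typical use would be to read the file named in the file attribute, pass its contents to `parse_sequences`, and confirm that `mismatched_lengths` returns an empty list for the configured genome length.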