3.1. Genome Description
The genome description specifies the length of the genome and its organization into features. A feature typically corresponds to an open reading frame and specifies either a nucleotide sequence or a translated amino acid sequence.
Organizing the genome into features allows modes of selection that act on different parts of the genome to be defined later.
In addition, a single sequence or a sequence alignment must be specified in the genome definition. It is used either to seed the initial population (if so configured in the <population> definition) or to configure purifying selection to reflect observed states (if so configured in a <purifyingFitness> definition).
An example of a genome definition:
```xml
<genome>
    <length>21</length>

    <!-- protein from a forward ORF that spans the entire genome -->
    <feature>
        <name>ABC protein</name>
        <type>aminoAcid</type>
        <coordinates>1-21</coordinates>
    </feature>

    <!-- protein from a backward ORF spanning sites 11 to 19 -->
    <feature>
        <name>DE protein</name>
        <type>aminoAcid</type>
        <coordinates>19-11</coordinates>
    </feature>

    <!-- each sequence must be exactly <length> (21) nucleotides long -->
    <sequences>
>seq1
CCTCAGGTCACTCTTTGGCAA
>seq2
CCTCGGGTCACTCCTTGGCGA
    </sequences>
</genome>
```
The <length> element gives the genome length, as a number of nucleotides.
A genome feature has three properties:
- <name>: A unique feature name.
- <type>: Must be 'nucleotide' or 'aminoAcid'. This determines whether a fitness factor acts on nucleotides or on amino acids. For 'aminoAcid', the length of the feature must be a multiple of 3. 'aminoAcid' features implicitly get a fitness criterion that assigns a fitness of -infinity to any stop codon (TAA, TAG, or TGA) generated by a mutation, regardless of any other fitness criteria that are defined.
- <coordinates>: Defines how the feature is assembled from nucleotides in the genome. The format is a comma-separated list of fragments, each of which is either a single nucleotide site or a range (begin-end). Sites are numbered from 1 and ranges include both ends. A range where begin is larger than end is read in the opposite direction; for example, 19-11 reads sites 19, 18, ..., 11.
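To make the <coordinates> and <type> rules above concrete, here is a minimal Python sketch. The function names (parse_coordinates, extract_feature, stop_codon_log_fitness) are hypothetical and not part of the simulator. It assumes 1-based, inclusive sites; it assumes that a reversed range is read backwards without complementing the bases (the text above only says "read in the opposite direction"); and it assumes the stop-codon penalty is a log-fitness contribution where 0.0 means no effect.

```python
import math

# Hypothetical helpers illustrating the <coordinates> and <type> rules;
# they are not part of the simulator's API.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_coordinates(spec: str) -> list[int]:
    """Expand e.g. "1-9,13-21" or "19-11" into an ordered list of 1-based sites."""
    sites: list[int] = []
    for fragment in spec.split(","):
        fragment = fragment.strip()
        if "-" in fragment:
            begin, end = (int(part) for part in fragment.split("-"))
            # begin > end: the range is read in the opposite direction.
            # Assumption: bases are NOT complemented when reading backwards.
            step = 1 if begin <= end else -1
            sites.extend(range(begin, end + step, step))
        else:
            sites.append(int(fragment))  # a single nucleotide site
    return sites


def extract_feature(genome: str, spec: str) -> str:
    """Concatenate the genome bases at the feature's sites, in order."""
    return "".join(genome[site - 1] for site in parse_coordinates(spec))


def stop_codon_log_fitness(feature_seq: str) -> float:
    """Implicit 'aminoAcid' rule: -inf if any in-frame codon is a stop codon.

    Assumption: the value is a log-fitness contribution, so 0.0 is neutral.
    """
    if len(feature_seq) % 3 != 0:
        raise ValueError("an aminoAcid feature length must be a multiple of 3")
    codons = (feature_seq[i:i + 3] for i in range(0, len(feature_seq), 3))
    return -math.inf if any(codon in STOP_CODONS for codon in codons) else 0.0


if __name__ == "__main__":
    genome = "CCTCAGGTCACTCTTTGGCAA"  # first 21 nt of seq1 in the example above
    de_protein = extract_feature(genome, "19-11")
    print(de_protein)                          # CGGTTTCTC
    print(stop_codon_log_fitness(de_protein))  # 0.0
    mutant = "CCTTAGGTC"  # hypothetical mutant with an in-frame TAG
    print(stop_codon_log_fitness(mutant))      # -inf
```

Only in-frame codons are checked, so a TAA that straddles two codons (as in CTA AGG) does not trigger the penalty.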
By default, a nucleotide feature called genome is created, which spans the entire genome.
One or more full-genome sequences may be given, either in FASTA or in plain format. In plain format, sequences are separated by newlines. The sequences can also be read from an input file, which is specified like this:
```xml
<genome>
    <length>609</length>
    <sequences file='input_fasta.fa'/>
    <feature>
        <name>CDS</name>
        <type>aminoAcid</type>
        <coordinates>1-609</coordinates>
    </feature>
</genome>
```
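A minimal Python sketch of how the two accepted sequence formats could be told apart and checked against <length>. The helper names (parse_sequences, check_lengths) are hypothetical. The sketch assumes that FASTA input starts with a '>' header line and that every full-genome sequence must be exactly as long as the declared genome length. The simulator's own parser may behave differently.

```python
# Hypothetical helpers; this is not the simulator's own parser.


def parse_sequences(text: str) -> list[str]:
    """Parse sequences given in FASTA format or in plain format (one per line)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return lines  # plain format: each non-empty line is one sequence
    sequences: list[str] = []
    for line in lines:
        if line.startswith(">"):
            sequences.append("")  # a header line starts a new FASTA record
        else:
            sequences[-1] += line  # a record's sequence may wrap over several lines
    return sequences


def check_lengths(sequences: list[str], genome_length: int) -> None:
    """Raise ValueError if any sequence does not match the declared <length>."""
    for index, seq in enumerate(sequences, start=1):
        if len(seq) != genome_length:
            raise ValueError(
                f"sequence {index} has {len(seq)} nt, expected {genome_length}"
            )


if __name__ == "__main__":
    fasta = ">seq1\nCCTCAGG\nTCA\n>seq2\nCCTCGGGTCA\n"
    plain = "CCTCAGGTCA\nCCTCGGGTCA\n"
    print(parse_sequences(fasta))                            # ['CCTCAGGTCA', 'CCTCGGGTCA']
    print(parse_sequences(plain) == parse_sequences(fasta))  # True
    check_lengths(parse_sequences(plain), 10)                # passes silently
```

Detecting the format from the first non-empty line keeps both inputs on one code path. Validating lengths up front catches a mismatch between <length> and the sequences before any simulation runs.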