-
Notifications
You must be signed in to change notification settings - Fork 114
Development Tutorial Life Cycle
This document examines the details of commands directly involved in replication.
The very first instruction in most heads-based organisms is h-alloc
,
which will allocate space for an offspring to be placed into. If you look at
the file source/cpu/cHardwareCPU.cc, you will see that this
instruction is associated with the method
cHardwareCPU::Inst_MaxAlloc().
What this means is that the organism will automatically allocate as much
space as it possibly can, without having to first calculate its needs. When
the organism is finished copying itself and divides off its child, any excess
allocated memory will automatically be discarded.
This Inst_MaxAlloc() method will determine the maximum amount of extra space that an organism is allowed to allocate, and then run the Allocate_Main() function passing in that amount. Allocate_Main is a very long method which is mostly just check to make sure that everything going on is legal, and then initializes the new memory that was allocated as per the configuration file: random, default instruction, or leave as it was in the previous organism that used it (for "necrophelia").
Most of the initial self-analysis done on Avida organisms is with the
search
instruction or one of its variants. In the heads based
instruction set, we call this h-search
and associate it with the
method cHardwareCPU::Inst_HeadSearch().
The search type instructions read in the template (series of nops) that follows it, determine the complement template, and find that complement elsewhere in the genome. It then sets the registers BX and CX to be the distance to the found template and the size of that template respectively. Finally, we place the flow control head at the end of the template found in order to reference it later on. Obviously this last step only occurs in the heads-based search.
The first search instruction executed by a heads organism is typically used
to locate the end of its genome. The search will place the flow-head at
the end of the genome, which the organism will use to move the write head to
this point as well. This is done with the mov-head
instruction.
If the mov-head
instruction is followed by a nop-C
it
will move the write head to the flow head, and ideally be ready to start
copying itself into the newly allocated space for the offspring.
The copy loop is the heart of any organism. It consists of a setup, a copy segment to copy one or more instructions, a test segment to determine if the loop has finished, and a 'jump' type instruction to move back to copy the next line.
In a hand written organism, the setup is an h-search
command with
no template to direct its behavior. The default when this instruction does not
have a template is to just drop the flow-head at the very next instruction,
which is what it is used for here -- it places the flow head at the beginning
of the portion of code that will actually be looped through copying each line.
The copy segment is typically just a single h-copy
instruction
that will read in an instruction from the location of the read-head, and write
it out to the location of the write-head. It will then advance both heads to
the next positions in the genomes. Take a look at the source code for the
method cHardwareCPU::Inst_HeadCopy().
The first thing that happens in this method is the variables read_head, write_head, and cpu_stats are setup as references to the appropriate objects in the hardware and organism (that is, any modifications to these references change the actual objects, not just a local copy of them). This is so that we have easy to use variables locally for those objects that we are working with. The read_head and write_head are then adjusted to make sure they are in a legal position on the genome (if, for example, the last instruction changed the organism's size, the heads might no longer be pointing to memory that still exists).
Next, the instruction at the read head is recorded in the variable read_inst, and we test to see if this should be mutated to some other value. If a mutation does occur, we change the read_inst variable to a random value, increment the mutation count, and mark flags at the instruction position of the write head to denote the mutation. After we determine what instruction was read (be it a correct reading or not), we call the ReadInst() method, which is simply used to keep track of the most recent template copied. This template is used to help detect the end of the organism, which we shall discuss in a moment.
Finally, we collect the statistics that another copy command was executed in this organism, finish the write by placing this instruction at the position of the write head (and setting its flag as being a copied instruction) and then advancing both heads to their next positions.
After an organism executes one of these copies it has to test to see if it
is done copying itself. The heads based organisms will typically do this
with the aid of the if-label
instruction, which tests to see if
the most recent label copied is the complement of the one that follows it.
If so, it will execute the next instruction (often a divide), otherwise it
will skip that next instruction and execute a mov-head
that will
jump the instruction pointer back to the flow head that was placed at the
beginning of the copy loop. It will continue this copy-test-jump cycle
until all the lines have been copied.
A common adaptation is "unrolling the loop".
In the hand-written version discussed above, each instruction must have
three instructions executed to copy it: h-copy
,
if-label
, and mov-head
. But what if a second
h-copy
command were inserted after the first? Now the program
would be one line longer, so it would have more to copy, but each time
through the loop would now copy two instructions while executing four --
that means that on average only two instructions need be executed to copy
one. A *huge* savings. The main drawback to the organism is that its
length will need to be a multiple of two, or else the test to see if it is
finished won't occur at the proper time. This loop unrolling becomes less
and less beneficial each time the organism does it, so it won't go completely
out of control.
When an organism finishes copying itself, it needs to divide off its
child using a divide command. In the heads based instruction set, this is
the h-divide
command which calls the
cHardwareCPU::Inst_HeadDivide()
method.
This method will use the read head to determine the starting location of the offspring, and the write head to determine its end. This is logical because these are the locations that the heads should be in right after the copy loop has finished. Everything after the write head is cut off and discarded as "extra lines". This information is passed into the Divide_Main method which does the bulk of the work for divide (and is called by all of the various divide instructions in all of the sets).
The cHardwareCPU::Divide_Main() method is therefore what we are most interested in. It begins by calculating the size of that child that would result from the divide point and the extra_line count that were passed into it, and runs Divide_CheckViable() to make sure that all of these values are legal (that is that both parent and child are reasonable sizes for organisms, and reasonable sizes in relationship to each other -- for definitions of reasonable as found in the configuration file). If any of them are not legal, the method returns false.
From this point on, we know the divide is legal, so we just need to process it. We create a variable called child_genome, which we use to construct the child genome. We use a reference to a Genome object inside of the organism so that this child genome is attached to its parent organism and will be easily accessible from other places where it will be needed later. We're not going to be doing all of the work on it right in this method. We initialize the child genome to the section of the parents genome that was created for it. We then run Resize() on the parent genome to get rid of all of this extra space (both child and extra lines).
The Divide_DoMutations() method will test and (if needed) process any divide mutations that may occur. There are many of them, so this method is quite long. It is followed by Divide_TestFitnessMeasures(), which will run the offspring through a test CPU for special options that may be set in the genesis file (such as mutation reversions). Obviously this is very processor intensive since it would occur with every birth, so tests are only performed if required. Both of these methods are left to the reader to step through themselves.
If we are using extra costs associated with the first time instructions are used, those costs a reset now that a divide has occurred, and must be paid for again on the next divide cycle.
After a divide, we mark that we no longer have a mal (Memory ALlocation) active. If the parent is reset (i.e., we have two offspring, not a parent and child) we need to make sure not to advance the IP of the parent. The reset parent has its IP placed at the beginning of its genome, and we want to leave it there to execute the very first instruction.
Finally, we tell the organism to activate the divide and do something with the child. Give the child to the population (or the test CPU as the case may be) to be dealt with, and reset the parent if we're splitting into two offspring.
In the description of this life-cycle, one issue that has not been discussed is where these organisms would perform their computations. In truth, there isn't a fixed time other than it must be before the divide occurs, since merit is recalculated on a divide. In practice it will typically be placed right before the copy loop, but there are plenty of exceptions.
Ideally, in the longer term, an organism's life will be composed of much more than just replication and computations -- they will have to interact with each other and have more interactions with the environment. In a multi-threaded model, organisms will be doing many activities at the same time.