Genotype Specification

Lars Schöning edited this page Jan 30, 2017 · 10 revisions

Introduction

This document describes the logic used by gnomic to compute full inherited genotypes from differential genotypes.

General Rules (DRAFT)

  • Mutations are applied sequentially from left to right.
  • A genotype with a parent genotype is functionally equivalent to a genotype without a parent genotype in which all the changes from the parent genotype are applied first.
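The two rules can be sketched as follows. The sketch assumes that a strain is modelled as a set of present features and that the only changes are simple insertions (+X) and deletions (-X). `Genotype`, `all_changes` and `apply_changes` are illustrative names and are not part of gnomic's API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Genotype:
    """Hypothetical differential genotype: its own changes plus an optional parent."""
    changes: List[str] = field(default_factory=list)  # e.g. ["+A", "-B"]
    parent: Optional["Genotype"] = None

    def all_changes(self) -> List[str]:
        # The parent's (recursively flattened) changes come first, then our own.
        inherited = self.parent.all_changes() if self.parent is not None else []
        return inherited + list(self.changes)


def apply_changes(changes: Iterable[str], base: Set[str]) -> Set[str]:
    """Apply simple +X / -X changes strictly from left to right."""
    state = set(base)
    for change in changes:
        op, feature = change[:1], change[1:]
        if op == "+":
            state.add(feature)
        elif op == "-":
            state.discard(feature)
        else:
            raise ValueError(f"unsupported change in this sketch: {change!r}")
    return state


# A child with a parent behaves like a parentless genotype with the parent's changes first.
parent = Genotype(["+A", "-B"])
child = Genotype(["+B", "-A"], parent=parent)
flat = Genotype(["+A", "-B", "+B", "-A"])
base = {"B", "C"}
assert apply_changes(child.all_changes(), base) == apply_changes(flat.all_changes(), base)

# Order matters: the later change wins.
print(apply_changes(["+A", "-A"], set()))  # set()
print(apply_changes(["-A", "+A"], set()))  # {'A'}
```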

...

Insertion followed by deletion and vice versa (DRAFT)

An insertion followed by the inverse deletion, or a deletion followed by the inverse insertion, returns the genotype to its original state. If one or both of the mutations carry the multiple flag, the resolution becomes more complex.

Change 1 Change 2 Effect
+A +A
-A -A
+A -A
-A +A
B>A B>A
+B B>A A
-B B>A 'B>A' or '' [ambiguous; potentially invalid; needs review]
+B B>>A B>>A
-B B>>A 'B>>A' or '' [ambiguous; potentially invalid; needs review]
-A ++A ++A [provisional]
+A --A --A [provisional]
--A +A +A
++A -A [ambiguous; invalid; needs review]
--A ++A ++A [provisional]
++A --A [provisional]
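For plain (non-multiple) insertions and deletions, the cancellation rule can be sketched as a net count per feature. This is a hypothetical illustration and not gnomic's implementation. It covers only the unambiguous +A/-A rows and rejects multiple-flag and replacement changes instead of resolving them. It also treats a net count above one in either direction (e.g. +A +A) as an error, which follows the draft rule on repeated changes without a locus.

```python
from typing import Dict, List


def net_changes(changes: List[str]) -> List[str]:
    """Collapse simple +X / -X changes into their net effect.

    Hypothetical sketch: multiple-flag changes (++X, --X) and
    replacements (X>Y) are rejected rather than resolved.
    """
    net: Dict[str, int] = {}  # dicts keep first-seen order
    for change in changes:
        op, feature = change[:1], change[1:]
        if op not in ("+", "-") or not feature or feature[0] in "+-" or ">" in feature:
            raise ValueError(f"unsupported change in this sketch: {change!r}")
        count = net.get(feature, 0) + (1 if op == "+" else -1)
        if abs(count) > 1:
            # e.g. +A +A: no inverse change cancels it, so refuse to guess.
            raise ValueError(f"repeated change: {change!r}")
        net[feature] = count
    # Features whose changes cancelled out (count 0) are back in their original state.
    return [("+" if n > 0 else "-") + f for f, n in net.items() if n != 0]


print(net_changes(["+A", "-A"]))        # []
print(net_changes(["-A", "+A"]))        # []
print(net_changes(["+A", "-B", "-A"]))  # ['-B']
```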

Partial changes (DRAFT)

Fusions and feature sets are compound elements of the genotype. Mutations can change these compound elements partially.

Fusions

Fusions can contain features, feature sets and implicit sub-fusions (e.g. A:B is a sub-fusion of A:B:C, but A:C is not). These sub-elements can be cut out from a fusion or replaced with another element.

Change 1 | Change 2 | Effect
+A:B:C | -A:B | +C
-A:B:C | +A:B | -C
+A:B:C | -B | +A:C
-A:B:C | +B | -A:C
+A:{B C} | -{B C} | +A
-A:{B C} | +{B C} | -A
+A:B:C | B>D | +A:D:C
+A:B:C | B>D:E | +A:D:E:C
+A:B:C | B:C>D:E | +A:D:E
+A:B:C | B:C>{D E} | +A:{D E}
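If a fusion is treated as an ordered tuple of elements, a sub-fusion is a contiguous run of that tuple. That is why A:B is a sub-fusion of A:B:C but A:C is not. The sketch below illustrates the rows above under that assumption. `replace_subfusion` and `find_run` are hypothetical names, not gnomic's. A feature set nested in a fusion is kept as an opaque string such as "{B C}", and the sign of Change 1 is carried over to the result unchanged.

```python
from typing import Sequence

Element = str  # a feature, or a nested feature set kept as an opaque string such as "{B C}"


def find_run(parts: Sequence[Element], run: Sequence[Element]) -> int:
    """Return the start index of `run` as a contiguous run inside `parts`, or -1."""
    n = len(run)
    for i in range(len(parts) - n + 1):
        if tuple(parts[i:i + n]) == tuple(run):
            return i
    return -1


def replace_subfusion(fusion: Sequence[Element], sub: Sequence[Element],
                      replacement: Sequence[Element] = ()) -> str:
    """Cut `sub` out of `fusion` (or swap in `replacement`) and render the result."""
    if not sub:
        raise ValueError("empty sub-fusion")
    i = find_run(fusion, sub)
    if i < 0:
        raise ValueError(f"{':'.join(sub)} is not a sub-fusion of {':'.join(fusion)}")
    parts = tuple(fusion[:i]) + tuple(replacement) + tuple(fusion[i + len(sub):])
    # Joining a one-element tuple yields just that element, so A:B:C -A:B gives C.
    return ":".join(parts)


print(replace_subfusion(("A", "B", "C"), ("A", "B")))              # C
print(replace_subfusion(("A", "{B C}"), ("{B C}",)))               # A
print(replace_subfusion(("A", "B", "C"), ("B", "C"), ("D", "E")))  # A:D:E
```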

Feature sets

Feature sets can contain features, fusions or implicit feature subsets (e.g. {A B} and {B C} are feature subsets of {A B C}, but {A D} and {A C} are not; A and C are not adjacent). Currently, elements in a feature set are assumed to be ordered ([provisional]). Unlike one-element fusions, one-element feature sets are not converted into single features.

Change 1 | Change 2 | Effect
+{A B C} | -B | +{A C}
-{A B C} | +B | -{A C}
+{A B} | -B | +{A}
-{A B} | +B | -{A}
+{A B C} | -{A B} | +{C} [provisional]
-{A B C} | +{A B} | -{C} [provisional]
+{A B C} | -{B} | +{A C} [provisional]
-{A B C} | +{B} | -{A C} [provisional]
+{A B:C} | -B:C | +{A}
-{A B:C} | +B:C | -{A}
+{A B} | B>C | +{A C}
+{A B} | B>C:D | +{A C:D}
+{A B:C} | B:C>D | +{A D}
+{A B} | B>{C D} | +{A C D} [provisional]
+{A B C} | {B C}>{D E} | +{A D E} [provisional]
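The member-level behaviour in this table can be sketched under the current ordered-set assumption. A target must be a contiguous run of members. A set-valued replacement such as {C D} is spliced in (the provisional rows), while a fusion such as C:D stays a single member. `members_of` and `apply_to_set` are hypothetical helpers. The parsing is a simple whitespace split, so it does not handle a set nested inside a set, which the grammar does not allow anyway. In this sketch -{B} and -B behave identically.

```python
from typing import Optional, Tuple


def members_of(expr: str) -> Tuple[str, ...]:
    """'{A B}' -> ('A', 'B'); a feature or fusion such as 'C:D' is a single member."""
    if expr.startswith("{") and expr.endswith("}"):
        return tuple(expr[1:-1].split())
    return (expr,)


def apply_to_set(feature_set: str, target: str, replacement: Optional[str] = None) -> str:
    """Remove `target` from an ordered feature set, or replace it with `replacement`."""
    members = members_of(feature_set)
    run = members_of(target)
    if not run:
        raise ValueError("empty target")
    new = members_of(replacement) if replacement is not None else ()
    n = len(run)
    for i in range(len(members) - n + 1):
        if members[i:i + n] == run:
            result = members[:i] + new + members[i + n:]
            # Unlike a fusion, a one-element set is still rendered as a set.
            return "{" + " ".join(result) + "}"
    raise ValueError(f"{target} is not a subset of {feature_set}")


print(apply_to_set("{A B}", "B"))                 # {A}
print(apply_to_set("{A B:C}", "B:C", "D"))        # {A D}
print(apply_to_set("{A B C}", "{B C}", "{D E}"))  # {A D E}
```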

Consider:

  • defining feature sets as unordered collections of elements. If such a definition were used, the following operations would make sense: +{A B C} -{A B} is +{C}; +{A B C} -{C A} is +{B}.
  • distinguishing between -A and -{A} operations
  • the problem of a feature set inside a feature set: what should +{A B} B>C:{D E} produce? The candidate result {A C:{D E}} is currently not allowed by the grammar.
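The difference between the current ordered rule and the unordered alternative comes down to how a subset is recognised. In this hypothetical comparison, the ordered check requires a contiguous, in-order run. The unordered check uses plain set containment, assuming a feature set never lists the same member twice.

```python
from typing import Sequence, Tuple


def is_ordered_subset(members: Sequence[str], subset: Sequence[str]) -> bool:
    """Current (provisional) rule: a subset must be a contiguous, in-order run."""
    n = len(subset)
    return any(tuple(members[i:i + n]) == tuple(subset) for i in range(len(members) - n + 1))


def is_unordered_subset(members: Sequence[str], subset: Sequence[str]) -> bool:
    """Alternative under consideration: order and adjacency are ignored."""
    return set(subset) <= set(members)


def remove_unordered(members: Sequence[str], subset: Sequence[str]) -> Tuple[str, ...]:
    """Remove every member of `subset`, keeping the remaining members in their original order."""
    if not is_unordered_subset(members, subset):
        raise ValueError("not a subset")
    drop = set(subset)
    return tuple(m for m in members if m not in drop)


abc = ("A", "B", "C")
print(is_ordered_subset(abc, ("A", "C")))    # False
print(is_unordered_subset(abc, ("C", "A")))  # True
print(remove_unordered(abc, ("C", "A")))     # ('B',)
```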

Cascading changes

If a feature set contains a fusion or vice versa, the changes are also applied to that inner compound element.

Change 1 | Change 2 | Effect
+{A B:C} | -B | +{A C}
+{A B:C} | B>D:E | +{A D:E:C}
+A:{B C} | C>D | +A:{B D}
+A:{B C} | C>D:E | +A:{B D:E}
+A:{B}:C | -B | +A:{}:C [provisional]
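Cascading can be sketched as a recursive substitution over a small tree of compound elements. The `Fusion` and `FeatureSet` classes and the `substitute` function are a hypothetical data model, not gnomic's. A deletion such as -B is modelled as substituting B with None. A fusion produced inside a fusion is spliced in, because fusions do not nest, and a one-element fusion collapses to its element. Inside a feature set, a fusion stays a single member, and the set is kept even when it becomes empty.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Fusion:
    parts: Tuple["Element", ...]


@dataclass(frozen=True)
class FeatureSet:
    members: Tuple["Element", ...]


Element = Union[str, Fusion, FeatureSet]


def substitute(elem: Element, old: str, new: Optional[Element]) -> Optional[Element]:
    """Replace feature `old` with `new` (delete it if `new` is None), recursing into compounds."""
    if isinstance(elem, str):
        return new if elem == old else elem
    if isinstance(elem, Fusion):
        parts = []
        for part in elem.parts:
            result = substitute(part, old, new)
            if result is None:
                continue                    # a deleted feature drops out of the fusion
            if isinstance(result, Fusion):
                parts.extend(result.parts)  # splice D:E into the enclosing fusion
            else:
                parts.append(result)
        if not parts:
            return None
        return parts[0] if len(parts) == 1 else Fusion(tuple(parts))
    # Feature set: fusions stay single members; an empty set is kept as {}.
    results = (substitute(m, old, new) for m in elem.members)
    return FeatureSet(tuple(r for r in results if r is not None))


def render(elem: Optional[Element]) -> str:
    if elem is None:
        return ""
    if isinstance(elem, str):
        return elem
    if isinstance(elem, Fusion):
        return ":".join(render(p) for p in elem.parts)
    return "{" + " ".join(render(m) for m in elem.members) + "}"


print(render(substitute(FeatureSet(("A", Fusion(("B", "C")))), "B", None)))     # {A C}
print(render(substitute(Fusion(("A", FeatureSet(("B",)), "C")), "B", None)))   # A:{}:C
```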

Repeated integrative changes without locus (DRAFT)

Repeated changes such as +A +A or -A -A are ambiguous and may be invalid, raising an exception. [needs review]

Mutation Loci (DRAFT)

Each integrative mutation may optionally have a locus. (Mutations that insert plasmids or set a phenotype are non-integrative and may not have a locus.)

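A locus acts as a namespace controlling interactions between mutations: only mutations with the same locus, including mutations that both have no locus (None), are allowed to interact with each other.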

Consider these examples:

  • A>B B>C: The loci of the two mutations in this genotype are identical as they are both None. Therefore, both mutations are allowed to interact with each other and the final genotype becomes A>C. [potentially ambiguous; needs review]
  • A>B B>>C: The loci of the two mutations in this genotype are identical as they are both None. Therefore, both mutations are allowed to interact with each other. As the second mutation is a multiple insertion mutation, it is also applied on its own. The final genotype becomes A>C B>>C. [potentially ambiguous; needs review]
  • A@locus-1>B B@locus-2>C: Since the loci of the two mutations are different, they are not allowed to interact with each other and the final genotype is A@locus-1>B B@locus-2>C.
  • A@>B B@locus-2>C: Since the loci of the two mutations are different (None and 'locus-2'), they are not allowed to interact with each other and the final genotype is A@>B B@locus-2>C.
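The locus rule can be sketched for replacement mutations only. This is a hypothetical reading of the examples above, which are themselves marked as needing review. When a later replacement's source matches an earlier replacement's target in the same locus, the earlier mutation is extended (A>B then B>C becomes A>C). A multiple replacement is also kept on its own. `Replacement`, `resolve` and `show` are illustrative names, not gnomic's API.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Replacement:
    """Hypothetical model of a replacement mutation such as A>B, A@locus-1>B or B>>C."""
    old: str
    new: str
    locus: Optional[str] = None  # None means "no locus"
    multiple: bool = False

    def __str__(self) -> str:
        at = f"@{self.locus}" if self.locus is not None else ""
        return f"{self.old}{at}{'>>' if self.multiple else '>'}{self.new}"


def resolve(mutations: List[Replacement]) -> List[Replacement]:
    """Apply mutations left to right; only mutations sharing a locus interact."""
    result: List[Replacement] = []
    for m in mutations:
        for i, prev in enumerate(result):
            if prev.locus == m.locus and prev.new == m.old:
                result[i] = replace(prev, new=m.new)  # A>B then B>C becomes A>C
                if m.multiple:
                    result.append(m)  # a multiple mutation also applies on its own
                break
        else:
            result.append(m)  # nothing in the same locus to interact with
    return result


def show(mutations: List[Replacement]) -> str:
    return " ".join(str(m) for m in resolve(mutations))


print(show([Replacement("A", "B"), Replacement("B", "C")]))                 # A>C
print(show([Replacement("A", "B"), Replacement("B", "C", multiple=True)]))  # A>C B>>C
print(show([Replacement("A", "B", "locus-1"), Replacement("B", "C", "locus-2")]))
```

This sketch prints a mutation without a locus as A>B, whereas the last example above writes it as A@>B. Both denote the locus None.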

Ranges (TODO)

...

Terminology

  • integrative mutation: A mutation that specifies a change on the genome. This excludes non-integrated plasmids (e.g. (p1)), feature variants (e.g. A+), and markers (e.g. +x::A+).
  • multiple insertion mutation and multiple deletion mutation: A mutation with the multiple flag (e.g. ++A [provisional], --A [provisional], A>>B)