Data Master
"Data Master" is a package of executables for general data analysis.
Files in the .dm ("Data Master") format store object-attribute and object-object tables for data analysis.
These files have the extension .dm, but when the file names are used as program parameters, the extension .dm is omitted.
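As a small illustration of the naming rule above, the following Python sketch turns a file path into the form passed as a program parameter. The helper name dm_parameter is hypothetical and not part of Data Master; it only strips a trailing .dm extension.

```python
from pathlib import PurePosixPath


def dm_parameter(path):
    """Return the path without a trailing .dm extension (hypothetical helper)."""
    p = PurePosixPath(path)
    # Only strip the extension if it is exactly ".dm"; leave other names untouched.
    return str(p.with_suffix("")) if p.suffix == ".dm" else str(p)
```

For example, dm_parameter("phylogeny/data/tree4.dm") gives "phylogeny/data/tree4".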
An object (or entity) is something that can be sampled and observed.
Attributes are functions of objects.
A one-way attribute is a function of one object. For example, if a "human" is an object, then "weight" is a one-way attribute of a human.
A two-way attribute is a function of two objects. For example, if a "city" is an object, then "distance" is a two-way attribute of a pair of cities.
Attributes have a scale which can be:
- interval (real, integer or real2 in Data Master)
- ratio (positive, probability or positive2 in Data Master)
- nominal
- ordinal (not implemented)
- Boolean or CompactBoolean
The real2 and positive2 types are two-way attributes.
An example of a Data Master file with a dissimilarity two-way attribute dist is the file phylogeny/data/tree4.dm:
OBJNUM 4 NAME NOMULT
ATTRIBUTES
dist POSITIVE2 0
DATA
A
B
C
D
dist FULL
0 7 9 10
7 0 10 11
9 10 0 3
10 11 3 0
The first line says that there are 4 objects, each object has a name, but no multiplicity, i.e., each object has multiplicity 1.
Multiplicity is the number of repetitions.
The line after ATTRIBUTES says that there is one two-way attribute dist whose values are non-negative and are printed with 0 decimals.
The 4 lines after DATA contain the names of the 4 objects (this is actually an object-attribute table, but there are no one-way attributes).
Then the line dist FULL indicates that a square 4 x 4 matrix of the two-way attribute dist, aka an object-object table, follows.
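The layout described above can be read with a few lines of Python. This is a minimal sketch, not the Data Master parser: the function parse_simple_dm is hypothetical, and it assumes named objects, no multiplicities, no one-way attributes, no comment lines, attribute types that carry a decimals field, and FULL matrices only.

```python
TREE4_DM = """\
OBJNUM 4 NAME NOMULT
ATTRIBUTES
dist POSITIVE2 0
DATA
A
B
C
D
dist FULL
0 7 9 10
7 0 10 11
9 10 0 3
10 11 3 0
"""


def parse_simple_dm(text):
    """Parse the restricted .dm subset described in the lead-in (hypothetical helper)."""
    tokens = text.split()  # tokens are separated by spaces, tabs or new lines
    pos = 0

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(keyword):
        tok = take()
        if tok.upper() != keyword:  # keywords are case-insensitive
            raise ValueError(f"expected {keyword}, got {tok}")

    expect("OBJNUM")
    objnum = int(take())
    expect("NAME")
    expect("NOMULT")
    expect("ATTRIBUTES")

    attrs = {}
    while tokens[pos].upper() != "DATA":
        name = take()
        attr_type = take().upper()
        decimals = int(take())  # assumption: every type here has a decimals field
        attrs[name] = (attr_type, decimals)
    expect("DATA")

    # With no one-way attributes, each row holds only the object name.
    names = [take() for _ in range(objnum)]

    matrices = {}
    while pos < len(tokens):
        attr = take()
        expect("FULL")
        matrices[attr] = [[float(take()) for _ in range(objnum)]
                          for _ in range(objnum)]
    return names, attrs, matrices


names, attrs, matrices = parse_simple_dm(TREE4_DM)
```

After this call, matrices["dist"][2][3] holds the dissimilarity between C and D.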
This is the beginning of the file /phylogeny/data/Saccharomyces.dm:
OBJNUM 501 NAME NOMULT
ATTRIBUTES
obj_size POSITIVE 0
cons POSITIVE2 8
DATA
1038988 11487077
1038998 11601448
1104308 11489728
1163408 11905256
1363498 11606437
Here there are 501 objects whose names are GenBank assembly identifiers.
There are two attributes: a one-way attribute obj_size (the length of genomes in bp) and a two-way attribute cons ("conservation" dissimilarity).
The objects and the one-way attribute obj_size are in the object-attribute table following the keyword DATA.
After the object-attribute table there is a square 501 x 501 matrix of the second attribute cons.
The number of decimals, 8, for the attribute cons does not affect the computation, but is used for printing the attribute values.
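To illustrate how a decimals setting can affect only the printed form of a value, here is a minimal Python sketch. The function format_value is hypothetical, and fixed-point formatting is an assumption; the exact output format of the Data Master executables may differ.

```python
def format_value(value, decimals):
    """Format a number with a fixed count of decimals (assumed printing rule)."""
    return f"{value:.{decimals}f}"


stored = 0.123456789              # the value used in computation is unchanged
printed = format_value(stored, 8)  # only the printed text is rounded
```

Here stored keeps its full precision, while printed is the text "0.12345679".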
The keywords OBJNUM, NAME, ATTRIBUTES, POSITIVE2 etc. are case-insensitive.
The names of objects and attributes are case-sensitive.
<file> ::=
{# <comment> <new line>}*
OBJNUM <spaces> <objnum> <spaces> [NO]NAME <spaces> [NO]MULT <spaces>
ATTRIBUTES <spaces>
{<attribute> <spaces>}+
DATA <spaces>
<row>*
<addition>*
<spaces> ::= <space> <space>*
<objnum> ::= <natural>
<attribute> ::= <name> <spaces> <type>
<type> ::= {REAL | REAL2 | POSITIVE | POSITIVE2 | PROBABILITY} <decimals>
| INTEGER | BOOLEAN | COMPACTBOOLEAN
| NOMINAL <category>+
<decimals> ::= <natural>
<category> ::= <name>
<row> ::= [<object> <spaces>] [<multiplicity> <spaces>] {<value> <spaces>}+
<object> ::= <name>
<addition> ::= <object comments> | <two-way attribute> | <pair data>
<object comments> ::=
COMMENT <spaces> <new line>
<object comment>*
<object comment> ::= <comment> <new line>
<two-way attribute> ::= <name> <spaces> <two-way data>
<two-way data> ::= FULL <spaces> <full matrix>
| PARTIAL <spaces> <partial objnum> <spaces> <partial matrix>
| PAIRS <spaces> <pairnum> <spaces> <pairs>
<partial objnum> ::= <natural>
<partial matrix> ::= <object> <value>*
<pairnum> ::= <natural>
<pairs> ::= <pair>*
<pair> ::= <object> <spaces> <object> <spaces> <value> [<spaces> <comment>] <new line>
<pair data> ::=
PAIR_DATA <spaces> <pair data pairnum> <spaces> <pair data attrnum> <spaces>
<pair data attribute>*
<pair data row>*
<pair data attrnum> ::= <natural>
<pair data attribute> ::= <name> <spaces>
<pair data row> ::= <object> <spaces> <object> <spaces> <pair data value>*
Where
- all keywords are case-insensitive
- <comment> is a character string not containing <new line>
- attribute types ending with "2" are two-way attributes, otherwise one-way attributes
- the number of <row> is <objnum>
- the number of <object comment> is <objnum>
- the number of <value> is the number of one-way attributes
- <name> is a string not containing <space>, case-sensitive
- <multiplicity> is a non-negative real number
- <full matrix> is a square matrix of <value>
- <partial objnum> is not greater than <objnum>
- in <pairs> the object pairs must be ordered alphabetically, and if the same pair is present more than once its values are averaged
- the number of <value> in <partial matrix> is <partial objnum>
- the number of <pair> is <pairnum>
- the number of <pair data row> is <pair data pairnum>
- the number of <pair data attribute> and <pair data value> is <pair data attrnum>
- <value> is the value of a corresponding attribute and must match the attribute type; missing values are "?" or "NAN" for reals; reals can also be "INF" and "-INF"
- <new line> is a new line character
- <space> is a space, tab or <new line>
- <natural> is a non-negative integer
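The rule for real-valued <value> tokens can be sketched in Python as follows. The function parse_real is hypothetical; representing a missing value as NaN and matching the special tokens exactly as spelled above are assumptions, since the text does not say whether they are case-sensitive.

```python
import math


def parse_real(token):
    """Convert a real-valued <value> token to a float (hypothetical helper)."""
    if token in ("?", "NAN"):
        return math.nan   # assumption: a missing value is represented as NaN
    if token == "INF":
        return math.inf
    if token == "-INF":
        return -math.inf
    return float(token)   # raises ValueError for tokens that are not numbers


row = [parse_real(t) for t in "3.5 ? -INF".split()]
```

A token that matches none of these forms raises ValueError, which corresponds to a value that does not match the attribute type.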
The executables are in the directory $TT/dm/.