Database

Tom Smoker edited this page Oct 18, 2018 · 10 revisions

A TensorLog database holds a set of unary and binary relations, which are encoded as scipy sparse matrices. The human-readable format for a database is a set of files with the .cfacts extension. Each line contains a predicate name followed by one or two tab-separated string constants. Some examples, from src/test/textcattoy.cfacts:

 hasWord        dh      a
 hasWord        dh      pricy
 hasWord        dh      doll
 hasWord        dh      house
 hasWord        ft      a
 hasWord        ft      little
 hasWord        ft      red
 hasWord        ft      fire
 hasWord        ft      truck
 ...
 label  pos
 label  neg
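
To make the encoding concrete, here is a minimal Python sketch of how facts like the ones above could be turned into sparse matrices. It is not TensorLog's implementation: the `SymbolTable` class and `encode` function are hypothetical, and a plain dict keyed by `(row, col)` stands in for a scipy sparse matrix so the example needs only the standard library. Each string constant gets an integer id; in this sketch a unary predicate becomes a 1 x N row vector and a binary predicate an N x N matrix, where N is the number of constants.

```python
from collections import defaultdict

# Hypothetical sketch, not TensorLog's code: a dict keyed by (row, col)
# stands in for a scipy sparse matrix.


class SymbolTable:
    """Maps each string constant to a consecutive integer id."""

    def __init__(self):
        self.ids = {}

    def get_id(self, const):
        if const not in self.ids:
            self.ids[const] = len(self.ids)  # next unused id
        return self.ids[const]


def encode(facts):
    """Encode (predicate, constant, ...) tuples as one sparse structure per predicate.

    Unary facts fill row 0 of a 1 x N vector; binary facts fill an N x N matrix.
    """
    symtab = SymbolTable()
    relations = defaultdict(dict)
    for pred, *args in facts:
        ids = [symtab.get_id(a) for a in args]
        if len(ids) == 1:
            relations[pred][(0, ids[0])] = 1.0
        else:
            relations[pred][(ids[0], ids[1])] = 1.0
    return symtab, dict(relations)


facts = [
    ("hasWord", "dh", "a"),
    ("hasWord", "dh", "pricy"),
    ("hasWord", "ft", "a"),
    ("label", "pos"),
    ("label", "neg"),
]
symtab, relations = encode(facts)
print(relations["hasWord"])  # {(0, 1): 1.0, (0, 2): 1.0, (3, 1): 1.0}
print(relations["label"])    # {(0, 4): 1.0, (0, 5): 1.0}
```

Because every constant shares one symbol table here, `dh` and `pos` live in the same index space; the typed databases described below split that space by type.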

An optional extra column can hold a numeric weight for the fact. Because of this, avoid using any constant that parses as a number in a cfacts file, since it may be mistaken for a weight. Facts with the same predicate must be grouped together in the file.
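
The weight column and the grouping rule can be illustrated with a small line parser. This is a hypothetical sketch rather than TensorLog's own loader: `parse_cfacts` assumes fields are tab-separated, treats a trailing field that parses as a float as the weight (defaulting to 1.0), skips blank lines and lines starting with `#`, and raises `ValueError` when a predicate's facts are split across the file.

```python
def parse_cfacts(lines):
    """Parse cfacts lines into (predicate, args, weight) tuples.

    Hypothetical sketch: fields are tab-separated, and a trailing field that
    parses as a float is taken to be the fact's weight (default 1.0).
    """
    facts = []
    finished = set()  # predicates whose group of facts has already ended
    current = None    # predicate of the group currently being read
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/type declarations
        pred, *args = line.split("\t")
        weight = 1.0
        if args:
            try:
                weight = float(args[-1])
                args = args[:-1]  # last field was a weight, not a constant
            except ValueError:
                pass  # last field is an ordinary string constant
        if len(args) not in (1, 2):
            raise ValueError(f"line {lineno}: expected one or two constants")
        if pred != current:
            if pred in finished:
                raise ValueError(f"line {lineno}: facts for {pred!r} are not grouped together")
            if current is not None:
                finished.add(current)
            current = pred
        facts.append((pred, tuple(args), weight))
    return facts


rows = [
    "hasWord\tdh\ta",
    "hasWord\tdh\tpricy\t0.5",
    "label\tpos",
]
print(parse_cfacts(rows))
# [('hasWord', ('dh', 'a'), 1.0), ('hasWord', ('dh', 'pricy'), 0.5), ('label', ('pos',), 1.0)]
```

The arity check also shows why numeric constants are risky: in this sketch a line such as `label<TAB>3` is read as a weight with no constant at all, and is rejected.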

A database can be serialized; a serialized database should be stored in a directory with the .db extension. A serialized database is much smaller and loads more quickly (although load time is less of an issue in later versions of TensorLog).

To see what's in a database, serialized or not, you can use the 'list' module, for example:

 python -m tensorlog.list --db test-data/textcattoy.cfacts

or

 python -m tensorlog.list --db test-data/textcattoy.cfacts --mode hasWord/2
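
The `hasWord/2` argument to `--mode` appears to follow the Prolog-style `predicate/arity` convention. As a rough illustration only (this does not reproduce what `tensorlog.list` prints), the hypothetical helpers below split such a mode string and select the matching facts from a list of `(predicate, constant, ...)` tuples.

```python
def parse_mode(mode):
    """Split a 'predicate/arity' string such as 'hasWord/2'."""
    pred, arity = mode.rsplit("/", 1)
    return pred, int(arity)


def select_facts(facts, mode):
    """Keep only facts whose predicate name and number of constants match the mode."""
    pred, arity = parse_mode(mode)
    return [f for f in facts if f[0] == pred and len(f) - 1 == arity]


facts = [("hasWord", "dh", "a"), ("hasWord", "ft", "red"), ("label", "pos")]
print(parse_mode("hasWord/2"))  # ('hasWord', 2)
print(select_facts(facts, "hasWord/2"))
# [('hasWord', 'dh', 'a'), ('hasWord', 'ft', 'red')]
```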

Typed Databases

You can optionally add type declarations in a cfacts file, like this:

 # :- predict(doc,label)
 # :- hasWord(doc,word)
 # :- posPair(word,labelWordPair)
 # :- label(label)

This places constants of type 'doc', 'label', etc. in separate namespaces, and types are all disjoint. You must either declare types for every predicate or for none (in the latter case, every constant is given a default type).
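
One way to picture the separate namespaces is to keep one symbol table per type, so that ids are assigned independently within each type. The Python below is a sketch under that assumption, not TensorLog's code: `parse_decl` and `assign_ids` are hypothetical helpers, and the regular expression only handles declarations of the form shown above.

```python
import re

# Hypothetical pattern for declarations such as "# :- hasWord(doc,word)".
DECL = re.compile(r"#\s*:-\s*(\w+)\((\w+(?:\s*,\s*\w+)*)\)")


def parse_decl(line):
    """Return (predicate, (type, ...)) for a declaration line, else None."""
    m = DECL.match(line.strip())
    if m is None:
        return None
    return m.group(1), tuple(t.strip() for t in m.group(2).split(","))


def assign_ids(decls, facts):
    """Number constants separately within each type's namespace."""
    tables = {}  # type name -> {constant: id}
    for pred, *args in facts:
        for typ, const in zip(decls[pred], args):
            table = tables.setdefault(typ, {})
            table.setdefault(const, len(table))  # new constants get the next id
    return tables


decl_lines = [
    "# :- predict(doc,label)",
    "# :- hasWord(doc,word)",
    "# :- posPair(word,labelWordPair)",
    "# :- label(label)",
]
decls = dict(parse_decl(line) for line in decl_lines)
facts = [("hasWord", "dh", "a"), ("hasWord", "ft", "a"), ("label", "pos"), ("label", "neg")]
print(assign_ids(decls, facts))
# {'doc': {'dh': 0, 'ft': 1}, 'word': {'a': 0}, 'label': {'pos': 0, 'neg': 1}}
```

Note that `dh`, `a` and `pos` all get id 0: with disjoint types, an id only needs to be unique within its own type.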

You can put the type declarations anywhere in the cfacts file, but the type declaration for a predicate must come BEFORE any facts for that predicate.
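
The two rules above (declare a predicate's types before its facts, and type everything or nothing) can be checked mechanically. The function below is a hypothetical linter, not part of TensorLog; it assumes declarations look like the `# :- pred(...)` lines shown earlier and that fact lines are tab-separated with the predicate first.

```python
import re

# Hypothetical pattern for the start of a declaration such as "# :- label(label)".
DECL = re.compile(r"#\s*:-\s*(\w+)\(")


def check_declarations(lines):
    """Return a list of problems; an empty list means the file passes."""
    declared = set()
    first_fact = {}  # predicate -> line number of its first fact
    problems = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        m = DECL.match(stripped)
        if m:
            pred = m.group(1)
            if pred in first_fact:
                problems.append(f"line {lineno}: declaration for {pred!r} comes after its facts")
            declared.add(pred)
        elif stripped and not stripped.startswith("#"):
            first_fact.setdefault(stripped.split("\t")[0], lineno)
    if declared:  # typed file: every predicate with facts needs a declaration
        for pred in first_fact:
            if pred not in declared:
                problems.append(f"{pred!r} has facts but no type declaration")
    return problems


print(check_declarations(["# :- label(label)", "label\tpos"]))  # []
print(check_declarations(["label\tpos", "# :- label(label)"]))
# ["line 2: declaration for 'label' comes after its facts"]
```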