
Database

wwcohen edited this page Aug 12, 2016 · 10 revisions

A Tensorlog DATABASE holds a set of unary and binary relations, which are encoded as scipy sparse matrices. The human-readable format for a database is a set of files with the .cfacts extension. Each line contains a predicate name followed by one or two string constants, with all fields separated by tabs. Some examples, from src/test/textcattoy.cfacts:

 hasWord        dh      a
 hasWord        dh      pricy
 hasWord        dh      doll
 hasWord        dh      house
 hasWord        ft      a
 hasWord        ft      little
 hasWord        ft      red
 hasWord        ft      fire
 hasWord        ft      truck
 ...
 label  pos
 label  neg
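The matrix encoding can be sketched as follows. This is a minimal illustration, not Tensorlog's own loader. It assumes a single shared symbol table (`symbols` and `symbol_id` are hypothetical names) that maps each constant to a row/column index. It stores label/1 as a 1 x n row vector and hasWord/2 as an n x n `scipy.sparse` matrix, then answers "which words does ft contain?" with a vector-matrix product.

```python
import scipy.sparse as sp

# Hypothetical facts taken from the example above: (predicate, args...).
unary_facts = [("label", "pos"), ("label", "neg")]
binary_facts = [
    ("hasWord", "dh", "a"),
    ("hasWord", "dh", "pricy"),
    ("hasWord", "ft", "a"),
    ("hasWord", "ft", "red"),
]

# Hypothetical shared symbol table: each distinct constant gets the next integer id.
symbols = {}

def symbol_id(name):
    return symbols.setdefault(name, len(symbols))

for _, *args in unary_facts + binary_facts:
    for arg in args:
        symbol_id(arg)
n = len(symbols)

# Unary relation label/1 as a 1 x n row vector (1.0 marks membership).
label = sp.csr_matrix(
    ([1.0] * len(unary_facts),
     ([0] * len(unary_facts), [symbols[c] for _, c in unary_facts])),
    shape=(1, n),
)

# Binary relation hasWord/2 as an n x n matrix: entry (i, j) is 1.0 if hasWord(i, j).
rows = [symbols[doc] for _, doc, _word in binary_facts]
cols = [symbols[word] for _, _doc, word in binary_facts]
has_word = sp.csr_matrix(([1.0] * len(binary_facts), (rows, cols)), shape=(n, n))

# Follow hasWord from "ft": a one-hot row vector times the relation matrix.
ft = sp.csr_matrix(([1.0], ([0], [symbols["ft"]])), shape=(1, n))
names = {i: s for s, i in symbols.items()}
words_of_ft = sorted(names[j] for j in (ft @ has_word).indices)
print(words_of_ft)  # ['a', 'red']
```

Because constants share one index space, applying a relation to a set of constants reduces to sparse linear algebra.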

An optional extra column can hold a numeric weight for the fact. A trailing number may therefore be read as a weight, so avoid any constant that parses as a number in a .cfacts file; otherwise the program can misread the fact.
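A small parser sketch shows the ambiguity. The function `parse_cfacts_line` is hypothetical, and so are its rules: a trailing field that parses as a float is treated as the weight, and a fact without one defaults to weight 1.0. Under those rules, the intended binary fact age(bob, 42) is read instead as the unary fact age(bob) with weight 42.

```python
def parse_cfacts_line(line):
    """Hypothetical parser returning (predicate, args, weight).

    Assumption: a trailing tab-separated field that parses as a float is
    a weight. Facts without one get weight 1.0.
    """
    fields = line.rstrip("\n").split("\t")
    weight = 1.0
    if len(fields) >= 3:
        try:
            weight = float(fields[-1])  # trailing number -> treat as weight
            fields = fields[:-1]
        except ValueError:
            pass  # not numeric, so it is an ordinary constant
    return fields[0], tuple(fields[1:]), weight

print(parse_cfacts_line("hasWord\tdh\ta"))       # ('hasWord', ('dh', 'a'), 1.0)
print(parse_cfacts_line("hasWord\tdh\ta\t0.5"))  # ('hasWord', ('dh', 'a'), 0.5)
print(parse_cfacts_line("age\tbob\t42"))         # ('age', ('bob',), 42.0), not age(bob, 42)
```

Python's `float()` also accepts strings such as "nan" and "inf", so under this rule even some non-digit words in the last column would be mistaken for weights.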

A database can also be serialized. A serialized database should be stored in a directory with the extension .db. It is much smaller than the equivalent .cfacts files and loads much more quickly.
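The following sketch shows what serializing the matrices could look like, to make the size and speed point concrete. It is not Tensorlog's on-disk format. The layout (one compressed `.npz` file per relation plus a `symbols.json` symbol table inside a `.db` directory) and the functions `save_db` and `load_db` are hypothetical. The sketch relies only on `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which store a matrix's arrays in binary form instead of text.

```python
import json
import os
import tempfile

import scipy.sparse as sp

def save_db(db_dir, symbols, relations):
    """Hypothetical layout: symbols.json plus one compressed .npz per relation."""
    os.makedirs(db_dir, exist_ok=True)
    with open(os.path.join(db_dir, "symbols.json"), "w") as f:
        json.dump(symbols, f)
    for name, matrix in relations.items():
        sp.save_npz(os.path.join(db_dir, name + ".npz"), sp.csr_matrix(matrix), compressed=True)

def load_db(db_dir):
    """Read back the symbol table and every relation matrix in db_dir."""
    with open(os.path.join(db_dir, "symbols.json")) as f:
        symbols = json.load(f)
    relations = {
        fname[:-len(".npz")]: sp.load_npz(os.path.join(db_dir, fname))
        for fname in os.listdir(db_dir)
        if fname.endswith(".npz")
    }
    return symbols, relations

# Round trip a tiny hasWord relation through a temporary textcattoy.db directory.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = os.path.join(tmp, "textcattoy.db")
    symbols = {"dh": 0, "a": 1}
    hw = sp.csr_matrix(([1.0], ([0], [1])), shape=(2, 2))
    save_db(db_dir, symbols, {"hasWord": hw})
    loaded_symbols, loaded_relations = load_db(db_dir)
```

Loading this way skips tokenizing each line and re-interning every constant, which is one reason a binary format can load faster than text.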

To see what's in a database, serialized or not, you can use the 'list' module, for example:

 python -m list --db test/textcattoy.cfacts

or, to restrict the listing to a single relation (here hasWord/2):

 python -m list --db test/textcattoy.cfacts --mode hasWord/2