Database
A TensorLog DATABASE holds a set of unary and binary relations, which are encoded as scipy sparse matrices. The human-readable format for a database is a set of files with the .cfacts extension. Each line contains a predicate name followed by one or two tab-separated string constants. Some examples, from src/test/textcattoy.cfacts:
hasWord	dh	a
hasWord	dh	pricy
hasWord	dh	doll
hasWord	dh	house
hasWord	ft	a
hasWord	ft	little
hasWord	ft	red
hasWord	ft	fire
hasWord	ft	truck
...
label	pos
label	neg
An optional additional column holds a numeric weight. Because of this, don't use any constant that parses as a number in a cfacts file, since it could be mistaken for a weight. Facts with the same predicate must be grouped together.
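The line format described above can be sketched as a small reader. This is an illustration only, not TensorLog's own loader: the names `Fact` and `parse_cfacts` are hypothetical, the default weight of 1.0 is an assumption, and treating a trailing field as a weight whenever it parses as a number is one plausible reading of the warning about numeric constants.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Fact(NamedTuple):
    """One parsed line of a .cfacts file (hypothetical helper type)."""
    predicate: str
    args: Tuple[str, ...]   # one constant (unary) or two (binary)
    weight: float           # assumed to default to 1.0 when no weight column


def _as_float(text: str) -> Optional[float]:
    """Return the float value of text, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_cfacts(lines: Iterable[str]) -> List[Fact]:
    """Parse tab-separated fact lines, checking that predicates are grouped."""
    facts: List[Fact] = []
    finished = set()             # predicates whose group has already ended
    current: Optional[str] = None
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue             # skip blank lines and comment lines
        fields = line.split("\t")
        predicate, rest = fields[0], fields[1:]
        weight = 1.0
        # Assumption: a trailing field that parses as a number is a weight.
        # This is why a numeric constant in the last position is ambiguous.
        if len(rest) >= 2:
            maybe_weight = _as_float(rest[-1])
            if maybe_weight is not None:
                weight, rest = maybe_weight, rest[:-1]
        if not 1 <= len(rest) <= 2:
            raise ValueError(f"line {lineno}: expected one or two constants")
        if predicate != current:
            if predicate in finished:
                raise ValueError(f"line {lineno}: facts for {predicate!r} are not grouped")
            if current is not None:
                finished.add(current)
            current = predicate
        facts.append(Fact(predicate, tuple(rest), weight))
    return facts
```

With this sketch, a line such as `hasWord<TAB>dh<TAB>3` would be read as a unary fact with weight 3 rather than a binary fact, which is the kind of confusion the warning is about.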
A database can be serialized; a serialized database should be stored in a directory with the extension .db. A serialized database is much smaller and loads more quickly (although load time is less of an issue with later versions of TensorLog).
To see what's in a database, serialized or not, you can use the 'list' module, for example:
python -m tensorlog.list --db test-data/textcattoy.cfacts
or
python -m tensorlog.list --db test-data/textcattoy.cfacts --mode hasWord/2
You can optionally add type declarations in a cfacts file, like this:
# :- predict(doc,label)
# :- hasWord(doc,word)
# :- posPair(word,labelWordPair)
# :- label(label)
This puts the constants of type 'doc', 'label', etc. in different namespaces. Types are all disjoint. You must type either everything or nothing (in the latter case, everything is given a default type).
You can put the type declarations anywhere in the cfacts file, but the type declaration for a predicate needs to come BEFORE any facts for that predicate.
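The two rules about type declarations (declare before use, and type everything or nothing) can be checked with a short script. This is a rough sketch, not part of TensorLog: the function name `read_type_declarations` and the regular expression are hypothetical, and it assumes "everything" means every predicate that has facts in the file.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches declaration lines such as "# :- hasWord(doc,word)".
DECLARATION = re.compile(r"^#\s*:-\s*(\w+)\(([^()]*)\)\s*$")


def read_type_declarations(lines: Iterable[str]) -> Dict[str, Tuple[str, ...]]:
    """Collect type declarations and check them against the facts in the file."""
    declared: Dict[str, Tuple[str, ...]] = {}
    predicates_with_facts = set()
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue
        match = DECLARATION.match(line)
        if match:
            predicate = match.group(1)
            # A declaration must come before any facts for its predicate.
            if predicate in predicates_with_facts:
                raise ValueError(
                    f"line {lineno}: declaration for {predicate!r} follows its facts")
            declared[predicate] = tuple(t.strip() for t in match.group(2).split(","))
        elif line.startswith("#"):
            continue             # an ordinary comment, not a declaration
        else:
            predicates_with_facts.add(line.split("\t")[0])
    # Either everything is typed or nothing is.
    if declared:
        untyped = sorted(predicates_with_facts - set(declared))
        if untyped:
            raise ValueError(f"untyped predicates in a typed file: {untyped}")
    return declared
```

A file with no declarations at all returns an empty dictionary, which corresponds to the case where everything gets the default type.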