DeepDive 0.8.0
A completely re-architected version of DeepDive is here.
Now the system compiles an execution plan ahead of time, checkpoints at a much finer granularity, and gives users full visibility and control of the execution, so any parts of the computation can be flexibly repeated, resumed, or optimized later.
The new architecture naturally enforces modularity and extensibility, which enables us to innovate most parts independently without having to understand every possible combination of the entire code.
The abstraction layers that encapsulate database operations as well as compute resources are now clearly established, giving a stable ground for extensions in the future that support more types of database engines and compute clusters such as Hadoop/YARN and ones with traditional job schedulers.
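The ahead-of-time plan with fine-grained checkpoints described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not DeepDive's implementation: the target names, the `PLAN` dictionary, and the `do`/`redo` functions are hypothetical stand-ins that borrow the names of the new commands, and unlike a real system the toy `redo` does not invalidate downstream targets.

```python
# Toy model of a precompiled plan with per-target checkpoints.
# All names here are hypothetical; this is not DeepDive's actual code.

# The "compiled plan": each target lists the targets it depends on.
PLAN = {
    "load_input": [],
    "extract_mentions": ["load_input"],
    "ground_factor_graph": ["extract_mentions"],
    "learn_weights": ["ground_factor_graph"],
}

done = set()  # checkpoints: targets that have already completed
log = []      # records the order in which targets actually ran


def do(target):
    """Run `target` after its unfinished dependencies, skipping checkpointed ones."""
    if target in done:
        return
    for dep in PLAN[target]:
        do(dep)
    log.append(target)  # stand-in for the real work
    done.add(target)


def redo(target):
    """Forget the checkpoint for `target` only, then run it again."""
    done.discard(target)
    do(target)


do("ground_factor_graph")  # runs the three targets it needs, in order
redo("extract_mentions")   # repeats one step; its dependency is skipped
do("learn_weights")        # resumes from the existing checkpoints
```

Because every target carries its own checkpoint, the last call runs only `learn_weights` instead of starting over from `load_input`.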
As an artifact of this redesign, exciting performance improvements are now observed:
- The database drivers show more than 20x higher throughput (2MB/s -> 50MB/s, per connection) with zero storage footprint by streaming data in and out of UDFs.
- The grounded factor graphs save up to 100x storage space (12GB -> 180MB) by employing compression during the factor graph's grounding and loading, incurring less than 10% overhead in time (400s -> 460s, measuring only the dumping and loading, hence a much smaller fraction in practice).
See the issues and pull requests for this milestone on GitHub (most notably #445) for further details.
New commands and features
An array of new commands has been added to deepdive, and existing ones have been rewritten, such as deepdive initdb and deepdive run.
-
deepdive compile
deepdive plan
deepdive do
deepdive redo
deepdive mark
deepdive done
-
deepdive model
-
deepdive create
deepdive load
deepdive unload
deepdive query
deepdive db
-
deepdive check
deepdive compute
@tsv_extractor, @returns Python decorators for parsing and formatting in UDFs.
-
Interactive tools
The bundled Mindbender can now automatically construct a search and browsing interface from DDlog annotations.
Documentation for Dashboard has been added.
mindbender search
mindbender dashboard
mindbender snapshot
mindbender tagger
-
Miscellaneous
deepdive whereis
To learn more about an individual deepdive COMMAND, use the following deepdive help command.
deepdive help COMMAND
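To give a feel for what the @tsv_extractor and @returns decorators listed above take off the UDF author's hands, here is a minimal, self-contained sketch of decorators in that spirit. It is an assumption-laden illustration, not DeepDive's API: the `returns` and `tsv_extractor` functions below are hypothetical re-implementations with simplified signatures, and `count_tokens` with its columns is an invented example UDF.

```python
import io


def returns(*column_names):
    """Hypothetical stand-in: record the output column names on the UDF."""
    def decorate(func):
        func.output_columns = column_names
        return func
    return decorate


def tsv_extractor(func):
    """Hypothetical stand-in: parse TSV rows in, format yielded rows as TSV out."""
    def run(input_stream, output_stream):
        for line in input_stream:
            # Parsing: one tab-separated input row becomes positional arguments.
            fields = line.rstrip("\n").split("\t")
            for row in func(*fields):
                # Formatting: each yielded row is written back as one TSV line.
                output_stream.write("\t".join(str(v) for v in row) + "\n")
    run.output_columns = getattr(func, "output_columns", ())
    return run


@tsv_extractor
@returns("doc_id", "num_tokens")
def count_tokens(doc_id, text):
    # The UDF body only deals with values, never with TSV text.
    yield doc_id, len(text.split())


out = io.StringIO()
count_tokens(io.StringIO("d1\thello world\nd2\ta b c\n"), out)
result = out.getvalue()
```

The point of the design is that the UDF body stays a plain generator over values, while parsing and formatting live in the decorators. The sketch ignores escaping, NULL handling, and type conversion, all of which a real TSV reader would need.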
Dropped and deprecated features
The Scala code base has been completely dropped and rewritten in Bash and jq.
Many superfluous features have been dropped or deprecated for future removal, as summarized below:
- All extractor styles other than tsv_extractor, sql_extractor, and cmd_extractor have been dropped, namely:
  - plpy_extractor
  - piggy_extractor
  - json_extractor
- Manually writing deepdive.conf is strongly discouraged, as filling in more fields such as dependencies: and input_relations: became mandatory. Rewriting them in DDlog is strongly recommended.
- Database configuration in deepdive.db.default is completely ignored. db.url must be used instead.
- deepdive.extraction.extractors.*.input in deepdive.conf should always be SQL queries. TSV(filename.tsv) or CSV(filename.csv) is no longer supported.