Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace Reconstruction #2

Conversation

RoryBrennan-Jones
Copy link

Major Changes

Test Suite

In the lcf directory, running make test will verify the go predicate of every Datalog+ file in the tests directory. The results are saved as .thm files with names matching the original code files. Using the Python script in the tools directory, it is possible to randomly generate example Datalog+ programs involving edge connectivity problems.

New Trace Reconstruction Algorithm

The trace output of a Prolog program shows an exploration of many possible paths before settling on one that allows the query to be true or resolves some unification. The new trace reconstruction algorithm attempts to discard all the failed paths and look at just the successful path, which greatly simplifies the process of constructing a proof tree for that query. The algorithm does this by only looking at trace events that use the exit port, since these events result from successes and contain full variable assignments. However, failed paths can have zero or more temporary successes, potentially leading to the presence of "junk" events with exit ports in the trace output.

Since exit ports always follow successes, the first exit ports in the trace happen after finding succesful base cases (builtin operations, facts, etc.), and later exit ports show the successes of predicates that depended on other rules and facts. As a result, the algorithm reads from the trace backwards in order to go from the root to the leaves instead of vice versa. The algorithm uses a mix of recursion and iteration, where the recursion helps the algorithm move "up" and "down" the tree, and the iteration helps algorithm move between siblings in the tree and skip junk events. A trace event with a prop such as a fact, equality, or builtin becomes a leaf in the proof tree, whereas other trace events become regular theorems that have proofs using other theorems and leaves from lower down in the tree as arguments.

Support for Types and Builtins

Unlike Datalog, Datalog+ supports constants other than atoms, including numbers, strings, and lists. These types are not complete yet. It has not been decided yet if the numbers should be natural numbers or integers (or if both should be supported). Furthermore, strings do not support all characters that could be found in a legal Datalog+ string, including escaped characters such as double quotes. Also, lists currently only support constant terms and cannot use the pipe operator. In addition to these types, there is also support for certain builtin operations, such as equality operators and string and list builtins.

mmcloughlin and others added 30 commits June 12, 2024 10:59
*.dll
# Test outputs
*.trace
*.thm
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could we fix the missing newlines at the end of files? I think there's a few examples of it in this PR.

I think there's a vscode setting for this: https://stackoverflow.com/a/44704969

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -0,0 +1,20 @@
## Project Overview

The Veri-Datalog project is a verified Datalog+ engine. It proves the correctness of Datalog+ programs by constructing proof trees from their trace output. Datalog+ is a research defined language that is similar to Datalog but with more feature support; both are subsets of Prolog.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wrap paragraphs to some reasonable width, 100 or 120 characters perhaps?


The Veri-Datalog project is a verified Datalog+ engine. It proves the correctness of Datalog+ programs by constructing proof trees from their trace output. Datalog+ is a research defined language that is similar to Datalog but with more feature support; both are subsets of Prolog.

## Contributions by Rory Brennan-Jones
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps remove this heading? We know it was your work, but I don't think the README needs to say that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -9,7 +9,7 @@ trace
;

traceevent
: port level=integer id=integer goal=clause ';'
: port level=integer id=integer goal=clause';'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: keep the space after clause?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

;

atom
: val=Identifier
;

natural
: numeral=Int // TODO: make natural use the Nat rule, or switch support from natural numbers to integers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't do this TODO now?

It's odd to have a rule called natural actually be an integer. I'd suggest we either rename natural to integer, or change Int to Nat. Either is fine, just the mixed naming is odd right now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in person, this may have to stay unresolved for now. To summarize, the kernel only supports natural numbers right now, but the Nat lexer rule in the parser doesn't seem to work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the naming issue is worth resolving now, if we can.

Copy link
Author

@RoryBrennan-Jones RoryBrennan-Jones Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have resolved the naming issue. The Int lexer rule has been commented out, and the rest of the Antlr uses the Nat rule.

print(f'destination("{p.dst}").', file=out)


def _write_datalog(p, out): # cls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably remove the _ prefix in the function name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Query.
# print(f'{cls.QUERY_NAME}(S, D) :- source(S), destination(D), connected(S, D).', file=out)

_write_datalog(random_connectivity_problem(100), sys.stdout)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you wrap this in the usual Python boilerplate:

def main():
    write_datalog(random_connectivity_problem(100), sys.stdout)

if __name__ == "__main__":
    main()

https://stackoverflow.com/questions/419163/what-does-if-name-main-do

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, some of these tests had interesting trace features that triggered bugs in prior failed versions of trace reconstruction.

Are you able to provide comments where appropriate explaining what properties they have which make them interesting cases.

I guess the parser may not support comments right now? In which case, can you add your comments to tests/README.md or something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments in README, although I might need to write more than what I currently have.

Comment on lines 69 to 98
function lower_char(c : char) : char {
match (c)
case 'A' => 'a'
case 'B' => 'b'
case 'C' => 'c'
case 'D' => 'd'
case 'E' => 'e'
case 'F' => 'f'
case 'G' => 'g'
case 'H' => 'h'
case 'I' => 'i'
case 'J' => 'j'
case 'K' => 'k'
case 'L' => 'l'
case 'M' => 'm'
case 'N' => 'n'
case 'O' => 'o'
case 'P' => 'p'
case 'Q' => 'q'
case 'R' => 'r'
case 'S' => 's'
case 'T' => 't'
case 'U' => 'u'
case 'V' => 'v'
case 'W' => 'w'
case 'X' => 'x'
case 'Y' => 'y'
case 'Z' => 'z'
case _ => c
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may be able to rewrite this by converting to integer with (c as int), doing the ascii math, and converting back with (... as char).

https://dafny.org/dafny/DafnyRef/DafnyRef.html#sec-characters

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

match strings(parts.l) {
case Ok(parts_strings) => str.s == string_join(sep.s, parts_strings)
case Ok(parts_strings) => str.s == string_join(sep.s, parts_strings) // TODO: add support for padding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay to not implement it, but it would be good if the code errors when a non-empty padding is passed in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@pratapsingh1729 pratapsingh1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think Mike already made most of the comments, so I think this can be merged as-is.

;

atom
: val=Identifier
;

natural
: numeral=Int // TODO: make natural use the Nat rule, or switch support from natural numbers to integers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the naming issue is worth resolving now, if we can.

the pipe operator. In addition to these types, there is also support for
certain builtin operations, such as equality operators and string and list
builtins.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a list of the primitives that we currently do support? For instance, which comparison operators are supported for numbers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added a few lists to the readme.

invariant |trace'| < |trace|
invariant forall j :: 0 <= j < |args| ==> args[j].wf(rs)
{
// This while loop skips over all ineligble trace events until it finds one that matches with the next child rule.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it determine the match? I guess just by seeing if the substitutions line up?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, more or less.

// Deduce theorem.
var root := outcome.nodes[0];
var maybe_match := reconstruct(root, root.prop, rs);
print "tree:\n"; // if there was a flag that disabled printing in build_proof_tree, this line should be disabled as well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment refer to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// Deduce theorem" refers to the section where the theorem object is created.
"// if there was a flag, ..." is a note that in Verus port, this line will be enabled by a flag. (If the user doesn't want the tree to be printed out, they wouldn't want "tree:" to be printed.)

pratapsingh1729 pushed a commit that referenced this pull request Aug 2, 2024
@mmcloughlin mmcloughlin merged commit 3e23775 into secure-foundations:trace-reconstruction Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants