-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace Reconstruction #2
Trace Reconstruction #2
Conversation
…d changing the trace parsing
…, allowing me to remove bound
… which requires one less parameter
…struct to function
… are parsed as nats or atoms
…e a few changes to the Dafny kernel
dafny/lcf/.gitignore
Outdated
*.dll | ||
# Test outputs | ||
*.trace | ||
*.thm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could we fix the missing newlines at the end of files? I think there's a few examples of it in this PR.
I think there's a vscode setting for this: https://stackoverflow.com/a/44704969
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
dafny/lcf/README.md
Outdated
@@ -0,0 +1,20 @@ | |||
## Project Overview | |||
|
|||
The Veri-Datalog project is a verified Datalog+ engine. It proves the correctness of Datalog+ programs by constructing proof trees from their trace output. Datalog+ is a research defined language that is similar to Datalog but with more feature support; both are subsets of Prolog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wrap paragraphs to some reasonable width, 100 or 120 characters perhaps?
dafny/lcf/README.md
Outdated
|
||
The Veri-Datalog project is a verified Datalog+ engine. It proves the correctness of Datalog+ programs by constructing proof trees from their trace output. Datalog+ is a research defined language that is similar to Datalog but with more feature support; both are subsets of Prolog. | ||
|
||
## Contributions by Rory Brennan-Jones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps remove this heading? We know it was your work, but I don't think the README needs to say that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dafny/lcf/datalog.g4
Outdated
@@ -9,7 +9,7 @@ trace | |||
; | |||
|
|||
traceevent | |||
: port level=integer id=integer goal=clause ';' | |||
: port level=integer id=integer goal=clause';' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: keep the space after clause
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
dafny/lcf/datalog.g4
Outdated
; | ||
|
||
atom | ||
: val=Identifier | ||
; | ||
|
||
natural | ||
: numeral=Int // TODO: make natural use the Nat rule, or switch support from natural numbers to integers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't do this TODO now?
It's odd to have a rule called natural
actually be an integer. I'd suggest we either rename natural
to integer
, or change Int
to Nat
. Either is fine, just the mixed naming is odd right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in person, this may have to stay unresolved for now. To summarize, the kernel only supports natural numbers right now, but the Nat lexer rule in the parser doesn't seem to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the naming issue is worth resolving now, if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have resolved the naming issue. The Int lexer rule has been commented out, and the rest of the Antlr uses the Nat rule.
print(f'destination("{p.dst}").', file=out) | ||
|
||
|
||
def _write_datalog(p, out): # cls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably remove the _
prefix in the function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
# Query. | ||
# print(f'{cls.QUERY_NAME}(S, D) :- source(S), destination(D), connected(S, D).', file=out) | ||
|
||
_write_datalog(random_connectivity_problem(100), sys.stdout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you wrap this in the usual Python boilerplate:
def main():
write_datalog(random_connectivity_problem(100), sys.stdout)
if __name__ == "__main__":
main()
https://stackoverflow.com/questions/419163/what-does-if-name-main-do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dafny/lcf/tests/connectivity1.pl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall, some of these tests had interesting trace features that triggered bugs in prior failed versions of trace reconstruction.
Are you able to provide comments where appropriate explaining what properties they have which make them interesting cases.
I guess the parser may not support comments right now? In which case, can you add your comments to tests/README.md
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments in README, although I might need to write more than what I currently have.
dafny/lcf/def.dfy
Outdated
function lower_char(c : char) : char { | ||
match (c) | ||
case 'A' => 'a' | ||
case 'B' => 'b' | ||
case 'C' => 'c' | ||
case 'D' => 'd' | ||
case 'E' => 'e' | ||
case 'F' => 'f' | ||
case 'G' => 'g' | ||
case 'H' => 'h' | ||
case 'I' => 'i' | ||
case 'J' => 'j' | ||
case 'K' => 'k' | ||
case 'L' => 'l' | ||
case 'M' => 'm' | ||
case 'N' => 'n' | ||
case 'O' => 'o' | ||
case 'P' => 'p' | ||
case 'Q' => 'q' | ||
case 'R' => 'r' | ||
case 'S' => 's' | ||
case 'T' => 't' | ||
case 'U' => 'u' | ||
case 'V' => 'v' | ||
case 'W' => 'w' | ||
case 'X' => 'x' | ||
case 'Y' => 'y' | ||
case 'Z' => 'z' | ||
case _ => c | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may be able to rewrite this by converting to integer with (c as int)
, doing the ascii math, and converting back with (... as char)
.
https://dafny.org/dafny/DafnyRef/DafnyRef.html#sec-characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
match strings(parts.l) { | ||
case Ok(parts_strings) => str.s == string_join(sep.s, parts_strings) | ||
case Ok(parts_strings) => str.s == string_join(sep.s, parts_strings) // TODO: add support for padding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay to not implement it, but it would be good if the code errors when a non-empty padding is passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I think Mike already made most of the comments, so I think this can be merged as-is.
dafny/lcf/datalog.g4
Outdated
; | ||
|
||
atom | ||
: val=Identifier | ||
; | ||
|
||
natural | ||
: numeral=Int // TODO: make natural use the Nat rule, or switch support from natural numbers to integers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the naming issue is worth resolving now, if we can.
the pipe operator. In addition to these types, there is also support for | ||
certain builtin operations, such as equality operators and string and list | ||
builtins. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a list of the primitives that we currently do support? For instance, which comparison operators are supported for numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I added a few lists to the readme.
invariant |trace'| < |trace| | ||
invariant forall j :: 0 <= j < |args| ==> args[j].wf(rs) | ||
{ | ||
// This while loop skips over all ineligble trace events until it finds one that matches with the next child rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it determine the match? I guess just by seeing if the substitutions line up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, more or less.
// Deduce theorem. | ||
var root := outcome.nodes[0]; | ||
var maybe_match := reconstruct(root, root.prop, rs); | ||
print "tree:\n"; // if there was a flag that disabled printing in build_proof_tree, this line should be disabled as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this comment refer to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"// Deduce theorem" refers to the section where the theorem object is created.
"// if there was a flag, ..." is a note that in Verus port, this line will be enabled by a flag. (If the user doesn't want the tree to be printed out, they wouldn't want "tree:" to be printed.)
Major Changes
Test Suite
In the lcf directory, running
make test
will verify the go predicate of every Datalog+ file in the tests directory. The results are saved as .thm files with names matching the original code files. Using the Python script in the tools directory, it is possible to randomly generate example Datalog+ programs involving edge connectivity problems.New Trace Reconstruction Algorithm
The trace output of a Prolog program shows an exploration of many possible paths before settling on one that allows the query to be true or resolves some unification. The new trace reconstruction algorithm attempts to discard all the failed paths and look at just the successful path, which greatly simplifies the process of constructing a proof tree for that query. The algorithm does this by only looking at trace events that use the exit port, since these events result from successes and contain full variable assignments. However, failed paths can have zero or more temporary successes, potentially leading to the presence of "junk" events with exit ports in the trace output.
Since exit ports always follow successes, the first exit ports in the trace happen after finding succesful base cases (builtin operations, facts, etc.), and later exit ports show the successes of predicates that depended on other rules and facts. As a result, the algorithm reads from the trace backwards in order to go from the root to the leaves instead of vice versa. The algorithm uses a mix of recursion and iteration, where the recursion helps the algorithm move "up" and "down" the tree, and the iteration helps algorithm move between siblings in the tree and skip junk events. A trace event with a prop such as a fact, equality, or builtin becomes a leaf in the proof tree, whereas other trace events become regular theorems that have proofs using other theorems and leaves from lower down in the tree as arguments.
Support for Types and Builtins
Unlike Datalog, Datalog+ supports constants other than atoms, including numbers, strings, and lists. These types are not complete yet. It has not been decided yet if the numbers should be natural numbers or integers (or if both should be supported). Furthermore, strings do not support all characters that could be found in a legal Datalog+ string, including escaped characters such as double quotes. Also, lists currently only support constant terms and cannot use the pipe operator. In addition to these types, there is also support for certain builtin operations, such as equality operators and string and list builtins.