Grackle is a project to automate the marshaling and unmarshaling code for grove.
Effectively, it aims to build on top of the old marshaling code found here by
creating a compiler capable of taking a subset of the Protobuf specification
(.proto
file) and emitting the Go code which can perform the marshaling and
unmarshaling of the described buffers and a Gallia script which enables
reasoning about the marshaling process.
- Usage
- Why “Grackle”
- Compatibility with =protoc=
- Supported Protobuf Features
- Grackle Wire Format
- Generated Code Guide
- Technical Considerations
- Development
Grackle is capable of generating three outputs for an input directory of proto files:
- Go code which marshals and unmarshals the messages.
- Goose output, a representation of the generated go code in goose lang.
- Coq proofs using perennial that the generated go code is correct.
Grackle can generate just the go code, just the coq code, go and goose code, coq and go or all three. Basically, the go and coq code generation can happen separately, but goose code can only be generated when go code it.
Usage of grackle [-options] <proto file directory>: -coq-logical-path string Logical path to import the marshal proofs from -coq-physical-path string Physical output path for coq proofs -debug Output all generated code to stdout -go-output-path string Physical path to output go code into -go-package string Fully qualified root package for the output packages -goose-output string Directory to write the goose output
To use grackle, the tool needs at least three pieces of information, a directory
with the .proto
files and either
- the two
coq
flags - the two
go
flags
The other flags are optional. A full invocation of grackle would be
grackle -coq-logical-path Grackle.test -coq-physical-path testdata/out/coq/ -go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/ testdata/proto/calendar
A more detailed description of the flags and ideas is given below
coq-logical-path
: The Coq import systems works by mapping physical directories on disk into logical directories which are referenced in the proofs to perform the imports. This flag provides the logical path where the output coq proofs will be written. For any message with nested messages, this is the logical path where the proofs of the other messages will be imported from. The proofs will always import the correspondinggoose
output from a logical directory namedGoose
. This is a standard assumption for perennial and related projects.coq-physical-path
: This is where on disk to write the coq proofs. There is no strict relation between the coq physical path and the corresponding coq logical path. The two should be linked via a-Q
flag in the_CoqProject
file. Grackle will create this directory if it doesn’t exist.debug
: Causes the generated go and coq code to be written to standard output. The format for this output is--- Begin: < file output path > --- < file contents > --- End: < file output path > --- --- Begin: < next file output path > --- < next file contents > --- End: < next file output path > ---
go-output-path
: The path on disk to write the go output from grackle. As explained later, each message in the proto files of the input directories will get it’s own go package, so grackle will create a nested directory within the provided one.go-package
: The full path name for the go package that would correspond to thego-output-path
. Unlike the coq logical and physical paths, there is an expected (but not required) correspondence between the package name and the output path. For projected hosted on GitHub, go convention expects the package name to start withgithub.com/<user name>/<repo name>
, for example importing something from grackle as a library would start withgithub.com/mjschwenne/grackle
. From this base, go convention expects the package name to be basically the relative path to the repository root to the directory the package lives in,go-output-path
. As with the coq logical path, the generated go code will import from the go code for any message containing enumerations or nested messages from this package.goose-output
: If this flag is set, callgoose
programmatically on the outputgo
files, writing the output to this directory. The output will follow the standardgoose
practice of writing the output into subdirectories that follow the full go package name.
A grackle is a bird capable of mimicking other birds and even human speech if it wants to, and it’s also a bird whose name starts with ‘g’ to complement goose, a project which converts a subset of Go to Coq and is used as part of grackle.
Grackle is not compatible with protoc
. Neither the wire format grackle uses nor
the generated go API is compatible. Some major differences are highlighted
below.
- The
int32
andint64
fields are treated as unsigned, rather than signed. - The wire format is markedly different, see Grackle Wire Format for more information. Probably the most notable two differences is that the grackle wire format does not use variable width integer encodings and fields are not tagged with the field number, but rather serialized in order as they appear in the proto file.
- Go code generation is structured differently. The
protoc
compiler outputs one file per proto file while grackle outputs one go package per message (and enum). - Recursive messages, naturally including mutually recursive message, are not
supported by grackle despite being supported by
protoc
. - Grackle appends a
_gk
suffix to generated go packages and proof file names. On the Coq side, output modules do not include the suffix and are referenced directly by the name of the proto messages.
The proto language is used by Google’s protocol buffers to describe the layout of binary messages in a language agnostic manner (which is a bit ironic since proto is itself a language). Below is a list of supported and unsupported features as well as any commentary on them.
- Messages, including nested messages. (Mutually) recursive use of messages is
not supported, regardless of if it is supported by
protoc
. - Integer fields, although all of the integers are mapped to unsigned integers
in both go and coq. This is mostly due to limited support for signed integers
within perennial. Should support improve, support for signed integers may be
added.
int32
uint32
fixed32
int64
uint64
fixed64
- Boolean fields,
bool
. - String fields,
string
. - Byte fields,
bytes
. - Imports between proto files in the input proto directory or a subdirectory of the input proto directory.
- Slices of fields. These are represented with the
repeated
label, i.e.repeated uint32
. Slices will support types that do not have pointers in them due to coq proof limitations rather than go code generation. - Enumerations. Go code generation is nearly complete, but the coq proofs cannot handle the use of enumerations in the proofs yet.
- Maps. After slices, map support is important and used in several places in the perennial ecosystem.
- Signed integer fields
sint32
,sfixed32
,sint64
andsfixed64
. This is mostly due to limited support for signed integers within perennial. According to the proto language, regularint32
andint64
should also be signed even though grackle treats them as unsigned. It is unclear at this point if support for signed integers would change the interpretation of these fields, which would be a breaking change. - Recursive messages. This issue seems particularly challenging on the coq side, although go code generation is unlikely to be significant issue.
- Nested messages declaration. Referencing other top-level messages for a field
is supported. However, protobuf supports declaring nested messages within a
message, like this example from the protobuf documentation:
message SearchResponse { message Result { string url = 1; string title = 2; repeated string snippets = 3; } repeated Result results = 1; }
any
andoneof
labels. These labels operate as a wildcard type and sum type. They are ignored by grackle and unsupported. Moreover, support for these labels is not planned.
The grackle wire format is must simpler than the protobuf wire format. The two most notable differences are:
- Grackle does not use a variable width encoding for integers. All integers are fixed width and embedded in the serialized message in little-endian byte order.
- Grackle fields are not tagged with the field number defined in the proto file. Fields are serialized and deserialized in the order they appear in the proto file.
As an example, consider the following event
message. The complete proto file for
this message can be found here.
message Event {
int32 id = 1;
string name = 4;
TimeStamp startTime = 2;
TimeStamp endTime = 3;
}
The wire format for the message would start with the four bytes needed for id
,
then the length of the name
string as a 64 bit number followed by the bytes
composing the string itself. The null byte is not included since casting a
string to a byte slice ([]byte
) does not include the null byte. Finally, the
marshaled message includes the 12 bytes needed to marshal a TimeStamp
. In the
below diagram, the number after each field is the length of the field. Note that
the nested structures don’t have to be of a fixed size, but are in this example.
[ | | | | ] [ id (4) | length(name) (8) | name (variable) | startTime (12) | endTime (12) ] [ | | | | ]
Below is an overview of what type of code is generated by grackle. Full examples can be found in the testdata/out subdirectories.
The generated code guide will use the calendar example, which features the Event
message from above and a TimeStamp
message containing three integers for the
hour, minute and second. All example output snippets will reference the Event
message since it’s a bit more interesting. Note that the examples in this README
may not be kept up to date, but the files in testdata/out will be.
For this example, grackle might be invoked as
grackle -coq-logical-path Grackle.test -coq-physical-path testdata/out/coq/ -go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/ testdata/proto/calendar
Which would generate these files (which can be accessed in the testdata/out directory of this repository, along with the other test files):
testdata/out ├── coq │ ├── event_proof.v │ └── timestamp_proof.v ├── go │ ├── event_gk │ │ └── event_gk.go │ └── timestamp_gk │ └── timestamp_gk.go └── goose └── github_com └── mjschwenne └── grackle └── testdata └── out └── go ├── event_gk.v └── timestamp_gk.v
For a top-level message, grackle will output:
- A go package with the name of the message, all lowercase, and the
_gk
suffix, such asevent_gk
. Inside the package is one file with the same name. - That file contains
- A struct definition
S
. Since go access patterns always use the package name, every message will haveS
as the struct name. It can be accessed from outside the package as, e.g.event_gk.S
which will differentiate between separate messages. - A
Marshal
function which takes anS
and a byte sliceprefix
and returnsprefix
with the encoding ofS
appended to it. - An
Unmarshal
function which takes a byte slice and returns anS
struct and the unusedsuffix
.
- A struct definition
The use of prefix
and suffix
enable a stateless marshaling design which is
easier to reason about and more compositional. More information can be found in
the Technical Considerations section.
Here is an example of the generated struct
definition for the Event
message:
type S struct {
Id uint32
Name string
StartTime timestamp_gk.S
EndTime timestamp_gk.S
}
Now we have the Marshal
function. As a common pattern in grackle, this function
can be broken into three parts:
- Setup : Including the function signature and first line of the function, which is needed for goose compatibility.
- Fields : Loop over the fields of the message, performing the same action on
each one of them. In this case, serializing each field and appending it to
enc
. - Conclusion : Return the byte slice containing
prefix
and the newly encodedEvent
.
func Marshal(e S, prefix []byte) []byte {
var enc = prefix
enc = marshal.WriteInt32(enc, e.Id)
nameBytes := []byte(e.Name)
enc = marshal.WriteInt(enc, uint64(len(nameBytes)))
enc = marshal.WriteBytes(enc, nameBytes)
enc = timestamp_gk.Marshal(e.StartTime, enc)
enc = timestamp_gk.Marshal(e.EndTime, enc)
return enc
}
Finally we have the Unmarshal
function. This function follows the same three
steps, although the setup is more complex than one line. Goose does not support
mixed variable assignment and initialization using the :=
operator when a
function returns multiple return values, so grackle creates the variables
beforehand. Then iterates through the fields of the struct, deserializing them
one at a time. In the function conclusion the values read from the serialized
byte slice are packed into a struct and returned, along with the remaining bytes
in the input byte slice.
func Unmarshal(s []byte) (S, []byte) {
var enc = s // Needed for goose compatibility
var id uint32
var name string
var startTime timestamp_gk.S
var endTime timestamp_gk.S
id, enc = marshal.ReadInt32(enc)
var nameLen uint64
var nameBytes []byte
nameLen, enc = marshal.ReadInt(enc)
nameBytes, enc = marshal.ReadBytesCopy(enc, nameLen)
name = string(nameBytes)
startTime, enc = timestamp_gk.Unmarshal(enc)
endTime, enc = timestamp_gk.Unmarshal(enc)
return S{
Id: id,
Name: name,
StartTime: startTime,
EndTime: endTime,
}, enc
}
The go code generation is the simple part, while the Coq code proof generation
is much more complex.
For a top-level message, grackle will generate a proof file with the lower-cased
message name plus the _gk
suffix. That file will contain:
- A coq record
C
corresponding to the go struct and the proto message. - A definition
has_encoding
of what it means for a byte slice (or list of bytes in coq) to encode the record. - A separation logic proposition
own
about what it means to own a struct. - A function
to_val'
to convert a record into a goose lang value. - A function
from_val'
to convert a goose lang value into a record. - Proof that the into value and from value functions define what it means to
have a record as a value,
<message name>_into_val
. - Proof that having the value form of the record correctly encodes the goose
lang struct definition,
<message name>_into_val_for_type
. - An encoding lemma
wp_Encode
which describes the behavior of calling theMarhsal
function, as well as a proof of correctness. - A decoding lemma
wp_Decode
which describes the behavior of calling theUnmarshal
function, as well as a proof of correctness.
Just like with the Go code, grackle starts by outputing a Coq record. For the
running Event
example, that looks like this:
Record C :=
mkC {
id : u32 ;
name : string ;
startTime : TimeStamp.C ;
endTime : TimeStamp.C ;
}.
The structure of the records is intentionally similar to that of the go struct
S
and of course the proto message.
Next grackle produces a has_encoding
definition which relates the record C
to
the wire format. In cases where the field is an integer or similar stand-alone
field, it is directly converted into a sequence of bytes and appended together.
Nested messages are a bit different though. The definition is designed to be
compositional, so Event.has_encoding
asserts that there exists some sequence of
bytes which faithfully encodes the startTime
and endTime
time stamps using the
TimeStamp.has_encoding
definition.
Definition has_encoding (encoded:list u8) (args:C) : Prop :=
∃ start_enc end_enc,
encoded = (u32_le args.(id)) ++
(u64_le $ length $ string_to_bytes args.(name)) ++ string_to_bytes args.(name) ++
start_enc ++ end_enc
/\ TimeStamp.has_encoding start_enc args.(startTime)
/\ TimeStamp.has_encoding end_enc args.(endTime).
Some extra flexibility here is that grackle can allow the TimeStamp
encoding to
change without (hopefully) requiring regeneration of the Event
code, although a
recompile will be required according to coq consistency requirements.
After has_encoding
, which related the record to the wire format, we need a way
to represent owning an instance of the record, which will be used in the pre and
post conditions for the serialization lemmas.
Definition own (args__v: val) (args__c: C) (dq: dfrac) : iProp Σ :=
∃ (startTime__v endTime__v : val) ,
"%Hown_struct" ∷ ⌜ args__v = (#args__c.(id), (#(str args__c.(name)), (TimeStamp.to_val' args__c.(startTime), (TimeStamp.to_val' args__c.(endTime), #()))))%V ⌝ ∗
"Hown_startTime" ∷ TimeStamp.own startTime__v args__c.(startTime) dq ∗
"Hown_endTime" ∷ TimeStamp.own endTime__v args__c.(endTime) dq.
The own
proposition relates the coq record to struct representation in goose
lang. All values in goose lang have the val
type at the meta level, but we can
decompose them into a struct with the correct values in the correct order. Like
with has_encoding
, most of the simple fields like integers can be embedded
directly. For nested messages, grackle asserts that the values in the goose lang
val
correspond to the nested messages using the next definition, to_val'
. Since
owning a nested struct may entail using some other piece of state (if there is a
slice in the nested message, for instance), grackle also needs to assert
ownership over that, which is accomplished by adding a call to the nested own
predicate.
As briefly mentioned above, all values in goose lang are represented as a val
at
the Coq level. To enable this message to be nested within another message,
grackle needs to know what value a given record will have. To do this, it
generates a function from C
to val
, named to_val'
. The apostrophe is required to
conflicting with perennial names. This function is identical to the pure fact
"%Hown_struct"
which is generated in the own
predicate.
Definition to_val' (c : C) : val :=
(#c.(id), (#(str c.(name)), (TimeStamp.to_val' c.(startTime), (TimeStamp.to_val' c.(endTime), #())))).
Grackle produces a corresponding function which takes arbitrary goose lang values
and attempts to convert them to the grackle coq record. The return type of this
function is option C
since a val
can have algebraic structures ranging form a
single literal to arbitrary structs, most of which will not be compatible with
the layout of this specific struct.
Definition from_val' (v : val) : option C :=
match v with
| (#(LitInt32 id), (#(LitString name), (startTime, (endTime, #()))))%V =>
match TimeStamp.from_val' startTime with
| Some startTime =>
match TimeStamp.from_val' endTime with
| Some endTime =>
Some (mkC id name startTime endTime)
| None => None
end
| None => None
end
| _ => None
end.
Most of the simple fields like integers can be interpreted at the pattern match
level, nested messages cannot since the type of the message isn’t a constructor
for a val
. Thus, a nested match
structure is generated which requires all nested
messages to parse correctly to produce a record from a val
.
While we’ve already seen instances where converting back and forth between C
’s
and val
’s are important, most of the time we want proofs involving grackle
messages to handle these conversions using standard perennial relations and
functions. In order for that to happen, grackle has to show that to_val'
and
from_val'
refine IntoVal
for C
. This means that grackle can convert a C
into a
val
when needed.
Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.
#[global]
Instance Event_into_val : IntoVal C.
Proof.
refine {|
to_val := to_val';
from_val := from_val';
IntoVal_def := (mkC (W32 0) "" (IntoVal_def TimeStamp.C) (IntoVal_def TimeStamp.C))
|}.
intros v.
destruct v as [i n [sh sm ss] [eh em es]]; done.
Defined.
To show the C
can be converted into a value, grackle provides the to_val'
and
from_val'
functions as well as an initial value for a newly allocated instance
of the record. These values match the default values go
uses if you encountered
something like Event{}
since all types in go
have default values. Since the
structures of C
and S
match so closely, the proof is simple, requiring mostly a
large destruct
and calling done
on all the new goals.
While it’s true that grackle can convert a C
to a val
and vice versa, it knows
more than just that. Specifically, the val
is an instance of the goose lang
representation of the go struct. To ascribe a type to the returned val
, grackle
generates an instance of IntoValForType C (struct.t <message name>)
.
Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.
#[global]
Instance Event_into_val_for_type : IntoValForType C (struct.t Event).
Proof. constructor; auto 10. Defined.
This proof is extremely simple, requiring only a single tactic.
Recall the observation that the "%Hown_struct"
proposition in own
is identical
to to_val'=. For some field types, such as nested messages and slices, grackle
needs to know that having ownership over a record =C
implies something about the
shape of the corresponding val
. To accomplish this goal, grackle generates
own_to_val
,
Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.
Lemma own_to_val (v : val) (c : C) (dq : dfrac) :
own v c dq -∗ own v c dq ∗ ⌜ v = to_val c ⌝.
Proof.
(* Generated, but omitted from this README *)
Qed.
This lemma allows grackle and later us to extract this correspondence between
the two. Note that this lemma uses the goose standard way to convert types to
values, to_val
(missing the '
). The lemma returns ownership of the coq record
since otherwise perennial would consume it and the proof could only learn the
pure fact while losing access to own
since the program logic is affine.
All of the above coq definitions exist to enable grackle to actually verify the
Marshal
and Unmarshal
function generated on the Go side. The wp_Encode
lemma
describes the behavior of the Marshal
function.
Lemma wp_Encode (args__v : val) (args__c : C) (pre_sl : Slice.t) (prefix : list u8) (dq : dfrac):
{{{
own args__v args__c dq ∗
own_slice pre_sl byteT (DfracOwn 1) prefix
}}}
event_gk.Marshal args__v (slice_val pre_sl)
{{{
enc enc_sl, RET (slice_val enc_sl);
⌜ has_encoding enc args__c ⌝ ∗
own args__v args__c dq ∗
own_slice enc_sl byteT (DfracOwn 1) (prefix ++ enc)
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.
The encoding lemma requires some level of ownership over an instance of the
generated message, using args__v
as the goose lang value and args__c
as the coq
level representation. Here, dq
quantifies the discardable fractional ownership
over any heap values in the own
definition. Since only read access is needed to
marshal, the value of dq
is unbounded with respect to a dfrac
. The Marshal
function also needs ownership of the prefix
slice that the serialized args__v
will be appended to. Since Marshal
writes to this slice, full ownership
(DfracOwn 1
) is required.
After marshaling, ownership of args__v
is returned to the caller (the third
piece of the post condition, own args__v args__c dq=) at the same level of
ownership =Marshal
was called with. Additionally, the caller learns that the
returned slice encodes prefix ++ enc
(the second part of the postcondition),
with enc
being some list of bytes which correctly has an encoding for args__c
(using has_encoding
, the first part of the post condition).
The mirror image of wp_Encode
is naturally wp_Decode
, which specifies the
behavior of calling Unmarshal
.
Lemma wp_Decode (enc : list u8) (enc_sl : Slice.t) (args__c : C) (suffix : list u8) (dq : dfrac):
{{{
⌜ has_encoding enc args__c ⌝ ∗
own_slice_small enc_sl byteT dq (enc ++ suffix)
}}}
event_gk.Unmarshal (slice_val enc_sl)
{{{
args__v suff_sl, RET (args__v, suff_sl);
own args__v args__c (DfracOwn 1) ∗
own_slice_small suff_sl byteT dq suffix
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.
The conditions for this Texan triple are basically the inverse of wp_Encode
.
Calling Unmarshal
requires some level of access to a byte slice (the serialized
struct with a potentially empty suffix) and the pure fact that the beginning of
the byte slice encodes some coq level record args__c
.
After Unmarshal
returns, the caller learns that they have full ownership of the
newly created struct and ownership at the same level of just the suffix
slice.
Note that dq
is only bound by being a valid discardable fractional permission.
Removing enc
from the beginning of the slice isn’t a write (which would require
full ownership), but rather returns a new pointer to an existing location in the
same potentially read only slice used to call Unmarshal
.
There are several technical considerations for a piece of software like grackle, relating to API design, proof generation and the underlying tool-chain. While the below discussion is non-exhaustive, they are some points I have additional comments on.
Grackle builds on top of the existing marshal library for perennial, which
provides a set of “primitives” for reading integers, byte slices and other go
types. This library provides two APIs, a stateful and stateless one. The
stateful version requires the programmer to create an object which internal
tracks pervious write calls before expelling a binary blob when marshaling is
complete (not unlike a java StringBuilder
). The nice feature about this design
is that it removes the requirement that the programmer track the prefix
/
suffix
slices.
Grackle uses the stateless API which always return the prefix
/ suffix
slices
and adopts that style itself. The reason for this is mostly that this style is
easier to reason about within perennial and enables stronger compositionality
between messages. While grackle is technically a simple compiler, it doesn’t
employ standard compiler techniques like generating then transforming an
abstract syntax tree, instead using go’s template package (see go templates for
more information). It’s ability to track complex and contextual information like
managing state is limited, particularly on the coq side (I think I could generate
stateful go code, but it would make the coq proofs much more challenging).
Ultimately the stateless design is a way to reduce complexity for an already
complex piece of software.
The body of the main wp_Encode
and wp_Decode
lemmas aren’t given in this README,
but there are some interesting things I have to say about their structure. These
proofs can be found in the event_proof.v file.
As first described during go code generation, the proofs for these important lemmas can be thought of as a proof setup, followed by a series of proof snippets, one for each field in the proto message and then a conclusion. Visually, that might look something like this:
Proof. +----------------------------------------------------------------------------+ | | | Proof Setup | | | +----------------------------------------------------------------------------+ +----------------------------------------------------------------------------+ | | | Field Snippets | | | | +------------------------------------------------------------------------+ | | | | | | | First Field | | | | | | | +------------------------------------------------------------------------+ | | | | +------------------------------------------------------------------------+ | | | | | | | Second Field | | | | | | | +------------------------------------------------------------------------+ | | | | . | | . | | . | | | | +------------------------------------------------------------------------+ | | | | | | | Last Field | | | | | | | +------------------------------------------------------------------------+ | | | +----------------------------------------------------------------------------+ +----------------------------------------------------------------------------+ | | | Conclusion | | | +----------------------------------------------------------------------------+ Qed.
At first pass, it seems difficult to find proof snippets which can be
arbitrarily ordered and instantiated. Fortunately, we can effectively break the
task of marshaling or unmarshaling a struct down to operating on the head or end
of a slice at a field level. Each snippet requires the same set of hypotheses,
such as Hsl
which tracks the prefix
/ suffix
slices. Thus, it’s the job of the
proof setup to create the initial hypotheses needed to process each field. Since
perennial employs an affine logic, each snippet then has to reintroduce the next
version of any common hypotheses it consumes. The snippets also leak information
about each field further down the chain of fields until the proof conclusion is
reached. The conclusion’s role is to show that the post condition as been
satisfied, and to do this it assumes that all of the needed hypotheses have been
created with a well-known name by the corresponding snippet.
This type of proof design incentivizes compact, highly local proof snippets
which ignore everything that isn’t related to the field at hand. However in the
conclusion, grackle needs to tie everything together. Consider the conclusion
from the Event
Marshal
proof:
wp_load. iApply "HΦ". iModIntro. rewrite -?app_assoc.
iFrame. iPureIntro.
unfold has_encoding. split.
{
exists startTime_enc, endTime_enc.
rewrite ?string_bytes_length.
rewrite Hargs_name_sz.
rewrite ?w64_to_nat_id. exact.
} done.
The part in the braces needs to know that startTime_enc
, endTime_enc
and
Hargs_name_sz
all exist, since those are variables and propositions generated
over the course of the proof. The existence of these non-local proof constructs
mandates the usage of well-know names for everything that needs to referenced
later. Maintaining these types of structures are required but more challenging
than individual proof snippets. I like to believe that this requirement also
enables grackle proofs to be highly readable and understanding by the proof
engineer.
Grackle makes extensive use of the text/template templating package in the go
standard library. The templates them selves can be found in internal/templates.
Note that grackle uses <<
and >>
as template delimiters to avoid conflicts with
Texan triples. This library is surprisingly expressive and powerful, but does
have limitations.
- I’m realizing as I write this README that the templates do support comments
using
<</*
and*/>>
, so adding some comments would improve readability. - The templates are all handcrafted. This is a boon as much as a con, since I can finely control things like indentation in the Coq output, but it often leads to hard to read templates that may do the same thing in multiple different ways.
- The templates struggle to track state between templates, which influenced the choice to use the stateless API for grackle.
However I don’t really want to be excessively negative here. The templating library did include several features I was surprised by, including:
- Recursive templates. Generating things like the
to_val'
function and"%Hown_struct"
require carefully balancing nested and skewed parentheses. While these could be tracked via arange
construct, I find that a recursive template is more effective. - Exposing custom
go
functions to the templates. In thesetupTemplates
function in grackle.go, lots of custom helper functions are exposed to the templates. A lot of these are basic string manipulation functions, but several extend the functionality of the templates themselves such ascallTemplate
which enabled support for dynamically building the name of the template to call.dict
which I use to bundle multiple “arguments” into one template call since the templates only support taking one argument by default.
- Extremely fine-grained control over the style of the output code. This is
particularly relevant for the coq code since the go code passes through
gofmt
before becoming output. This enables me to pursue readable automatically generated coq which helps with debugging the proof snippets themselves.
Grackle development makes use of several tools selected solely for my own
convenience, however they aren’t required to use or development the software.
The only hard requirement is to have go
and protoc
installed (although I’d
highly recommend having Coq installed to so that you can compile the output
proofs). Notably, no code generation plugin is needed for protoc
. Grackle uses
it to generate a descriptor set which can be parsed by the go protobuf libraries
to interpret the structure of the input proto files.
Other things to be aware of in the grackle repository:
direnv
is setup to use anix
flake. This is because I use nixos as my primary operating system. The flake notably installs go, coq, protoc and goose. Goose is invoked by grackle via a go package, so it does not need to be installed on your system, but I’ve found it helpful to have around.- The
run_tests.fish
script runs grackle over the example proto files, then checks that all the output code compiles. Finally it runs the go test suite, which is admittedly limited since it’s easy to generate code that looks correct but doesn’t compile. Running this script provides a higher level of confidence than the go test suite itself.