Skip to content

Automated proof-instrumented marshalling and unmarshalling for verified distributed systems in Go.

Notifications You must be signed in to change notification settings

mjschwenne/grackle

Repository files navigation

Grackle

Grackle is a project to automate the marshaling and unmarshaling code for grove. Effectively, it aims to build on top of the old marshaling code found here by creating a compiler capable of taking a subset of the Protobuf specification (.proto file) and emitting the Go code which can perform the marshaling and unmarshaling of the described buffers and a Gallia script which enables reasoning about the marshaling process.

Table of Contents

Usage

Grackle is capable of generating three outputs for an input directory of proto files:

  1. Go code which marshals and unmarshals the messages.
  2. Goose output, a representation of the generated go code in goose lang.
  3. Coq proofs using perennial that the generated go code is correct.

Grackle can generate just the go code, just the coq code, go and goose code, coq and go or all three. Basically, the go and coq code generation can happen separately, but goose code can only be generated when go code it.

Usage of grackle [-options] <proto file directory>:
  -coq-logical-path string
        Logical path to import the marshal proofs from
  -coq-physical-path string
        Physical output path for coq proofs
  -debug
        Output all generated code to stdout
  -go-output-path string
        Physical path to output go code into
  -go-package string
        Fully qualified root package for the output packages
  -goose-output string
        Directory to write the goose output

To use grackle, the tool needs at least three pieces of information, a directory with the .proto files and either

  • the two coq flags
  • the two go flags

The other flags are optional. A full invocation of grackle would be

grackle -coq-logical-path Grackle.test -coq-physical-path testdata/out/coq/
-go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go
-goose-output testdata/out/goose/ testdata/proto/calendar

A more detailed description of the flags and ideas is given below

  • coq-logical-path : The Coq import systems works by mapping physical directories on disk into logical directories which are referenced in the proofs to perform the imports. This flag provides the logical path where the output coq proofs will be written. For any message with nested messages, this is the logical path where the proofs of the other messages will be imported from. The proofs will always import the corresponding goose output from a logical directory named Goose. This is a standard assumption for perennial and related projects.
  • coq-physical-path : This is where on disk to write the coq proofs. There is no strict relation between the coq physical path and the corresponding coq logical path. The two should be linked via a -Q flag in the _CoqProject file. Grackle will create this directory if it doesn’t exist.
  • debug : Causes the generated go and coq code to be written to standard output. The format for this output is
    --- Begin: < file output path > ---
    
    < file contents >
    
    --- End: < file output path > ---
    --- Begin: < next file output path > ---
    
    < next file contents >
    
    --- End: < next file output path > ---
        
  • go-output-path : The path on disk to write the go output from grackle. As explained later, each message in the proto files of the input directories will get it’s own go package, so grackle will create a nested directory within the provided one.
  • go-package : The full path name for the go package that would correspond to the go-output-path. Unlike the coq logical and physical paths, there is an expected (but not required) correspondence between the package name and the output path. For projected hosted on GitHub, go convention expects the package name to start with github.com/<user name>/<repo name>, for example importing something from grackle as a library would start with github.com/mjschwenne/grackle. From this base, go convention expects the package name to be basically the relative path to the repository root to the directory the package lives in, go-output-path. As with the coq logical path, the generated go code will import from the go code for any message containing enumerations or nested messages from this package.
  • goose-output : If this flag is set, call goose programmatically on the output go files, writing the output to this directory. The output will follow the standard goose practice of writing the output into subdirectories that follow the full go package name.

Why “Grackle”

A grackle is a bird capable of mimicking other birds and even human speech if it wants to, and it’s also a bird whose name starts with ‘g’ to complement goose, a project which converts a subset of Go to Coq and is used as part of grackle.

Compatibility with protoc

Grackle is not compatible with protoc. Neither the wire format grackle uses nor the generated go API is compatible. Some major differences are highlighted below.

  • The int32 and int64 fields are treated as unsigned, rather than signed.
  • The wire format is markedly different, see Grackle Wire Format for more information. Probably the most notable two differences is that the grackle wire format does not use variable width integer encodings and fields are not tagged with the field number, but rather serialized in order as they appear in the proto file.
  • Go code generation is structured differently. The protoc compiler outputs one file per proto file while grackle outputs one go package per message (and enum).
  • Recursive messages, naturally including mutually recursive message, are not supported by grackle despite being supported by protoc.
  • Grackle appends a _gk suffix to generated go packages and proof file names. On the Coq side, output modules do not include the suffix and are referenced directly by the name of the proto messages.

Supported Protobuf Features

The proto language is used by Google’s protocol buffers to describe the layout of binary messages in a language agnostic manner (which is a bit ironic since proto is itself a language). Below is a list of supported and unsupported features as well as any commentary on them.

Supported

  • Messages, including nested messages. (Mutually) recursive use of messages is not supported, regardless of if it is supported by protoc.
  • Integer fields, although all of the integers are mapped to unsigned integers in both go and coq. This is mostly due to limited support for signed integers within perennial. Should support improve, support for signed integers may be added.
    • int32
    • uint32
    • fixed32
    • int64
    • uint64
    • fixed64
  • Boolean fields, bool.
  • String fields, string.
  • Byte fields, bytes.
  • Imports between proto files in the input proto directory or a subdirectory of the input proto directory.

Planned Support

  • Slices of fields. These are represented with the repeated label, i.e. repeated uint32. Slices will support types that do not have pointers in them due to coq proof limitations rather than go code generation.
  • Enumerations. Go code generation is nearly complete, but the coq proofs cannot handle the use of enumerations in the proofs yet.
  • Maps. After slices, map support is important and used in several places in the perennial ecosystem.

Unsupported

  • Signed integer fields sint32, sfixed32, sint64 and sfixed64. This is mostly due to limited support for signed integers within perennial. According to the proto language, regular int32 and int64 should also be signed even though grackle treats them as unsigned. It is unclear at this point if support for signed integers would change the interpretation of these fields, which would be a breaking change.
  • Recursive messages. This issue seems particularly challenging on the coq side, although go code generation is unlikely to be significant issue.
  • Nested messages declaration. Referencing other top-level messages for a field is supported. However, protobuf supports declaring nested messages within a message, like this example from the protobuf documentation:
    message SearchResponse {
      message Result {
        string url = 1;
        string title = 2;
        repeated string snippets = 3;
      }
      repeated Result results = 1;
    }
        
  • any and oneof labels. These labels operate as a wildcard type and sum type. They are ignored by grackle and unsupported. Moreover, support for these labels is not planned.

Grackle Wire Format

The grackle wire format is must simpler than the protobuf wire format. The two most notable differences are:

  1. Grackle does not use a variable width encoding for integers. All integers are fixed width and embedded in the serialized message in little-endian byte order.
  2. Grackle fields are not tagged with the field number defined in the proto file. Fields are serialized and deserialized in the order they appear in the proto file.

As an example, consider the following event message. The complete proto file for this message can be found here.

message Event {
    int32 id = 1;
    string name = 4;
    TimeStamp startTime = 2;
    TimeStamp endTime = 3;
}

The wire format for the message would start with the four bytes needed for id, then the length of the name string as a 64 bit number followed by the bytes composing the string itself. The null byte is not included since casting a string to a byte slice ([]byte) does not include the null byte. Finally, the marshaled message includes the 12 bytes needed to marshal a TimeStamp. In the below diagram, the number after each field is the length of the field. Note that the nested structures don’t have to be of a fixed size, but are in this example.

[        |                  |                 |                |              ]
[ id (4) | length(name) (8) | name (variable) | startTime (12) | endTime (12) ]
[        |                  |                 |                |              ]

Generated Code Guide

Below is an overview of what type of code is generated by grackle. Full examples can be found in the testdata/out subdirectories.

The generated code guide will use the calendar example, which features the Event message from above and a TimeStamp message containing three integers for the hour, minute and second. All example output snippets will reference the Event message since it’s a bit more interesting. Note that the examples in this README may not be kept up to date, but the files in testdata/out will be.

For this example, grackle might be invoked as

grackle -coq-logical-path Grackle.test -coq-physical-path testdata/out/coq/
-go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go
-goose-output testdata/out/goose/ testdata/proto/calendar

Which would generate these files (which can be accessed in the testdata/out directory of this repository, along with the other test files):

testdata/out
├── coq
│   ├── event_proof.v
│   └── timestamp_proof.v
├── go
│   ├── event_gk
│   │   └── event_gk.go
│   └── timestamp_gk
│       └── timestamp_gk.go
└── goose
    └── github_com
        └── mjschwenne
            └── grackle
                └── testdata
                    └── out
                        └── go
                            ├── event_gk.v
                            └── timestamp_gk.v

Go

For a top-level message, grackle will output:

  • A go package with the name of the message, all lowercase, and the _gk suffix, such as event_gk. Inside the package is one file with the same name.
  • That file contains
    • A struct definition S. Since go access patterns always use the package name, every message will have S as the struct name. It can be accessed from outside the package as, e.g. event_gk.S which will differentiate between separate messages.
    • A Marshal function which takes an S and a byte slice prefix and returns prefix with the encoding of S appended to it.
    • An Unmarshal function which takes a byte slice and returns an S struct and the unused suffix.

The use of prefix and suffix enable a stateless marshaling design which is easier to reason about and more compositional. More information can be found in the Technical Considerations section.

Go Struct

Here is an example of the generated struct definition for the Event message:

type S struct {
	Id        uint32
	Name      string
	StartTime timestamp_gk.S
	EndTime   timestamp_gk.S
}

Marshal Function

Now we have the Marshal function. As a common pattern in grackle, this function can be broken into three parts:

  • Setup : Including the function signature and first line of the function, which is needed for goose compatibility.
  • Fields : Loop over the fields of the message, performing the same action on each one of them. In this case, serializing each field and appending it to enc.
  • Conclusion : Return the byte slice containing prefix and the newly encoded Event.
func Marshal(e S, prefix []byte) []byte {
	var enc = prefix

	enc = marshal.WriteInt32(enc, e.Id)
	nameBytes := []byte(e.Name)
	enc = marshal.WriteInt(enc, uint64(len(nameBytes)))
	enc = marshal.WriteBytes(enc, nameBytes)
	enc = timestamp_gk.Marshal(e.StartTime, enc)
	enc = timestamp_gk.Marshal(e.EndTime, enc)

	return enc
}

Unmarshal Function

Finally we have the Unmarshal function. This function follows the same three steps, although the setup is more complex than one line. Goose does not support mixed variable assignment and initialization using the := operator when a function returns multiple return values, so grackle creates the variables beforehand. Then iterates through the fields of the struct, deserializing them one at a time. In the function conclusion the values read from the serialized byte slice are packed into a struct and returned, along with the remaining bytes in the input byte slice.

func Unmarshal(s []byte) (S, []byte) {
	var enc = s // Needed for goose compatibility
	var id uint32
	var name string
	var startTime timestamp_gk.S
	var endTime timestamp_gk.S

	id, enc = marshal.ReadInt32(enc)
	var nameLen uint64
	var nameBytes []byte
	nameLen, enc = marshal.ReadInt(enc)
	nameBytes, enc = marshal.ReadBytesCopy(enc, nameLen)
	name = string(nameBytes)
	startTime, enc = timestamp_gk.Unmarshal(enc)
	endTime, enc = timestamp_gk.Unmarshal(enc)

	return S{
		Id:        id,
		Name:      name,
		StartTime: startTime,
		EndTime:   endTime,
	}, enc
}

The go code generation is the simple part, while the Coq code proof generation is much more complex.

Coq

For a top-level message, grackle will generate a proof file with the lower-cased message name plus the _gk suffix. That file will contain:

  • A coq record C corresponding to the go struct and the proto message.
  • A definition has_encoding of what it means for a byte slice (or list of bytes in coq) to encode the record.
  • A separation logic proposition own about what it means to own a struct.
  • A function to_val' to convert a record into a goose lang value.
  • A function from_val' to convert a goose lang value into a record.
  • Proof that the into value and from value functions define what it means to have a record as a value, <message name>_into_val.
  • Proof that having the value form of the record correctly encodes the goose lang struct definition, <message name>_into_val_for_type.
  • An encoding lemma wp_Encode which describes the behavior of calling the Marhsal function, as well as a proof of correctness.
  • A decoding lemma wp_Decode which describes the behavior of calling the Unmarshal function, as well as a proof of correctness.

Coq Record

Just like with the Go code, grackle starts by outputing a Coq record. For the running Event example, that looks like this:

Record C :=
  mkC {
      id : u32 ;
      name : string ;
      startTime : TimeStamp.C ;
      endTime : TimeStamp.C ;
    }.

The structure of the records is intentionally similar to that of the go struct S and of course the proto message.

Has Encoding

Next grackle produces a has_encoding definition which relates the record C to the wire format. In cases where the field is an integer or similar stand-alone field, it is directly converted into a sequence of bytes and appended together. Nested messages are a bit different though. The definition is designed to be compositional, so Event.has_encoding asserts that there exists some sequence of bytes which faithfully encodes the startTime and endTime time stamps using the TimeStamp.has_encoding definition.

Definition has_encoding (encoded:list u8) (args:C) : Prop :=
  ∃ start_enc end_enc,
    encoded = (u32_le args.(id)) ++
              (u64_le $ length $ string_to_bytes args.(name)) ++ string_to_bytes args.(name) ++
              start_enc ++ end_enc
    /\ TimeStamp.has_encoding start_enc args.(startTime)
    /\ TimeStamp.has_encoding end_enc args.(endTime).

Some extra flexibility here is that grackle can allow the TimeStamp encoding to change without (hopefully) requiring regeneration of the Event code, although a recompile will be required according to coq consistency requirements.

Own

After has_encoding, which related the record to the wire format, we need a way to represent owning an instance of the record, which will be used in the pre and post conditions for the serialization lemmas.

Definition own (args__v: val) (args__c: C) (dq: dfrac) : iProp Σ :=
  ∃ (startTime__v endTime__v : val) ,
  "%Hown_struct" ∷ ⌜ args__v = (#args__c.(id), (#(str args__c.(name)), (TimeStamp.to_val' args__c.(startTime), (TimeStamp.to_val' args__c.(endTime), #()))))%V ⌝ ∗
  "Hown_startTime" ∷ TimeStamp.own startTime__v args__c.(startTime) dq ∗
  "Hown_endTime" ∷ TimeStamp.own endTime__v args__c.(endTime) dq.

The own proposition relates the coq record to struct representation in goose lang. All values in goose lang have the val type at the meta level, but we can decompose them into a struct with the correct values in the correct order. Like with has_encoding, most of the simple fields like integers can be embedded directly. For nested messages, grackle asserts that the values in the goose lang val correspond to the nested messages using the next definition, to_val'. Since owning a nested struct may entail using some other piece of state (if there is a slice in the nested message, for instance), grackle also needs to assert ownership over that, which is accomplished by adding a call to the nested own predicate.

To Value

As briefly mentioned above, all values in goose lang are represented as a val at the Coq level. To enable this message to be nested within another message, grackle needs to know what value a given record will have. To do this, it generates a function from C to val, named to_val'. The apostrophe is required to conflicting with perennial names. This function is identical to the pure fact "%Hown_struct" which is generated in the own predicate.

Definition to_val' (c : C) : val :=
  (#c.(id), (#(str c.(name)), (TimeStamp.to_val' c.(startTime), (TimeStamp.to_val' c.(endTime), #())))).

From Value

Grackle produces a corresponding function which takes arbitrary goose lang values and attempts to convert them to the grackle coq record. The return type of this function is option C since a val can have algebraic structures ranging form a single literal to arbitrary structs, most of which will not be compatible with the layout of this specific struct.

Definition from_val' (v : val) : option C :=
  match v with
  | (#(LitInt32 id), (#(LitString name), (startTime, (endTime, #()))))%V =>
    match TimeStamp.from_val' startTime with
    | Some startTime =>
        match TimeStamp.from_val' endTime with
        | Some endTime =>
            Some (mkC id name startTime endTime)
        | None => None
        end
    | None => None
    end
  | _ => None
  end.

Most of the simple fields like integers can be interpreted at the pattern match level, nested messages cannot since the type of the message isn’t a constructor for a val. Thus, a nested match structure is generated which requires all nested messages to parse correctly to produce a record from a val.

IntoVal

While we’ve already seen instances where converting back and forth between C’s and val’s are important, most of the time we want proofs involving grackle messages to handle these conversions using standard perennial relations and functions. In order for that to happen, grackle has to show that to_val' and from_val' refine IntoVal for C. This means that grackle can convert a C into a val when needed.

Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.

#[global]
Instance Event_into_val : IntoVal C.
Proof.
  refine {|
      to_val := to_val';
      from_val := from_val';
      IntoVal_def := (mkC (W32 0) "" (IntoVal_def TimeStamp.C) (IntoVal_def TimeStamp.C))
    |}.
  intros v.
  destruct v as [i n [sh sm ss] [eh em es]]; done.
Defined.

To show the C can be converted into a value, grackle provides the to_val' and from_val' functions as well as an initial value for a newly allocated instance of the record. These values match the default values go uses if you encountered something like Event{} since all types in go have default values. Since the structures of C and S match so closely, the proof is simple, requiring mostly a large destruct and calling done on all the new goals.

IntoValForType

While it’s true that grackle can convert a C to a val and vice versa, it knows more than just that. Specifically, the val is an instance of the goose lang representation of the go struct. To ascribe a type to the returned val, grackle generates an instance of IntoValForType C (struct.t <message name>).

Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.

#[global]
Instance Event_into_val_for_type : IntoValForType C (struct.t Event).
Proof. constructor; auto 10. Defined.

This proof is extremely simple, requiring only a single tactic.

Own to Val

Recall the observation that the "%Hown_struct" proposition in own is identical to to_val'=​. For some field types, such as nested messages and slices, grackle needs to know that having ownership over a record =C implies something about the shape of the corresponding val. To accomplish this goal, grackle generates own_to_val,

Disclaimer: The below code snippet isn’t automatically generated yet but should be by the end of the week.

Lemma own_to_val (v : val) (c : C) (dq : dfrac) :
  own v c dq -∗ own v c dq ∗ ⌜ v = to_val c ⌝.
Proof.
  (* Generated, but omitted from this README *)
Qed.

This lemma allows grackle and later us to extract this correspondence between the two. Note that this lemma uses the goose standard way to convert types to values, to_val (missing the '). The lemma returns ownership of the coq record since otherwise perennial would consume it and the proof could only learn the pure fact while losing access to own since the program logic is affine.

Encoding

All of the above coq definitions exist to enable grackle to actually verify the Marshal and Unmarshal function generated on the Go side. The wp_Encode lemma describes the behavior of the Marshal function.

Lemma wp_Encode (args__v : val) (args__c : C) (pre_sl : Slice.t) (prefix : list u8) (dq : dfrac):
  {{{
        own args__v args__c dq ∗
        own_slice pre_sl byteT (DfracOwn 1) prefix
  }}}
    event_gk.Marshal args__v (slice_val pre_sl)
  {{{
        enc enc_sl, RET (slice_val enc_sl);
        ⌜ has_encoding enc args__c ⌝ ∗
        own args__v args__c dq ∗
        own_slice enc_sl byteT (DfracOwn 1) (prefix ++ enc)
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

The encoding lemma requires some level of ownership over an instance of the generated message, using args__v as the goose lang value and args__c as the coq level representation. Here, dq quantifies the discardable fractional ownership over any heap values in the own definition. Since only read access is needed to marshal, the value of dq is unbounded with respect to a dfrac. The Marshal function also needs ownership of the prefix slice that the serialized args__v will be appended to. Since Marshal writes to this slice, full ownership (DfracOwn 1) is required.

After marshaling, ownership of args__v is returned to the caller (the third piece of the post condition, own args__v args__c dq=​) at the same level of ownership =Marshal was called with. Additionally, the caller learns that the returned slice encodes prefix ++ enc (the second part of the postcondition), with enc being some list of bytes which correctly has an encoding for args__c (using has_encoding, the first part of the post condition).

Decoding

The mirror image of wp_Encode is naturally wp_Decode, which specifies the behavior of calling Unmarshal.

Lemma wp_Decode (enc : list u8) (enc_sl : Slice.t) (args__c : C) (suffix : list u8) (dq : dfrac):
  {{{
        ⌜ has_encoding enc args__c ⌝ ∗
        own_slice_small enc_sl byteT dq (enc ++ suffix)
  }}}
    event_gk.Unmarshal (slice_val enc_sl)
  {{{
        args__v suff_sl, RET (args__v, suff_sl);
        own args__v args__c (DfracOwn 1) ∗
        own_slice_small suff_sl byteT dq suffix
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

The conditions for this Texan triple are basically the inverse of wp_Encode. Calling Unmarshal requires some level of access to a byte slice (the serialized struct with a potentially empty suffix) and the pure fact that the beginning of the byte slice encodes some coq level record args__c.

After Unmarshal returns, the caller learns that they have full ownership of the newly created struct and ownership at the same level of just the suffix slice. Note that dq is only bound by being a valid discardable fractional permission. Removing enc from the beginning of the slice isn’t a write (which would require full ownership), but rather returns a new pointer to an existing location in the same potentially read only slice used to call Unmarshal.

Technical Considerations

There are several technical considerations for a piece of software like grackle, relating to API design, proof generation and the underlying tool-chain. While the below discussion is non-exhaustive, they are some points I have additional comments on.

Stateless Design

Grackle builds on top of the existing marshal library for perennial, which provides a set of “primitives” for reading integers, byte slices and other go types. This library provides two APIs, a stateful and stateless one. The stateful version requires the programmer to create an object which internal tracks pervious write calls before expelling a binary blob when marshaling is complete (not unlike a java StringBuilder). The nice feature about this design is that it removes the requirement that the programmer track the prefix / suffix slices.

Grackle uses the stateless API which always return the prefix / suffix slices and adopts that style itself. The reason for this is mostly that this style is easier to reason about within perennial and enables stronger compositionality between messages. While grackle is technically a simple compiler, it doesn’t employ standard compiler techniques like generating then transforming an abstract syntax tree, instead using go’s template package (see go templates for more information). It’s ability to track complex and contextual information like managing state is limited, particularly on the coq side (I think I could generate stateful go code, but it would make the coq proofs much more challenging). Ultimately the stateless design is a way to reduce complexity for an already complex piece of software.

Proof Structure

The body of the main wp_Encode and wp_Decode lemmas aren’t given in this README, but there are some interesting things I have to say about their structure. These proofs can be found in the event_proof.v file.

As first described during go code generation, the proofs for these important lemmas can be thought of as a proof setup, followed by a series of proof snippets, one for each field in the proto message and then a conclusion. Visually, that might look something like this:

Proof.
 +----------------------------------------------------------------------------+
 |                                                                            |
 |                                 Proof Setup                                |
 |                                                                            |
 +----------------------------------------------------------------------------+

 +----------------------------------------------------------------------------+
 |                                                                            |
 |                               Field Snippets                               |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                               First Field                              | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                              Second Field                              | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 |                                     .                                      |
 |                                     .                                      |
 |                                     .                                      |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                               Last Field                               | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 +----------------------------------------------------------------------------+

 +----------------------------------------------------------------------------+
 |                                                                            |
 |                                 Conclusion                                 |
 |                                                                            |
 +----------------------------------------------------------------------------+
Qed.

At first pass, it seems difficult to find proof snippets which can be arbitrarily ordered and instantiated. Fortunately, we can effectively break the task of marshaling or unmarshaling a struct down to operating on the head or end of a slice at a field level. Each snippet requires the same set of hypotheses, such as Hsl which tracks the prefix / suffix slices. Thus, it’s the job of the proof setup to create the initial hypotheses needed to process each field. Since perennial employs an affine logic, each snippet then has to reintroduce the next version of any common hypotheses it consumes. The snippets also leak information about each field further down the chain of fields until the proof conclusion is reached. The conclusion’s role is to show that the post condition as been satisfied, and to do this it assumes that all of the needed hypotheses have been created with a well-known name by the corresponding snippet.

This type of proof design incentivizes compact, highly local proof snippets which ignore everything that isn’t related to the field at hand. However in the conclusion, grackle needs to tie everything together. Consider the conclusion from the Event Marshal proof:

wp_load. iApply "HΦ". iModIntro. rewrite -?app_assoc.
iFrame. iPureIntro.

unfold has_encoding. split.
{
exists startTime_enc, endTime_enc.
rewrite ?string_bytes_length.
rewrite Hargs_name_sz.
rewrite ?w64_to_nat_id. exact.

} done.

The part in the braces needs to know that startTime_enc, endTime_enc and Hargs_name_sz all exist, since those are variables and propositions generated over the course of the proof. The existence of these non-local proof constructs mandates the usage of well-know names for everything that needs to referenced later. Maintaining these types of structures are required but more challenging than individual proof snippets. I like to believe that this requirement also enables grackle proofs to be highly readable and understanding by the proof engineer.

Go Templates

Grackle makes extensive use of the text/template templating package in the go standard library. The templates them selves can be found in internal/templates. Note that grackle uses <​< and >​> as template delimiters to avoid conflicts with Texan triples. This library is surprisingly expressive and powerful, but does have limitations.

  • I’m realizing as I write this README that the templates do support comments using <​</* and */>​>, so adding some comments would improve readability.
  • The templates are all handcrafted. This is a boon as much as a con, since I can finely control things like indentation in the Coq output, but it often leads to hard to read templates that may do the same thing in multiple different ways.
  • The templates struggle to track state between templates, which influenced the choice to use the stateless API for grackle.

However I don’t really want to be excessively negative here. The templating library did include several features I was surprised by, including:

  • Recursive templates. Generating things like the to_val' function and "%Hown_struct" require carefully balancing nested and skewed parentheses. While these could be tracked via a range construct, I find that a recursive template is more effective.
  • Exposing custom go functions to the templates. In the setupTemplates function in grackle.go, lots of custom helper functions are exposed to the templates. A lot of these are basic string manipulation functions, but several extend the functionality of the templates themselves such as
    • callTemplate which enabled support for dynamically building the name of the template to call.
    • dict which I use to bundle multiple “arguments” into one template call since the templates only support taking one argument by default.
  • Extremely fine-grained control over the style of the output code. This is particularly relevant for the coq code since the go code passes through gofmt before becoming output. This enables me to pursue readable automatically generated coq which helps with debugging the proof snippets themselves.

Development

Grackle development makes use of several tools selected solely for my own convenience, however they aren’t required to use or development the software. The only hard requirement is to have go and protoc installed (although I’d highly recommend having Coq installed to so that you can compile the output proofs). Notably, no code generation plugin is needed for protoc. Grackle uses it to generate a descriptor set which can be parsed by the go protobuf libraries to interpret the structure of the input proto files.

Other things to be aware of in the grackle repository:

  • direnv is setup to use a nix flake. This is because I use nixos as my primary operating system. The flake notably installs go, coq, protoc and goose. Goose is invoked by grackle via a go package, so it does not need to be installed on your system, but I’ve found it helpful to have around.
  • The run_tests.fish script runs grackle over the example proto files, then checks that all the output code compiles. Finally it runs the go test suite, which is admittedly limited since it’s easy to generate code that looks correct but doesn’t compile. Running this script provides a higher level of confidence than the go test suite itself.

About

Automated proof-instrumented marshalling and unmarshalling for verified distributed systems in Go.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published