Skip to content

Abstractions for Reasoning about Memory in F* Low*

Danel Ahman edited this page May 13, 2019 · 25 revisions

Background on Low*

Low* is a DSL in F* that allows to build verified C-like programs. It provides a low-level memory model with support for pointers, arrays, stack and heap allocation, deallocation, and arbitrary aliasing patterns. It has a straightforward metatheory built by relating its memory model to that of CompCert Clight.

Low* has been used successfully to verify a variety of non-trivial programs. The main verification task in Low* involves reasoning about memory. For this, Low* provides libraries that structure specifications in the style of Kassios' dynamic frames (aka modifies clauses). This style of specification has a number of benefits: it is general, can handle arbitrary pattern of aliasing, and it is embedded within first-order logic so encodable to an SMT solver.

Nevertheless, reasoning about memory in Low* remains challenging, incurring a significant specification and proof overhead in stating and maintaining anti-aliasing invariants. We find that it scales poorly in two respects:

Scaling problem 1: reasoning about the shape and contents of memory is intermingled with reasoning about other concerns (e.g., arithmetic), and the interactions can cause proofs to slow down significantly.

Scaling problem 2: when programming with a large number of mutable locations, specifications, and proofs, it often becomes quadratic to reason about locations pairwise.

Separation logic

A widespread approach to reasoning about memory (and an alternative to dynamic frames) is separation logic (SL). SL simplifies reasoning about memory, particularly in the common case where accessing mutually disjoint memory locations. In such a setting, SL specifications and proofs can closely resemble reasoning about functional programs that manipulate an n-tuple of values, largely ignoring the fact that these values are actually maintained within mutable memory cells.

Structuring proofs using separation logic stands to significantly simplify the verification of many (most?) Low* programs, since most programs use relatively simple aliasing patterns. However, building verification conditions in separation logic in a manner suitable for encoding to an SMT solver remains an active area of research.

Prior attempts at encoding separation logic in F*/Low* involved the following elements:

  • A library that defined the separating conjunction ( * ) of SL as an assertion on top of F*'s existing memory models.

  • Specifying programs using * instead of the underlying, lower-level memory model concepts

  • Building verification conditions as usual, but pre-processing the VC with tactics to solve for quantifiers using AC-rewriting of the *-connective, before feeding this VC to the SMT solver.

While this approach is general and may still be promising, we have yet to make it work at scale. Part of the difficulty is perhaps due to the need to run tactics efficiently over monolithic large VCs that mix a variety of concerns, memory being only one part. In other words, our prior attempts have tackled "Scaling problem 2", but "Scaling problem 1" remains a significant hurdle. But even then, our prior attempts have in most part relied solely on tactics to write and verify such programs, not providing the programmer a convenient manual fallback option, e.g., to make an explicit call to the framing combinator.

Resources and views for separating structured reasoning about memory

Meanwhile, we have been exploring another approach that aims to provide similar SL-like capabilities to reasoning about memory, but through a very different (actually, increasingly more related) mechanism, that aims to also address "Scaling problem 1", i.e., we are aiming to factor reasoning about memory footprints out of the VC completely.

The main idea: RST a res pre post

We provide an indexed effect as a structured way to specify the footprint and an invariant of a computation, together with a high-level view of the relevant portion of the memory.

This contrasts with Low*'s ST a pre post effect, where read footprints are specified using liveness assertions in pre, and write footprints are specified using modifies assertions in post ...

The index res : resource on the RST effect describes:

  • the set fp of memory locations that a computation may read or write

  • an invariant inv on memory that depends on at most fp locations

  • a type t of high-level views of memories satisfying inv

  • a function sel computing a view for a given memory satisfying inv, which (similarly to inv) depends on at most fp locations

We can then straightforwardly define (i) an empty resource (with fp empty, inv taken to be True, and view type unit), and (ii) a separating conjunction of resources res1 <*> res2 (with fp the union of footprints, inv taken to be the pointwise conjunction of the individual invariants together with an additional footprint disjointness condition, and sel computing the pair of the two views).

For example, consider two styles of specification for a computation that swaps the contents of two pointers.

  1. Classic Low*
  val swap (x y: pointer a)
    : Stack unit
      (requires fun h -> 
          live h x /\
          live h y /\
          disjoint x y)
      (ensures fun h0 _ h1 ->
          live h1 x /\
          live h1 y /\
          h1.[x] == h0.[y] /\
          h1.[y] == h0.[x])
  1. RST
  val swap (x y: pointer a)
    : RST unit (ptr_resource x <*> ptr_resource y)
      (requires fun _ -> True)
      (ensures fun h0 _ h1 ->
          sel h0 (ptr_view x) == sel h1 (ptr_view y) /\
          sel h0 (ptr_view y) == sel h1 (ptr_view x))

Some main benefits of this style:

  • It factors out specification and reasoning about memory footprints from the rest of the spec. (addressing problem 1, i.e., memory reasoning is no longer intermingled with the rest)

  • It internalises reasoning about disjointness/liveness, and maybe even some footprint-local invariants into the resource, freeing the rest of the program from confronting them. (addressing problem 2, providing a form of separation logic)

  • It provides a user-defined (more) high-level view of the memory on top of the underlying monolithic memory model to write specifications.

The RST a res pre post effect is defined on top of Low*'s stack effect

effect RST (a:Type)
           (res:resource)
           (pre:imem (inv res) -> Type0)
           (post: imem (inv res) -> a -> imem (inv res) -> Type0) = 
       STATE a
            (fun (k:a -> HS.mem -> Type)
               (h0:HS.mem) ->
               inv res h0 /\                 //Require the resource invariant
               rst_inv res h0 /\             //Require the additional footprints invariant
               pre h0 /\                     //Require the pre-condition
               (forall (x:a) (h1:HS.mem).
                 inv res h1 /\               //Ensure the resource invariant
                 rst_inv res h1 /\           //Ensure the additional footprints invariant
                 modifies res h0 h1 /\       //Ensure that only resource's footprint is modified
                 post h0 x h1 ==>            //Ensure the post-condition
                 k x h1))                    //prove the continuation under this hypothesis

where imem inv = h:HS.mem{inv h}. Regarding the invariants involved in this effect, inv res is the invariant of the resource; rst_inv is an abstract invariant about the footprint of res being among the not unused locations in a given memory; and modifies res h0 h1 abstractly packages up Low*'s modifies clause on the resource's footprint and hyperstack's domain equality.

Note: In order to work with the invariant rst_inv, we also need Low*'s buffer library to provide us additional lemmas about abstract memory locations not being reused after they are freed, such as

assume val lemma_loc_not_unused_in_modifies (l:B.loc) (h0 h1:HS.mem)
  : Lemma (requires (B.loc_includes (B.loc_not_unused_in h0) l /\ B.modifies l h0 h1))
          (ensures  (B.loc_includes (B.loc_not_unused_in h1) l))

The RST a res pre post effect differs from the previous iteration of these ideas (the LensST a l pre post effect) as follows:

  • The resource res does not contain an abstract set of handles (to name the memory locations in fp) any more. The motivation for this change is two-fold: on the one hand, this information is already naturally encapsulated in the definition of sel. But more importantly, as RST is just a shallow embedding, it does not seem feasible to automatically generate and manage such handles, e.g., compared to what one can do about them in a deeply embedded setting, such as Vale.

  • The resource contains only a (high-level) view of the memory instead of a full-blown lens (with both view and update components). This change was motivated by the style of specs we are already writing, i.e., the sel-modifies style, which naturally does not include uses of memory updates. In fact, with this new RST effect, we will no more make use of modifies clauses in user-written specifications, instead making use of equality of views of frames for that purpose.

  • The pre- and postconditions in RST a res pre post are on entire memories (restricted to satisfy the inv of res), instead of being on the view type t. This change is related to not associating any abstract set of handles with a resource. While the pre- and postconditions are defined on the entire memory, we expect that the style of specification one writes will be given in terms of the views of (abstract) resources (see the type of the swap function above). Also, having access to the entire memory allows us to easily define the framing combinator in terms of the view of the frame (below).

Compositionality and framing

(This was previously stated as an outstanding challenge in the LensST iteration.)

In order to (i) define the framing combinator, and (ii) relate abstract resources to resources describing their internal components (e.g., relating a 2D point with two disjoint pointers), we define a general notion of (constructive) resource inclusion as a type let r_includes (outer inner:resource) = delta:resource{...} of resource-deltas between outer and inner, where the refinement requires delta to truly be a delta (in terms of fp and inv).

We can then straightforwardly give instances of r_includes (res1 <*> res2) res1 and r_includes (res2 <*> res1) res1, both given by res2. But this notion of resource inclusion can also be used to pack and unpack abstract resources, e.g., r_includes (ptr_resource x <*> ptr_resource y) (point_resource (mk_point x y)), that packs a pair of pointers into one abstract 2D point, and vice versa.

The plan is to infer these deltas (at least of the first kind) using tactics and meta-programming, and then use them in the applications of the framing combinator. Note that the generality of this notion of resource inclusion means that we ought to be instantiate our existing commutative monoid tactics to split a term built out of resources and <*>s into two resources, the footprint and the frame, and them simply treat the frame part as the needed resource-delta.

The framing combinator itself can then be defined as follows:

let frame #a #outer #inner (delta:r_includes outer inner) #pre #post
          (f:unit -> RST a inner pre post)
  : RST a outer pre (fun h0 x h1 -> post h0 x h1 /\ 
                       sel (view_of delta) h0 == sel (view_of delta) h1)
  = f ()

where we use the view of the frame to state that the frame remains unchanged during the evaluation of f, as opposed to using the modifies clause. For this to be possible, it is crucial that we store an entire resource as the delta, rather than, say, just a delta-footprint.

As a first stepping stone, we would like to infer the delta argument using tactics, and then at a later date perhaps add custom support to F* to insert frames automatically.

As noted above, compared to our prior separation logic attempts, where framing was built into the specifications of actions and we solely relied on tactics for discharging the generated VCs, this new RST style allows programmers to call the frame combinator explicitly in case it or the delta cannot be inferred, or if that inference has not yet been implemented (like now).

Abstraction

(This was also stated as an outstanding challenge in the LensST iteration.)

As the initial experiments show, working with abstract resources fits neatly into the RST a res pre post setting. But more needs to be done, e.g., moving from the toy example of 2D points to linked lists and other more complex datastructures.

Allocation

The RST a res pre post setting also naturally accommodates allocation (of, say, pointers). However, as a result of there being only one resource index on the effect, it only supports scoped allocation. For more general allocation, see below.

The type of scoped allocation of pointers looks as follows:

val with_new_ptr (#res:resource)
                 (#a:Type)
                 (init:a)
                 (#b:Type)
                 (#pre:with_new_ptr_pre res)
                 (#post:with_new_ptr_post res b)
                 (f:(ptr:B.pointer a -> RST b (res <*> (ptr_resource ptr)) 
                                              (fun h -> pre h /\ sel (ptr_view ptr) h == init) 
                                              post)) 
               : RST b res pre post

where <with_new_ptr_pre res and with_new_ptr_post res b are types of refined pre- and postconditions that are required to be suitably stable under pointer allocation (pushing and popping frames).

Code snippets

The current state of the RST a res pre post can be found in the danel_lowstar_lens branch in:

Beyond RST a res pre post

While our initial experiences with RST a res pre post are positive, there are still many places which we can improve.

For instance, coming back to the 2D point example, let us restrict ourselves to points that also carry an additional invariant relating the values of the two coordinates, e.g., saying that they have to be equal, keeping the point on the diagonal. In this case, we cannot give an instance of the resource inclusion r_includes (eq_point_resource p) (ptr_resource p.x <*> ptr_resource p.y) because we are forced to select the empty resource as the only delta candidate (delta can only have the empty footprint), and as a result we no longer can prove that the outer resource's invariant is equivalent to the conjunction of the inner resources' invariants.

We can work around this problem by (i) defining a weaker form of resource inclusion (invariants do not have to be equivalent but instead the outer invariant just has to imply the conjunction of the inner invariants), and (ii) by defining a variant of frame that is restricted to postconditions that are sufficient to re-establish the outer invariant at the end of frame. In that sense, this variant of frame is a little bit similar to the use of snapshots in our monotonic state work to temporarily escape the preorder.

However, while we can work around the above problem, we are still left facing other problems, e.g., how to give an instance of r_includes (ptr_resource p.x <*> ptr_resource p.y) (eq_point_resource p), and others alike, to use framing for packing-unpacking resources.

We believe running into such problems is a direct consequence of RST being indexed by a single resource, leading to various combinators having to be defined in a scoped manner.

Instead, we see as a possible next step on from RST a res pre post an effect indexed by three resources

RST a (expects res1) (returns res2) (provides res3) pre post

where res1 are the resources the caller has to hold to call a computation of type RST a res1 res2 res3 pre post, res2 are the resources that are returned back to the caller, and res3 are the fresh resources generated by the given computation and handed to the caller. Underneath we would also have invariants such as that res2 is included in res1, and that res3 is disjoint from res1.

In particular, we expect that this 3-index approach will allow us to implement both non-scoped allocation, and packing-unpacking invariants not possible with a single resource index. In particular, we would turn packing and unpacking abstract resources from being having to be expressed as resource inclusion (as in the above 1-index style) to effectful computations consuming-returning-providing resources. Of course we would also keep the notion of resource inclusion, so that we can still define framing.

For example, for the equal-coordinates 2D point, we imagine the following example scenario (with framing implicit):

{empty_resource}
let (x,y) = (alloc 0,alloc 0) in
{ptr_resource x <*> ptr_resource y}
let p = pack x y in
{eq_point_resource p}
move_up (); move_down (); move_up ();
{eq_point_resource p}
let (x',y') = unpack p in
{ptr_resource x' <*> ptr_resource y'}
assert (!x' = !y');
{ptr_resource x' <*> ptr_resource y'}
dealloc x'; dealloc y'
{empty_resource}

with pack and unpack having the following types

val unpack (p:eq_point)
  : RST (B.pointer int & B.pointer int)
        (expects  (eq_point_resource p))
        (returns  (empty_resource))
        (provides (fun (x,y) -> ptr_resource x <*> ptr_resource y))
        (requires (fun h -> True))
        (ensures  (fun h0 (x,y) h1 -> sel_x (point_view p) h0 = sel (ptr_view x) h1 /\
                                      sel_y (point_view p) h0 = sel (ptr_view y) h1
                                      (* sel (ptr_view x) h1 = sel (ptr_view y) h1 *)))

val pack (x y:B.pointer int)
  : RST (eq_point)
        (expects  (ptr_resource x <*> ptr_resource y))
        (returns  (empty_resource))
        (provides (fun p -> eq_point_resource p))
        (requires (fun h -> sel (ptr_view x) h = sel (ptr_view y) h))
        (ensures  (fun h0 p h1 -> sel (ptr_view x) h0 = sel_x (point_view p) h1 /\
                                  sel (ptr_view y) h0 = sel_y (point_view p) h1 /\
                                  (* sel_x (point_view p) h1 = sel_y (point_view p) h1 *)))
Clone this wiki locally