Skip to content

Abstractions for Reasoning about Memory in F* Low*

Catalin Hritcu edited this page Jun 4, 2019 · 25 revisions

Background on Low*

Low* is a DSL in F* that allows to build verified C-like programs. It provides a low-level memory model with support for pointers, arrays, stack and heap allocation, deallocation, and arbitrary aliasing patterns. It has a straightforward metatheory built by relating its memory model to that of CompCert Clight.

Low* has been used successfully to verify a variety of non-trivial programs. The main verification task in Low* involves reasoning about memory. For this, Low* provides libraries that structure specifications in the style of Kassios' dynamic frames (aka modifies clauses). This style of specification has a number of benefits: it is general, can handle arbitrary pattern of aliasing, and it is embedded within first-order logic so encodable to an SMT solver.

Nevertheless, reasoning about memory in Low* remains challenging, incurring a significant specification and proof overhead in stating and maintaining anti-aliasing invariants. We find that it scales poorly in two respects:

Scaling problem 1: reasoning about the shape and contents of memory is intermingled with reasoning about other concerns (e.g., arithmetic), and the interactions can cause proofs to slow down significantly.

Scaling problem 2: when programming with a large number of mutable locations, specifications, and proofs, it often becomes quadratic to reason about locations pairwise.

Separation logic

A widespread approach to reasoning about memory (and an alternative to dynamic frames) is separation logic (SL). SL simplifies reasoning about memory, particularly in the common case where accessing mutually disjoint memory locations. In such a setting, SL specifications and proofs can closely resemble reasoning about functional programs that manipulate an n-tuple of values, largely ignoring the fact that these values are actually maintained within mutable memory cells.

Structuring proofs using separation logic stands to significantly simplify the verification of many (most?) Low* programs, since most programs use relatively simple aliasing patterns. However, building verification conditions in separation logic in a manner suitable for encoding to an SMT solver remains an active area of research.

Prior attempts at encoding separation logic in F*/Low* involved the following elements:

  • A library that defined the separating conjunction ( * ) of SL as an assertion on top of F*'s existing memory models; or as a partial (commutative, monoid) operation on the heaps themselves.

  • Specifying programs using * instead of the underlying, lower-level memory model concepts.

  • Building verification conditions as usual, but pre-processing the VC with tactics to solve for quantifiers using AC-rewriting of the *-connective, before feeding this VC to the SMT solver.

While this approach is general and may still be promising, we have yet to make it work at scale. Part of the difficulty is perhaps due to the need to run tactics efficiently over monolithic large VCs that mix a variety of concerns, memory being only one part. In other words, our prior attempts have tackled "Scaling problem 2", but "Scaling problem 1" remains a significant hurdle. But even then, our prior attempts have in most part relied solely on tactics to write and verify such programs, not providing the programmer a convenient manual fallback option, e.g., to make an explicit call to the framing combinator.

Resources and views for separating structured reasoning about memory

Meanwhile, we have been exploring another approach that aims to provide similar SL-like capabilities to reasoning about memory, but through a very different (actually, increasingly more and more similar) mechanism, that aims to also address "Scaling problem 1", i.e., we are aiming to factor reasoning about memory footprints out of the VC completely.

The main idea: RST a (expects res0) (provides res1) (requires pre) (ensures post)

We provide an indexed state effect as a structured way to specify the footprint and an invariant of a computation, together with a high-level view of the relevant portion of the memory.

This contrasts with Low*'s ST a pre post state effect, where read footprints are specified using liveness assertions in pre, and write footprints are specified using modifies assertions in post ...

The pre-resource index res0 : resource on the RST effect describes:

  • the set fp of memory locations (in the initial memory) that a computation may read or write

  • an invariant inv on (the initial) memory that depends on at most fp locations

  • a type t of high-level views of memories satisfying inv

  • a function sel computing a view for a given memory satisfying inv, which (similarly to inv) depends on at most fp locations

The post-resource index res1 : a -> resource is similar to res0, but instead describes the relevant shape of the final memory, and additionally depends on the values returned by the RST computation.

We can then straightforwardly define (i) an empty resource (with fp empty, inv taken to be True, and view type unit), and (ii) a separating conjunction of resources res1 <*> res2 (with fp the union of footprints, inv taken to be the pointwise conjunction of the individual invariants together with an additional footprint disjointness condition, and sel computing the pair of the two views).

For example, consider two styles of specification for a computation that swaps the contents of two pointers.

  1. Classic Low*
  val swap (x y: pointer a)
    : Stack unit
      (requires fun h -> 
          live h x /\
          live h y /\
          disjoint x y)
      (ensures fun h0 _ h1 ->
          live h1 x /\
          live h1 y /\
          h1.[x] == h0.[y] /\
          h1.[y] == h0.[x])
  1. RST
  val swap (x y: pointer a)
    : RST unit (ptr_resource x <*> ptr_resource y)
               (fun _ -> ptr_resource x <*> ptr_resource y)
               (requires fun _ -> True)
               (ensures fun h0 _ h1 ->
                  sel h0 (ptr_view x) == sel h1 (ptr_view y) /\
                  sel h0 (ptr_view y) == sel h1 (ptr_view x))

Some main benefits of this style:

  • It factors out specification and reasoning about memory footprints from the rest of the spec. (addressing Problem 1, i.e., memory reasoning is no longer intermingled with the rest)

  • It internalises reasoning about disjointness/liveness, and maybe even some footprint-local invariants into the resource, freeing the rest of the program from confronting them. (addressing Problem 2, providing a form of separation logic)

  • It provides a user-defined (more) high-level view of the memory on top of the underlying monolithic memory model to write specifications.

The RST a (expects res0) (provides res1) (requires pre) (ensures post) effect is defined on top of Low*'s STATE effect in the usual two steps:

effect RSTATE (a:Type)
              (res0:resource)
              (res1:a -> resource)
              (wp:rstate_wp a res0 res1) =
       STATE a 
           (fun (p:a -> HS.mem -> Type)
              (h0:HS.mem) -> 
                inv res0 h0 /\
                rst_inv res0 h0 /\
                wp (fun x h1 -> 
                     inv (res1 x) h1 /\
                     rst_inv (res1 x) h1 /\
                     modifies res0 (res1 x) h0 h1 ==>
                     p x h1) h0)

effect RST (a:Type)
           (res0:resource)
           (res1:a -> resource)
           (pre:r_pre res0)
           (post:r_post res0 a res1) = 
       RSTATE a res0 res1 (fun p h0 -> 
         pre h0 /\ (forall x h1 . post h0 x h1 ==> p x h1))

Regarding the invariants involved in this effect, inv res is the invariant of the resource; rst_inv is an abstract invariant about the footprint of res being among the not unused locations in a given memory; and modifies res0 res1 h0 h1 abstractly packages up Low*'s modifies clause on resource res0's footprint, and a 2-state invariant that h0 and h1 are only modified in was that are closed under framing.

The RST a res0 res1 pre post effect differs from the previous iteration of these ideas (the LensST a l pre post effect) as follows:

  • The resources res0 and res1 do not contain abstract sets of handles (to name the memory locations in fp) any more. The motivation for this change is two-fold: on the one hand, this information is already naturally encapsulated in the definition of sel. But more importantly, as RST is just a shallow embedding, it does not seem feasible to automatically generate and manage such handles, e.g., compared to what one can do about them in a deeply embedded setting, such as Vale.

  • The resource contains only a (high-level) view of the memory instead of a full-blown lens (with both view and update components). This change was motivated by the style of specs we are already writing, i.e., the sel-modifies style, which naturally does not include uses of memory updates. In fact, with this new RST effect, we will no more make use of modifies clauses in user-written specifications, instead making use of equality of views of frames for that purpose.

  • The pre- and postconditions in RST a res0 res1 pre post are on entire memories (restricted to satisfy the inv of res0 and res1), instead of being only on the view types t. This change is related to not associating any abstract set of handles with a resource. While the pre- and postconditions are defined on the entire memory, we expect that the style of specification one writes will be given in terms of the views of (abstract) resources (see the type of the swap function above). Also, having access to the entire memory allows us to easily define the framing combinator in terms of the view of the frame (below).

Compositionality and framing

(This was previously stated as an outstanding challenge in the LensST iteration.)

In order to (i) define the framing combinator, and (ii) relate abstract resources to resources describing their internal components (e.g., relating a 2D point with two disjoint pointers), we define a general notion of (constructive) resource inclusion as a predicate/relation can_be_split_into (outer:resource) ((inner,delta):resource & resource), which holds if the inner resource is included in the outer resource witnessed by the delta resource (the predicate requires delta to truly be a delta-resource, in terms of both fp and inv).

For example, we can then straightforwardly give instances of can_be_split_into (res1 <*> res2) (res1,res2) and can_be_split_into (res1 <*> res2) (res2,res1). But this notion of resource inclusion can also be used to pack and unpack abstract resources, e.g., can_be_split_into (point_resource (mk_point x y)) (ptr_resource x <*> ptr_resource y,empty_resource) that unpacks an abstract 2D point into two disjoint pointer resources, and vice versa.

The plan is to infer these deltas (at least of the first kind) using tactics and meta-programming, and then use them in the applications of the framing combinator. Note that the generality of this notion of resource inclusion means that we ought to be instantiate our existing commutative monoid tactics to split a term built out of resources and <*>s into two resources, the footprint and the frame, and them simply treat the frame part as the needed resource-delta.

The framing combinator itself can then be defined as follows:

let frame #outer0 #inner0 #a #outer1 #inner1
          (delta:resource{(outer0 `can_be_split_into` (inner1,delta)) /\
                          (forall x . (outer1 x) `can_be_split_into` (inner1 x,delta))})
          #pre #post
          (f:unit -> RST a inner0 inner1 pre post)
  : RST a outer0
          outer1
          pre
          (fun h0 x h1 ->
             post h0 x h1 /\ 
             sel (view_of delta) h0 == sel (view_of delta) h1)
  = f ()

where we use the view of the frame (delta) to state that the frame remains unchanged during the evaluation of f, as opposed to using Low*'s modifies clause. For this to be possible, it is crucial that we store an entire resource as the delta, rather than, say, just a delta-footprint.

As a first stepping stone, we would like to infer the delta argument using tactics, and then at a later date perhaps add custom support to F* to also insert the frame combinator itself automatically.

As noted above, compared to our prior separation logic attempts, where framing was built into the specifications of actions and we solely relied on tactics for discharging the generated VCs, this new RST style allows programmers to call the frame combinator explicitly in case it or the delta cannot be inferred, or if that inference has not yet been implemented (like now).

Abstraction

(This was also stated as an outstanding challenge in the LensST iteration.)

As the initial experiments show, working with abstract resources fits neatly into the RST a res0 res1 pre post setting. But more needs to be done, e.g., moving from the toy example of 2D points to (doubly) linked lists and other more complex datastructures.

Allocation

The RST a res0 res1 pre post setting also naturally accommodates allocation (of, say, pointers), both in a scoped and unscoped manner.

The types of unscoped allocation and deallocation are as expected:

val ptr_alloc (#a:Type)
              (init:a)
            : RST (B.pointer a)
                (empty_resource)
                (fun ptr -> ptr_resource ptr)
                (fun _ -> True)
                (fun _ ptr h1 -> 
                   sel (ptr_view ptr) h1 == init)

val ptr_free (#a:Type)
             (ptr:B.pointer a)
           : RST unit (ptr_resource ptr)
                      (fun ptr -> empty_resource)
                      (fun _ -> True)
                      (fun _ ptr h1 -> True)

The type of scoped allocation is also as expected,

val with_new_ptr (#res:resource)
                 (#a:Type)
                 (init:a)
                 (#b:Type)
                 (#pre:with_new_ptr_pre res)
                 (#post:with_new_ptr_post res b (fun _ -> res))
                 (f:(ptr:B.pointer a -> RST b (res <*> (ptr_resource ptr))
                                              (fun _ -> res <*> (ptr_resource ptr))
                                              (fun h -> pre h /\ sel (ptr_view ptr) h == init) 
                                              (post))) 
               : RST b res pre post

where <with_new_ptr_pre res and with_new_ptr_post res b (fun _ -> res) are types of refined stateful pre- and postconditions that are required to be stable under memory mutations that leave the view of res unchanged. Internally, scoped allocation can essentially just be defined in terms of unscoped allocation and deallocation (admittedly, omitting some implicit arguments that are not currently automatically inferred by F*):

let with_new_ptr #res #a init #b #pre #post f = 
  let ptr = frame res (fun _ -> ptr_alloc init) in
  let x = f ptr in 
  frame res (fun _ -> ptr_free ptr);
  x

Some code snippets

The current state of the RST a res0 res1 pre post can be found in the danel_lowstar_lens branch in:

Some further challenges

  1. Abstract effects and effect layering to truly hide any modifies-based reasoning from the uses of RST (see this proposal).

    In short, as currently RSTATE and RST are defined as a shallow encoding on top of STATE, then all the invariants and our abstract modifie clause used in this encoding are visible in the VCs sent to SMT. As a result, e.g., when one does 100 swaps of pointer values (see Swap.fst), the transitivity axiom for modifies gets invoked 100 times, leading to one needing to fine-tune the SMT settings to account for this.

    Instead, if F* had support for abstract effects as proposed in the above note, one could instead use a derived bind combinator which hides all reasoning about resource invariants and the abstract modifies clause (see here).

  2. Concurrency (either || or fork,join). At first axiomatically, later by computing all possible interleavings on computation trees by using the technology developed in the Dijkstra Monads For All project (draft paper).

    Important note: to be able to use liveness as a notion of ownership, we need to rule out being able to recall it automatically using monotonicity. One idea to achieve this we have been talking about is to split the current Low*'s STATE effect into two, one that supports monotonicity (witness,recall) and one that doesn't (functions relying on monotonicity would then have additional pre-conditions). (Concurrent) RSTATE and RST would then be defined in terms of the latter effect.

  3. Monotonicity for resources, both for the sequential and concurrent settings. Here the idea would be to additionally package a preorder into the resources used in the RST effect. The hope is that when monotonicity is lifted to the level of resources we can do examples currently out of reach in Low* (monotonicity about abstract datastructures, while currently monotonicity is limited point-wise to references/buffers).

  4. Accommodating local mutable variables (var x = 10 in ...) not by modelling them as resources but instead as a finite register map from natural numbers to values. This involves extending RSTATE and RST's memory from being a refinement of HyperStack.mem to HyperStack.mem & DMap nat (Option (a:Type & a)).

  5. New front-end support to: (i) infer frames, (ii) insert the framing combinator, (iii) infer the view/resource argument for calls to the sel function in specifications, (iv) map local mutable variables to natural-number keys in the register map, (v) ...

Clone this wiki locally