-
Notifications
You must be signed in to change notification settings - Fork 236
Abstractions for Reasoning about Memory in F* Low*
Low* is a DSL in F* that allows to build verified C-like programs. It provides a low-level memory model with support for pointers, arrays, stack and heap allocation, deallocation, and arbitrary aliasing patterns. It has a straightforward metatheory built by relating its memory model to that of CompCert Clight.
Low* has been used successfully to verify a variety of non-trivial programs. The main verification task in Low* involves reasoning about memory. For this, Low* provides libraries that structure specifications in the style of Kassios' dynamic frames (aka modifies clauses). This style of specification has a number of benefits: it is general, can handle arbitrary pattern of aliasing, and it is embedded within first-order logic so encodable to an SMT solver.
Nevertheless, reasoning about memory in Low* remains challenging, incurring a significant specification and proof overhead in stating and maintaining anti-aliasing invariants. We find that it scales poorly in two respects:
Scaling problem 1: reasoning about the shape and contents of memory is intermingled with reasoning about other concerns (e.g., arithmetic), and the interactions can cause proofs to slow down significantly.
Scaling problem 2: when programming with a large number of mutable locations, specifications, and proofs, it often becomes quadratic to reason about locations pairwise.
A widespread approach to reasoning about memory (and an alternative to dynamic frames) is separation logic (SL). SL simplifies reasoning about memory, particularly in the common case where accessing mutually disjoint memory locations. In such a setting, SL specifications and proofs can closely resemble reasoning about functional programs that manipulate an n-tuple of values, largely ignoring the fact that these values are actually maintained within mutable memory cells.
Structuring proofs using separation logic stands to significantly simplify the verification of many (most?) Low* programs, since most programs use relatively simple aliasing patterns. However, building verification conditions in separation logic in a manner suitable for encoding to an SMT solver remains an active area of research.
Prior attempts at encoding separation logic in F*/Low* involved the following elements:
-
A library that defined the separating conjunction ( * ) of SL as an assertion on top of F*'s existing memory models; or as a partial (commutative, monoid) operation on the heaps themselves.
-
Specifying programs using * instead of the underlying, lower-level memory model concepts.
-
Building verification conditions as usual, but pre-processing the VC with tactics to solve for quantifiers using AC-rewriting of the *-connective, before feeding this VC to the SMT solver.
While this approach is general and may still be promising, we have yet to make it work at scale. Part of the difficulty is perhaps due to the need to run tactics efficiently over monolithic large VCs that mix a variety of concerns, memory being only one part. In other words, our prior attempts have tackled "Scaling problem 2", but "Scaling problem 1" remains a significant hurdle. But even then, our prior attempts have in most part relied solely on tactics to write and verify such programs, not providing the programmer a convenient manual fallback option, e.g., to make an explicit call to the framing combinator.
Meanwhile, we have been exploring another approach that aims to provide similar SL-like capabilities to reasoning about memory, but through a very different (actually, increasingly more and more similar) mechanism, that aims to also address "Scaling problem 1", i.e., we are aiming to factor reasoning about memory footprints out of the VC completely.
The main idea: RST a (expects res0) (provides res1) (requires pre) (ensures post)
We provide an indexed state effect as a structured way to specify the footprint and an invariant of a computation, together with a high-level view of the relevant portion of the memory.
This contrasts with Low*'s ST a pre post
state effect, where read footprints
are specified using liveness assertions in pre, and write footprints
are specified using modifies
assertions in post
...
The pre-resource index res0 : resource
on the RST effect describes:
-
the set
fp
of memory locations (in the initial memory) that a computation may read or write -
an invariant
inv
on (the initial) memory that depends on at mostfp
locations -
a type
t
of high-level views of memories satisfyinginv
-
a function
sel
computing a view for a given memory satisfyinginv
, which (similarly toinv
) depends on at mostfp
locations
The post-resource index res1 : a -> resource
is similar to res0
,
but instead describes the relevant shape of the final memory, and
additionally depends on the values returned by the RST computation.
We can then straightforwardly define (i) an empty resource (with fp
empty, inv
taken to be True
, and view type unit
), and (ii) a
separating conjunction of resources res1 <*> res2
(with fp
the
union of footprints, inv
taken to be the pointwise conjunction of
the individual invariants together with an additional footprint
disjointness condition, and sel
computing the pair of the two views).
For example, consider two styles of specification for a computation that swaps the contents of two pointers.
- Classic Low*
val swap (x y: pointer a)
: Stack unit
(requires fun h ->
live h x /\
live h y /\
disjoint x y)
(ensures fun h0 _ h1 ->
live h1 x /\
live h1 y /\
h1.[x] == h0.[y] /\
h1.[y] == h0.[x])
- RST
val swap (x y: pointer a)
: RST unit (ptr_resource x <*> ptr_resource y)
(fun _ -> ptr_resource x <*> ptr_resource y)
(requires fun _ -> True)
(ensures fun h0 _ h1 ->
sel h0 (ptr_view x) == sel h1 (ptr_view y) /\
sel h0 (ptr_view y) == sel h1 (ptr_view x))
Some main benefits of this style:
-
It factors out specification and reasoning about memory footprints from the rest of the spec. (addressing Problem 1, i.e., memory reasoning is no longer intermingled with the rest)
-
It internalises reasoning about disjointness/liveness, and maybe even some footprint-local invariants into the resource, freeing the rest of the program from confronting them. (addressing Problem 2, providing a form of separation logic)
-
It provides a user-defined (more) high-level view of the memory on top of the underlying monolithic memory model to write specifications.
The RST a (expects res0) (provides res1) (requires pre) (ensures post)
effect is defined on top of Low*'s STATE
effect in the usual two steps:
effect RSTATE (a:Type)
(res0:resource)
(res1:a -> resource)
(wp:rstate_wp a res0 res1) =
STATE a
(fun (p:a -> HS.mem -> Type)
(h0:HS.mem) ->
inv res0 h0 /\
rst_inv res0 h0 /\
wp (fun x h1 ->
inv (res1 x) h1 /\
rst_inv (res1 x) h1 /\
modifies res0 (res1 x) h0 h1 ==>
p x h1) h0)
effect RST (a:Type)
(res0:resource)
(res1:a -> resource)
(pre:r_pre res0)
(post:r_post res0 a res1) =
RSTATE a res0 res1 (fun p h0 ->
pre h0 /\ (forall x h1 . post h0 x h1 ==> p x h1))
Regarding the invariants
involved in this effect, inv res
is the invariant of the resource;
rst_inv
is an abstract invariant about the footprint of res
being among the not unused locations in a given memory; and
modifies res0 res1 h0 h1
abstractly packages up Low*'s modifies
clause on resource res0
's footprint, and a 2-state invariant
that h0
and h1
are only modified in was that are closed under
framing.
The RST a res0 res1 pre post
effect differs from the previous iteration of
these ideas (the LensST a l pre post
effect) as follows:
-
The resources
res0
andres1
do not contain abstract sets of handles (to name the memory locations infp
) any more. The motivation for this change is two-fold: on the one hand, this information is already naturally encapsulated in the definition ofsel
. But more importantly, asRST
is just a shallow embedding, it does not seem feasible to automatically generate and manage such handles, e.g., compared to what one can do about them in a deeply embedded setting, such as Vale. -
The resource contains only a (high-level) view of the memory instead of a full-blown lens (with both view and update components). This change was motivated by the style of specs we are already writing, i.e., the
sel-modifies
style, which naturally does not include uses of memory updates. In fact, with this newRST
effect, we will no more make use ofmodifies
clauses in user-written specifications, instead making use of equality of views of frames for that purpose. -
The pre- and postconditions in
RST a res0 res1 pre post
are on entire memories (restricted to satisfy theinv
ofres0
andres1
), instead of being only on the view typest
. This change is related to not associating any abstract set of handles with a resource. While the pre- and postconditions are defined on the entire memory, we expect that the style of specification one writes will be given in terms of the views of (abstract) resources (see the type of theswap
function above). Also, having access to the entire memory allows us to easily define the framing combinator in terms of the view of the frame (below).
Compositionality and framing
(This was previously stated as an outstanding challenge in the LensST
iteration.)
In order to (i) define the framing combinator, and (ii) relate
abstract resources to resources describing their internal
components (e.g., relating a 2D point with two disjoint pointers),
we define a general notion of (constructive) resource inclusion
as a predicate/relation
can_be_split_into (outer:resource) ((inner,delta):resource & resource)
,
which holds if the inner
resource is included in the outer
resource
witnessed by the delta
resource (the predicate requires delta
to truly be a delta-resource, in terms of both fp
and inv
).
For example, we can then straightforwardly give instances of
can_be_split_into (res1 <*> res2) (res1,res2)
and
can_be_split_into (res1 <*> res2) (res2,res1)
.
But this notion of resource inclusion can also be used to pack
and unpack abstract resources, e.g.,
can_be_split_into (point_resource (mk_point x y)) (ptr_resource x <*> ptr_resource y,empty_resource)
that unpacks an abstract 2D point into two disjoint pointer resources, and vice versa.
The plan is to infer these deltas (at least of the first kind) using
tactics and meta-programming, and then use them in the applications of the
framing combinator. Note that the generality of this notion of
resource inclusion means that we ought to be instantiate our existing
commutative monoid tactics to split a term built out of resources
and <*>
s into two resources, the footprint and the frame, and
them simply treat the frame part as the needed resource-delta.
The framing combinator itself can then be defined as follows:
let frame #outer0 #inner0 #a #outer1 #inner1
(delta:resource{(outer0 `can_be_split_into` (inner1,delta)) /\
(forall x . (outer1 x) `can_be_split_into` (inner1 x,delta))})
#pre #post
(f:unit -> RST a inner0 inner1 pre post)
: RST a outer0
outer1
pre
(fun h0 x h1 ->
post h0 x h1 /\
sel (view_of delta) h0 == sel (view_of delta) h1)
= f ()
where we use the view of the frame (delta
) to state that the
frame remains unchanged during the evaluation of f
, as opposed
to using Low*'s modifies clause. For this to
be possible, it is crucial that we store an entire resource
as the delta, rather than, say, just a delta-footprint.
As a first stepping stone, we would like to infer the delta
argument
using tactics, and then at a later date perhaps add custom support to
F* to also insert the frame
combinator itself automatically.
As noted above, compared to our prior separation logic attempts,
where framing was built into the specifications of actions and we
solely relied on tactics for discharging the generated VCs,
this new RST
style allows programmers to call the frame
combinator explicitly in case it or the delta
cannot be inferred,
or if that inference has not yet been implemented (like now).
Abstraction
(This was also stated as an outstanding challenge in the LensST
iteration.)
As the initial experiments show, working with abstract resources
fits neatly into the RST a res0 res1 pre post
setting. But more
needs to be done, e.g., moving from the toy example of 2D points
to (doubly) linked lists and other more complex datastructures.
Allocation
The RST a res0 res1 pre post
setting also naturally accommodates
allocation (of, say, pointers), both in a scoped and unscoped manner.
The types of unscoped allocation and deallocation are as expected:
val ptr_alloc (#a:Type)
(init:a)
: RST (B.pointer a)
(empty_resource)
(fun ptr -> ptr_resource ptr)
(fun _ -> True)
(fun _ ptr h1 ->
sel (ptr_view ptr) h1 == init)
val ptr_free (#a:Type)
(ptr:B.pointer a)
: RST unit (ptr_resource ptr)
(fun ptr -> empty_resource)
(fun _ -> True)
(fun _ ptr h1 -> True)
The type of scoped allocation is also as expected,
val with_new_ptr (#res:resource)
(#a:Type)
(init:a)
(#b:Type)
(#pre:with_new_ptr_pre res)
(#post:with_new_ptr_post res b (fun _ -> res))
(f:(ptr:B.pointer a -> RST b (res <*> (ptr_resource ptr))
(fun _ -> res <*> (ptr_resource ptr))
(fun h -> pre h /\ sel (ptr_view ptr) h == init)
(post)))
: RST b res pre post
where <with_new_ptr_pre res
and with_new_ptr_post res b (fun _ -> res)
are types of refined stateful pre- and postconditions that
are required to be stable under memory mutations that leave
the view of res
unchanged. Internally, scoped allocation
can essentially just be defined in terms of unscoped allocation
and deallocation (admittedly, omitting some implicit arguments
that are not currently automatically inferred by F*):
let with_new_ptr #res #a init #b #pre #post f =
let ptr = frame res (fun _ -> ptr_alloc init) in
let x = f ptr in
frame res (fun _ -> ptr_free ptr);
x
Some code snippets
The current state of the RST a res0 res1 pre post
can be found in the danel_lowstar_lens
branch in:
-
views and resources (LowStar.Resource.fst)
-
RST
effect and framing (LowStar.RST.fst) -
pointer resource (reading, writing, unscoped allocation and deallocation, scoped allocation) (LowStar.RST.Pointer.fst)
-
swapping values of two pointer resources (Swap.fst)
-
2D points (Point.fst, Point.Test.fst)
-
2D points with equal coordinates (EqPoint.fst,EqPoint.Test.fst)
-
Abstract effects and effect layering to truly hide any modifies-based reasoning from the uses of RST (see this proposal).
In short, as currently RSTATE and RST are defined as a shallow encoding on top of STATE, then all the invariants and our abstract modifie clause used in this encoding are visible in the VCs sent to SMT. As a result, e.g., when one does 100 swaps of pointer values (see
Swap.fst
), the transitivity axiom for modifies gets invoked 100 times, leading to one needing to fine-tune the SMT settings to account for this.Instead, if F* had support for abstract effects as proposed in the above note, one could instead use a derived bind combinator which hides all reasoning about resource invariants and the abstract modifies clause (see here).
-
Concurrency (either
||
orfork,join
). At first axiomatically, later by computing all possible interleavings on computation trees by using the technology developed in the Dijkstra Monads For All project (draft paper).Important note: to be able to use liveness as a notion of ownership, we need to rule out being able to recall it automatically using monotonicity. One idea to achieve this we have been talking about is to split the current Low*'s STATE effect into two, one that supports monotonicity (witness,recall) and one that doesn't (functions relying on monotonicity would then have additional pre-conditions). (Concurrent) RSTATE and RST would then be defined in terms of the latter effect.
-
Monotonicity for resources, both for the sequential and concurrent settings. Here the idea would be to additionally package a preorder into the resources used in the RST effect. The hope is that when monotonicity is lifted to the level of resources we can do examples currently out of reach in Low* (monotonicity about abstract datastructures, while currently monotonicity is limited point-wise to references/buffers).
-
Accommodating local mutable variables (
var x = 10 in ...
) not by modelling them as resources but instead as a finite register map from natural numbers to values. This involves extending RSTATE and RST's memory from being a refinement ofHyperStack.mem
toHyperStack.mem & DMap nat (Option (a:Type & a))
. -
New front-end support to: (i) infer frames, (ii) insert the framing combinator, (iii) infer the view/resource argument for calls to the
sel
function in specifications, (iv) map local mutable variables to natural-number keys in the register map, (v) ...