-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concretizing LLVM memory #1217
Comments
I'm afraid I don't understand the premise of this issue, so I'm not yet able to offer an informed answer to your questions. First, a very dumb question: what is the relationship between the functionality you are trying to implement and
What does the range in this
I don't understand this part at all. What exactly is |
The overall motivation is making it easier to understand what went wrong when a safety assertion failed. Concretizing
This is a very good question! crucible/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/MemLog.hs Lines 758 to 762 in 19e9a52
It goes through a
Right, so this was my first stab at a simpler representation. The keys are block numbers, and the values are the allocations they correspond to.
Yeah sorry, this was a bit in the weeds. This function is trying to take a symbolic memory write (a |
I see! So you're trying to develop a simplified version of memory—got it. In order to answer your questions better, it might help to list the reasons why you find the log-of-writes |
Sure. And to be clear: the problem is not necessarily that I personally find it difficult to read and understand (although this is probably true, and this is how I framed it above), but rather that I would assert that users of Crucible-based tooling are likely to find it obtuse. Consider the following example of a pretty-printed log-of-writes memory:
To understand this printout, users have to know:
Most importantly, to understand what is live at each address in the final state of memory, one actually has to walk over the writes (in their brain), collecting writes and overwriting previous ones. An alternative representation could perform this "collection" in advance, presenting just the "final picture" of what exists in memory, much like a core dump would. |
Excellent, this is a fantastic problem statement. I like the idea of working to make the output more like a core dump. Some various thoughts, which may or may not help inform a design for a simplified memory representation:
|
Thank you for the thoughts!
Ah, but this might hide the order in which the writes occur, making it harder to understand which of two overlapping writes is "live".
Yeah, so the three uses for this data structure are probably:
None of these are super likely to be hot loops, but I think we should still be a little concerned with the performance of constructing concretized memory. This is because concretization has to happen all at once inside a callback that receives the What4 The reason I mention using fixed-size, mutable arrays is because when turning the log-of-writes into a concretized memory, we have to traverse the writes, and apply an equivalent write to the concrete memory. So if a write has a non-zero offset, we'll be seeking/indexing into the concrete allocation in order to perform the write. Using mutable arrays where each entry represents a byte is likely the most performant way to accomplish this. In contrast, using a representation where each entry is not byte-size would incur O(n) overhead to seek to the place where a write should occur (because you would have to traverse the whole prefix of the array, counting the number of bytes you've passed over). |
Yes, this is true. But isn't your goal to produce the "final picture" of the memory state? In that case, I would expect that you would only want to display the most recent information vis-à-vis overlapping writes.
A fair point. If there are ways to make constructing memory a little faster (e.g., by using fixed-size arrays in places that support them), then I'm all for it. That being said, I think that we'll inevitably need some amount of indirections, both for things like unbounded allocations as well as the fact that writes may not occupy contiguous offsets in an allocation region. |
Not to get derailed from the main thrust of the discussion, but I'm not sure that this is possible in the format you described. Consider an allocation of size 4 with two writes: one of three |
Yes, I agree that merging the writes is the least surprising way to present that information. I suppose this means that there will be some amount of cleverness required in implementing the merging logic, but hopefully less cleverness than would be required to map different block numbers onto the same address space. |
When an SMT solver yields
unsat
, it generally provides a model, i.e., an instantiation of all of the variables in the query. Given a model, we can "concretize" Crucible values (RegValue
s), replacing symbolic terms with concrete values (#1207). This enables us to do things like report concrete variable assignments that lead to violations of safety conditions. Concrete values are significantly easier to read and understand, enabling users to more quickly grasp why a proof/simulation fails.For the same use-cases, it would be nice to develop a similar capability for LLVM memories. I started down this path only to realize that there is at least one tricky design question that needs to be resolved: The representation of pointers in the concrete memory.
Background
In the Crucible-LLVM memory model (permalink),
Said another way, the memory model is two-dimensional: One "axis" of a pointer determines which allocation (possibly allocations in the case of a symbolic block identifier) it points to, and the other "axis" is the offset within that allocation.1 For example, the null pointer is represented by the pair
(0, 0x0)
.A first attempt
To situate the problem, consider the following types I drafted up to represent concretized memory (simplified for presentation):
I ran into the problem when trying to handle stores (
MemStore
) of pointers:If the (concretized) block number
blk
is0
in the model, then we could just write out the bytes of the (concretized) offsetoff
. But what if it's not? There is no word that contains as much information as the pointer (because the pointer consists of a word-size offset and a block number), so we appear to be stuck.Some options
So how can we work around this?
(1) Define a representation of pointers as (concrete) bytes: We could associate each block number with an arbitrary base address (likely chosen to be reasonably similar to actual heap addresses on x86_64 Linux). When writing pointers to concrete memory, we would look up the base address and add the offset, and write that word. This has the virtue of simplicity and familiarity - it is actually how pointers work, after all. We would need to be careful when assigning base addresses to avoid seemingly-overlapping allocations, and it's not clear how that would work with Crucible-LLVM's "unbounded" allocations (but perhaps that's an edge case that's not work optimizing for). This has all the disadvantages of real pointers, i.e., some bit-patterns would look like pointers, but might not necessarily have been derived from the pointers they resemble.
(2) Opaque pointer bytes: We could consider changing
MemByte
to somthing more like this:This comes at the cost of both performance and normalization. We can no longer use fixed size arrays for storage, nor index into them as arrays of bytes. Furthermore,
ConcAllocStorage
could contain two adjacentUninitialized
s (orInitialized
s), and so have multiple representations of the same data.(3) Duplicated opaque pointer bytes: We could instead add a constructor to
MemByte
like so:When writing pointers to concrete memory, we would write
N
of these opaquePtrByte
s, whereN
is the size of a pointer. This has the disadvantage of storing (a pointer to, haha) the concrete pointer several times, but has the advantage of retaining some nice performance characteristics (i.e., the ability to treatBoundedStorage
as an array of bytes). It also comes at the price of a bit of denormalization: What would it mean to have fewer thanN
of these in a row in some allocation? Or adjacentPtrByte
s with differentConcLLVMPtr
s?(4) Your solution here: Anyone have a better idea? Or opinions about these options?
Footnotes
Non-pointer bitvectors (e.g. the 5 in
int x = 5;
) are represented the same way as pointers, but with a block identifier that’s concretely 0. ↩The text was updated successfully, but these errors were encountered: