Advanced Fleece
This is a continuation of Using Fleece that dives into more features, some of which get pretty esoteric.
A Doc is a container for holding an encoded Fleece document. It retains the alloc_slice holding the data, and lets you access its root value. So in a way it’s just a fancy form of Value::fromData(). But that’s only the tip of the iceberg.
Docs are important for memory management. When you parse Fleece data by calling Value::fromData(), the resulting values are only valid for as long as the input data is, since they are literally pointers into it. Fleece doesn’t manage that data; you do, and if it’s invalidated all of the values will turn to garbage. By using a Doc, however, you’re giving Fleece a ref-counted heap block containing the data. The Doc retains that alloc_slice, so the memory remains valid as long as the Doc exists.
If a Value is managed by a Doc, it’s possible to retain that Value, forcing it to stay valid until you release it. Fleece finds the Doc managing the heap block containing that Value, and retains or releases it accordingly. In C++, use the RetainedValue class; anything you store in it will be retained. In C, call FLValue_Retain and FLValue_Release.
Mutable collections make use of this too. A value added to a mutable Dict or Array is always retained so it stays alive. Mutable values simply have an internal reference-count. But adding a read-only value causes it to be retained as described above — the Doc containing it is what actually gets retained.
A similar situation happens if you make a mutable copy of an immutable Dict or Array. The resulting mutable object actually points to the original one, because it’s not a complete copy and may need to access the original later. So it retains the original value. (If you need to avoid this, use the copy mode kFLDeepCopyImmutables, which creates a deep copy that has no reference to the original.)
Warning: This means that read-only values can only be added to mutable collections, or mutably shallow-copied, if they’re managed by Docs. If not, you’ll get an exception at runtime.
Note: This can in some cases affect memory usage. If you read a large Fleece document into memory using a Doc, add one value from it to a mutable collection, then release the Doc, the entire document is still in memory, and will remain in memory until that value is removed or the parent collection is freed.
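To make the memory note above concrete, here is a small analogy that uses only the C++ standard library. It is not Fleece code: Block and retainValue are hypothetical names, and std::shared_ptr's aliasing constructor stands in for the way a retained Value keeps its whole Doc alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an encoded document: one heap block holding many values.
using Block = std::vector<std::string>;

// Returns a handle to a single element that shares ownership of the WHOLE block.
// The aliasing constructor makes the pointer target one element while the
// reference count belongs to the block that contains it.
inline std::shared_ptr<const std::string>
retainValue(const std::shared_ptr<const Block>& doc, std::size_t index) {
    assert(doc != nullptr);
    return std::shared_ptr<const std::string>(doc, &doc->at(index));
}
```

If you drop your own reference to the block but keep the handle returned by retainValue, the entire block stays allocated until that handle is released too, which is the same trade-off the note describes.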
Docs are also required for using Fleece documents that were encoded with shared keys; but we haven’t talked about those yet, so this will be described later in the Shared Keys section.
A KeyPath object represents a sequence of Dict and/or Array lookups. It’s a useful and efficient shortcut for finding values deep inside documents.
You construct a KeyPath from a specifier string, whose syntax is similar to a Swift or Objective-C key-path, or JSONPath or JSONPointer. It looks like “foo.bar[2][-3].baz”; that is, properties delimited by “.”, and array indexes in square brackets. (Negative indexes count from the end of the array.)
A '\\' can be used to escape a special character (‘.’, ‘[’ or ‘$’) at the start of a property name (but not yet in the middle of a name.)
Note: For compatibility with JSONPath, a leading “$.” is allowed but ignored.
Once you’ve created a KeyPath you can call its eval method and pass in any Value, and the result will be returned. If the path can’t be evaluated fully due to a missing property, an array that’s too short, or a type mismatch, don’t worry; the method will just return NULL.
Note: Try to create a single KeyPath object for a given specifier and use it multiple times, instead of creating it every time you want to use it. Evaluating a KeyPath is faster than manual lookup, but the cost of creating it will outweigh that if you only use it once.
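The reason for parsing once and reusing is easier to see with a toy parser. The sketch below is not Fleece's implementation: PathStep, parsePath and resolveIndex are hypothetical names, backslash escapes are deliberately left out, and it assumes that index -1 means the last element.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One step of a parsed path: either a Dict key or an Array index.
struct PathStep {
    bool        isIndex;  // true for [n], false for a property name
    std::string key;      // meaningful when !isIndex
    int         index;    // meaningful when isIndex; may be negative
};

// Toy parser for specifiers like "foo.bar[2][-3].baz", with an optional leading "$.".
inline std::vector<PathStep> parsePath(const std::string& spec) {
    std::vector<PathStep> steps;
    std::size_t i = 0;
    if (spec.compare(0, 2, "$.") == 0)
        i = 2;                                    // JSONPath-style prefix is ignored
    while (i < spec.size()) {
        if (spec[i] == '.') {
            ++i;                                  // separator between steps
        } else if (spec[i] == '[') {
            std::size_t close = spec.find(']', i);
            if (close == std::string::npos)
                throw std::invalid_argument("missing ']'");
            steps.push_back({true, "", std::stoi(spec.substr(i + 1, close - i - 1))});
            i = close + 1;
        } else {
            std::size_t end = spec.find_first_of(".[", i);
            if (end == std::string::npos)
                end = spec.size();
            steps.push_back({false, spec.substr(i, end - i), 0});
            i = end;
        }
    }
    return steps;
}

// Converts a possibly negative index into an absolute one.
// Returns false when the index falls outside an array of `count` items.
inline bool resolveIndex(int index, std::size_t count, std::size_t& out) {
    long long resolved = index < 0 ? static_cast<long long>(count) + index : index;
    if (resolved < 0 || resolved >= static_cast<long long>(count))
        return false;
    out = static_cast<std::size_t>(resolved);
    return true;
}
```

All the string scanning happens in parsePath; evaluating the resulting steps against a document is just a loop of lookups, so keeping the parsed steps around avoids repeating the expensive part.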
A DeepIterator performs a recursive traversal of a Value. It’s similar to an Array or Dict iterator, except that it will dive into nested Arrays and Dicts and iterate those too. First the root itself is visited, then all the items in the root container, then all the items in its first sub-container, etc. (So it's breadth-first within a container, but depth-first overall.)
At every step in the iteration, a number of properties are available:
- value() returns the current value.
- key() returns the current key associated with the value in the parent Dict, or nullslice if you’re not in a Dict.
- index() returns the current array index, or 0 if you’re not in an Array.
- depth() returns how deeply nested you are: 0 at the root, 1 at its children, …
- pathString() returns a KeyPath spec giving the path to the current value.
- JSONPointer() returns the path in JSONPointer (RFC6901) syntax, in case for some reason you need that.
You can also call skipChildren() to tell the iterator not to visit the children of the current value (if it’s a collection.)
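The visiting order described above can be sketched with a toy tree. This is only an illustration of that order as the text states it, not DeepIterator itself: Node, visitChildren and traversalOrder are hypothetical names, and the real iterator may differ in details. It relies on C++17, where std::vector may hold a type that is still being defined.

```cpp
#include <string>
#include <vector>

// Hypothetical toy value: a label plus children (empty for non-containers).
struct Node {
    std::string       label;
    std::vector<Node> children;
};

// Records labels in the described order: every item of a container first,
// then a dive into each of its sub-containers in turn.
inline void visitChildren(const Node& container, std::vector<std::string>& order) {
    for (const Node& child : container.children)
        order.push_back(child.label);             // breadth-first within this container
    for (const Node& child : container.children)
        if (!child.children.empty())
            visitChildren(child, order);          // depth-first into sub-containers
}

inline std::vector<std::string> traversalOrder(const Node& root) {
    std::vector<std::string> order{root.label};   // the root itself is visited first
    visitChildren(root, order);
    return order;
}
```

For a root holding a, b and c, where a holds a1 and a2 and c holds c1, this yields root, a, b, c, a1, a2, c1.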
In some use cases (like a database) you may have large numbers of Fleece documents that have similar schemas. In particular, they’re likely to use a common set of dict keys. These strings are usually short, but their size will add up if every document contains them.
For this reason, Fleece supports shared keys. With shared keys you maintain an external SharedKeys object that has an ordered list of key strings. This list associates every key with a small integer (its index in the list.)
When you encode a Fleece dict and associate a SharedKeys object with the Encoder, eligible keys are written as 2-byte integers instead of strings. The encoder will automatically add eligible Dict key strings to the SharedKeys. (Eligible keys are 16 bytes or shorter, and contain only ASCII alphanumerics or “-” or “_”.)
When you parse a document that was encoded with shared keys, you need to pass a reference to a compatible SharedKeys object. Then when key lookup happens, the requested key string can be mapped to its integer and looked up that way.
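As a rough picture of the bookkeeping involved, here is a toy key table. It is not Fleece's SharedKeys class: KeyTable and its methods are hypothetical names, and the only rule taken from the text is the eligibility test (16 bytes or shorter, ASCII alphanumerics, “-” or “_”).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a shared-keys table.
class KeyTable {
public:
    // Eligibility rule from the text: at most 16 bytes; ASCII alphanumerics, '-' or '_'.
    static bool isEligible(const std::string& key) {
        if (key.size() > 16)
            return false;
        for (unsigned char c : key) {
            bool asciiAlnum = c < 128 && std::isalnum(c) != 0;
            if (!asciiAlnum && c != '-' && c != '_')
                return false;
        }
        return true;
    }

    // Returns the integer for `key`, adding it to the list if needed.
    // Returns -1 when the key is not eligible and must stay a string.
    int encode(const std::string& key) {
        if (!isEligible(key))
            return -1;
        auto found = _index.find(key);
        if (found != _index.end())
            return found->second;
        int id = static_cast<int>(_keys.size());
        _keys.push_back(key);
        _index.emplace(key, id);
        return id;
    }

    // Maps an integer back to its key string; returns nullptr if it is unknown.
    const std::string* decode(int id) const {
        if (id < 0 || static_cast<std::size_t>(id) >= _keys.size())
            return nullptr;
        return &_keys[static_cast<std::size_t>(id)];
    }

private:
    std::vector<std::string>             _keys;   // ordered list: position == integer key
    std::unordered_map<std::string, int> _index;  // reverse lookup from string to integer
};
```

The sketch also shows why the reader needs a compatible table: an integer like 1 means nothing without the list that maps it back to a string.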
For even higher optimization, you can use a DictKey object. This is a higher-level representation of a key string. Internally it stores both the string and, if there is one, its small-integer value. You can pass a DictKey when looking up a key in a Dict, and it will be faster than using a string if that key has a shared-keys encoding.
A Fleece document that was encoded with shared keys must be parsed later using a Doc, not by Value::fromData(). The SharedKeys object is passed to the Doc constructor as an extra parameter.
Warning: If you don’t heed this advice, you’ll hit a runtime exception when looking up Dict keys in that document. Currently it’s a pretty unintuitive message like “assertion failed: sharedKeys || gDisableNecessarySharedKeysCheck”.
Why is this necessary? If you look up a key in a Dict, and that Dict was encoded using shared keys, the lookup has to find the SharedKeys object to make sense of the numeric keys in the encoded data. The way this works is:
- When you create the Doc object you pass a SharedKeys reference along with the Fleece data.
- The Doc registers itself as the owner of that data (the alloc_slice.)
- When the Dict lookup notices there are numeric (shared) keys, it looks up the owner of the memory range the Dict occupies.
- The owning Doc has a SharedKeys, which the Dict code then looks at to find the matching string.
Note: You might wonder why the Dict itself can’t keep a pointer to the SharedKeys. The reason is that the Dict is literally just a pointer to a read-only object baked into the encoded data. It has no mutable state of its own.
Fleece has a delta encoder that can find the differences between two Fleece values, a source and a destination, and encode them in a fairly compact form called a delta (expressed in JSON, for compatibility.) The delta can subsequently be applied to the original source value, producing the destination value.
This is typically used to optimize network bandwidth. It looks like this:
- Alice sends Bob a Fleece document.
- Bob modifies the document, but keeps a copy of the original.
- Bob calls FLCreateJSONDelta() on the original and new versions of the document, producing a JSON string.
- Bob sends that JSON string back to Alice. It’s much shorter than his document.
- Alice calls FLApplyJSONDelta() with her original document and the JSON, and it returns a new Fleece document identical to Bob’s.
Deltas can also be used to create efficient backups. After making changes you can create a reverse delta, passing your current version as the source and the old version as the destination. Then you can throw away the old version, since you can reconstitute it by applying the delta to the current version! (Storing history as deltas is a common technique in version control systems like Git, Mercurial, SVN, …)
The tricky part of using delta compression is making sure that the delta is applied to exactly the same source document that it was created from. If not, then the decoding will either fail with an error, or (more likely) produce an incorrect result that doesn’t match the destination. For this reason, most systems that use delta encoding also use some kind of version indicator like a hash that reliably identifies the source.
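One way to picture such a version check is the toy below. It does not use Fleece's JSON delta format or its API: Document, Delta, fingerprint, createDelta and applyDelta are all hypothetical names, the document is a flat string map, and std::hash stands in for a real content hash (it is not cryptographic and is only stable within one build). It needs C++17.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Document = std::map<std::string, std::string>;   // toy flat document

// Toy delta: changed or added keys, removed keys, and a fingerprint of the source.
struct Delta {
    std::size_t              sourceFingerprint;
    Document                 changed;
    std::vector<std::string> removed;
};

// Stand-in version indicator computed from the document's contents.
inline std::size_t fingerprint(const Document& doc) {
    std::string flat;
    for (const auto& entry : doc) {
        flat += entry.first;  flat += '\0';
        flat += entry.second; flat += '\0';
    }
    return std::hash<std::string>{}(flat);
}

inline Delta createDelta(const Document& source, const Document& dest) {
    Delta delta{fingerprint(source), {}, {}};
    for (const auto& entry : dest) {
        auto old = source.find(entry.first);
        if (old == source.end() || old->second != entry.second)
            delta.changed[entry.first] = entry.second;   // new or modified key
    }
    for (const auto& entry : source)
        if (dest.count(entry.first) == 0)
            delta.removed.push_back(entry.first);        // key deleted in dest
    return delta;
}

// Refuses to apply the delta unless `source` matches the one it was created from.
inline std::optional<Document> applyDelta(const Document& source, const Delta& delta) {
    if (fingerprint(source) != delta.sourceFingerprint)
        return std::nullopt;
    Document result = source;
    for (const auto& entry : delta.changed)
        result[entry.first] = entry.second;
    for (const auto& key : delta.removed)
        result.erase(key);
    return result;
}
```

Calling createDelta(current, old) gives the reverse delta used for backups, and the fingerprint check turns a mismatched source into an explicit failure rather than a silently wrong document.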
Note: If you’re curious about the exact format of a delta, it’s described in this design document.