Slices
Jens Alfke — 22 April 2021
The slice is a low-level, general-purpose data structure that’s used extensively by Fleece, Couchbase Lite Core and Couchbase Lite For C.
It’s actually really simple. A slice is a pointer to a range of memory. It is literally a struct consisting of a void* and a size_t. Here’s the C declaration:
typedef struct FLSlice {
    const void* buf;
    size_t size;
} FLSlice;
and a synopsis of the C++ declaration:
namespace fleece {
    struct slice {
        const void* const buf;
        size_t const size;
    };
}
In C++, while you can use FLSlice, it’s preferable to use fleece::slice. They are equivalent data structures, and freely convertible; but slice has a rich API that makes it much more convenient to work with.
Note: FLString is simply a typedef for FLSlice. It’s just a mnemonic to make it clear that the given slice contains (UTF-8 encoded) text.
A slice points to the range of memory from address buf to address buf + size (non-inclusive). You can think of it as a one-dimensional pointer, where a regular pointer is zero-dimensional.
- A slice whose size is 0 is an empty slice.
- The slice {NULL, 0} is the null slice, known as nullslice in C++ or kFLSliceNull in C.
- A slice may never have buf = 0 but size > 0.
Slices are a convenient language-independent way to represent strings, since unlike a char* they don’t require a trailing zero byte. (In fact, a slice is nearly identical to a C++ string_view.)
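The range and the empty/null distinction described above can be sketched in a few lines. SliceSketch, isEmpty, isNull and endOf are hypothetical names made up for this illustration; they mirror the FLSlice layout shown earlier but are not part of Fleece.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in that mirrors the FLSlice layout; not the real Fleece type.
struct SliceSketch {
    const void* buf;
    size_t      size;
};

// Empty: size is 0. buf may or may not be null.
inline bool isEmpty(SliceSketch s) { return s.size == 0; }

// Null: buf is null, which by the rule above also means size is 0.
inline bool isNull(SliceSketch s)  { return s.buf == nullptr; }

// One past the last byte: the slice covers the half-open range [buf, end).
inline const void* endOf(SliceSketch s) {
    return static_cast<const char*>(s.buf) + s.size;
}
```

Every null slice is empty, but an empty slice is not necessarily null: it can still point at a valid address with a size of 0.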
Two important properties of slices:
- Just like a pointer, a basic slice does not imply any ownership of the memory it points to.
- As the const in its declaration implies, a slice is read-only.
Slices tend to be used in three ways:
You can easily make a string literal slice. Since string literals are stored in the executable and have an unlimited lifespan, a slice pointing to one is always valid with no ownership issues.
In C++, you can just construct a slice from a string literal with zero runtime overhead:
slice("foo")
Or, if the fleece namespace is in scope, you can append _sl to a string literal to make it a slice:
"foo"_sl
In C, there’s a macro you wrap around a string literal:
FLSTR("foo")
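To illustrate why a literal slice costs nothing at runtime, here is a minimal sketch. SliceSketch and LITERAL_SLICE_SKETCH are hypothetical names invented for this example, not Fleece’s own definitions; the real FLSTR macro and _sl operator may be implemented differently. The only point is that sizeof applied to a string literal yields its length (plus the trailing zero) at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in with the same two fields as FLSlice.
struct SliceSketch {
    const void* buf;
    size_t      size;
};

// Hypothetical macro. The "" prefix makes a non-literal argument a compile error,
// and sizeof counts the trailing zero byte, hence the -1.
#define LITERAL_SLICE_SKETCH(STR) (SliceSketch{("" STR), sizeof("" STR) - 1})

// The length is a compile-time constant, so no strlen call is needed.
static_assert(sizeof("" "foo") - 1 == 3, "literal length is known at compile time");

inline SliceSketch fooSlice() {
    return LITERAL_SLICE_SKETCH("foo");
}
```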
Slices are often passed to functions, as read-only references to a range of memory; most of the time these are interpreted as strings.
This is almost always safe. If you have a valid reference to some memory at the time that you call a function, you still have that reference while the function is running, right? Usually, yes, but there are some tricky edge cases that tend to look like
- Function A() has a reference to object x.
- A calls B(x.s), passing a slice to memory that’s owned by x.
- Function B does something that causes object x to be freed.
- Function B accesses the slice it was passed … which now points to garbage ☠️
This is just an instance of a pretty general anti-pattern that as a C/C++ programmer you probably already know too well. In my experience it doesn’t come up that often, but be aware of it. (Hint: The Clang Address Sanitizer will catch such problems and make them much easier to debug.)
A function can safely return a slice that points to memory that it owns, or that has global scope. This is very common in accessor methods that return constants or data members:
class foo {
private:
    std::string _name, _title;
public:
    virtual slice className() {return "foo"_sl;} // 👍🏻
    slice name() const {return slice(_name);} // 👍🏻
};
Note: A C++ slice constructed from a std::string points into the string’s data. slice(_name) is equivalent to slice(_name.data(), _name.size()).
Or in C:
FLSlice name(Person *p) {
    return (FLSlice){p->name, strlen(p->name)}; // 👍🏻
}
When implementing a function that returns slice / FLSlice, just watch out for the well-known problem of returning a pointer to a local variable, or to memory that gets freed when the function exits:
slice nameAndTitle() const { return slice(_title + " " + _name); } // ☠️
The above example constructs a temporary std::string object and returns a slice pointing to its data. Unfortunately the temporary string is destructed when the function returns, so the function returns garbage. A similar example in C is:
FLSlice nameAndTitle(Person *p) {
    char str[100];
    sprintf(str, "%s %s", p->title, p->name);
    return (FLSlice){str, strlen(str)}; // ☠️
}
This returns a pointer to a local variable str on the stack, which of course becomes garbage as soon as the function returns.
So how do you use slices to return memory allocated by the called function? We cover that below…
alloc_slice (FLSliceResult in C) is a subclass of slice that manages its own memory; the bytes pointed to by an alloc_slice are always valid.
Note: C doesn’t support subclassing, so FLSliceResult is a separate struct with the same layout. This means you can’t directly pass one to a parameter of type FLSlice. Instead use the inline function FLSliceResult_AsSlice, or simply use the buf and size values to populate an FLSlice yourself.
Under the hood, the memory is allocated on the heap with malloc. Each memory block uses reference-counting to keep track of how many alloc_slice objects point to it, and frees itself when the count reaches zero. This unavoidably makes alloc_slice more expensive to create, and to pass by value.
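The sharing behavior described above can be illustrated with a small sketch. SharedBytesSketch is a hypothetical class written for this example, and it uses std::shared_ptr as a stand-in for the reference counting; the real alloc_slice does its own counting and is implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical sketch of a slice that shares ownership of a heap block.
// std::shared_ptr stands in for Fleece's own reference counting.
class SharedBytesSketch {
public:
    // Copies n bytes from src into a newly allocated, reference-counted block.
    SharedBytesSketch(const void* src, size_t n)
        : _block(new char[n], std::default_delete<char[]>()), _size(n)
    {
        std::memcpy(_block.get(), src, n);
    }

    const void* buf() const { return _block.get(); }
    size_t size() const     { return _size; }

    // Number of SharedBytesSketch objects currently pointing at the block.
    long refCount() const   { return _block.use_count(); }

private:
    std::shared_ptr<char> _block;   // freed when the last copy is destroyed
    size_t _size;
};
```

Copying a SharedBytesSketch does not copy the bytes; it only bumps the count. That increment and the matching decrement are the extra cost of passing such an object by value compared to a plain slice.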
Let’s revisit the dangerous example from the previous section. The proper way to implement a function that returns a constructed slice is to return alloc_slice:
alloc_slice nameAndTitle() const { return alloc_slice(_title + " " + _name); } // 👍🏻
Here the string concatenation creates a temporary std::string; an alloc_slice is constructed from this string, which copies the characters to a new reference-counted heap block; and that alloc_slice is returned.
{
    alloc_slice n = somebody.nameAndTitle();
    printf("Name: %.*s\n", (int)n.size, (const char*)n.buf);
}
After the printf, n goes out of scope and is destructed, and the heap block is freed since there are now zero references to it.
Note: This example also shows how to pass a slice to a printf-type function. Note the “%.*s” format specifier, which says that a length (as an int) and then a pointer will be passed. There’s a helper macro FMTSLICE that you can use to simplify the parameter list, e.g. printf("Name: %.*s\n", FMTSLICE(s));
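As a sketch of how such a formatting helper can work: the "%.*s" specifier consumes an int precision first and the character pointer second, so the macro simply expands to those two arguments in that order. FMT_SLICE_SKETCH, SliceSketch and formatName are hypothetical names for this illustration; consult slice.hh for the actual FMTSLICE definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in with the same two fields as FLSlice.
struct SliceSketch {
    const void* buf;
    size_t      size;
};

// Hypothetical helper: expands to the (int length, const char* pointer)
// pair that "%.*s" expects, in that order.
#define FMT_SLICE_SKETCH(S) (int)(S).size, (const char*)(S).buf

// Formats a slice that has no trailing zero byte. The precision limits
// how many bytes are read, so no terminator is required.
inline std::string formatName(SliceSketch s) {
    char out[64];
    std::snprintf(out, sizeof out, "Name: %.*s", FMT_SLICE_SKETCH(s));
    return std::string(out);
}
```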
In C the function looks like this:
FLSliceResult nameAndTitle(Person *p) {
    FLSliceResult result = FLSliceResult_New(100);
    result.size = sprintf((char*)result.buf, "%s %s", p->title, p->name);
    return result; // 👍🏻
}
Note that the caller is responsible for releasing the result by calling FLSliceResult_Release when it’s done with it, else the memory will be leaked:
FLSliceResult n = nameAndTitle(somebody);
printf("Name: %.*s\n", (int)n.size, (const char*)n.buf);
FLSliceResult_Release(n);
Whenever a function returns FLSliceResult, that means it returns a reference that the caller must later release. (Unless the returned value happened to be null.)
FLHeapSlice, in the C API, is a bit of an odd duck. It’s an alias for FLSlice, but indicates that the value stored therein is actually an FLSliceResult, i.e. its buf points to a managed block. This can allow the caller to “promote” it to an FLSliceResult without having to copy the data.
Our advice is to treat this just like FLSlice when you see it, and not to use it in your own APIs.
Sometimes you’re writing C++ code that calls a C API, or else that implements a C API. That means you end up using the C types, even though the C++ ones are so much more convenient to use.
Fortunately there is some special gunk in slice.hh to make them interoperable. You can:
- Implicitly convert a slice to an FLSlice or vice versa.
- Implicitly convert alloc_slice to FLSlice or FLHeapSlice.
- Implicitly convert FLSliceResult to alloc_slice or slice.
- Explicitly convert alloc_slice to FLSliceResult. You have to do it explicitly to avoid leaks, because it creates a reference that has to be manually released.
One conversion in particular is worth calling out: converting an rvalue of type FLSliceResult to alloc_slice acts like a “move” of the reference, meaning that you don’t have to release it manually. For example, we can call the C function from above like this:
auto name = alloc_slice(nameAndTitle(somebody));
nameAndTitle() returns an FLSliceResult, but since this value is never assigned to a variable but just passed directly into the alloc_slice constructor, the constructor is able to use the existing reference. When the alloc_slice destructs it will release that reference, freeing the memory.
The opposite conversion works too: an alloc_slice rvalue to FLSliceResult. This one happens when you’re implementing an extern "C" function in C++ that returns an FLSliceResult:
extern "C" FLSliceResult nameAndTitle(Person *p);
FLSliceResult nameAndTitle(Person *p) {
    return FLSliceResult( p->nameAndTitle() );
}
Here the C++ nameAndTitle() method returns an alloc_slice, whose reference gets moved to the constructed FLSliceResult. Of course the caller will be responsible for releasing that reference.
The C API is in fleece/FLSlice.h. There’s not a whole lot to it, and some of that is for use by C++. You get the struct definitions, the constant kFLSliceNull, and some functions:
FLSlice:
- FLStr(str): Creates a slice that points to a C string. Takes O(n) time since it has to call strlen.
- FLSTR("..."): Declares a slice that points to a string literal. This is more efficient since the size of the literal is known at compile time.
- FLSlice_Equal(a,b): Returns true if the two slices have equal contents.
- FLSlice_Compare(a,b): Compares the contents as strcmp would do, i.e. returning negative, zero or positive.
- FLSlice_Hash(s): Generates a 32-bit hash code from the contents of the slice.
- FLSlice_ToCString(s, buf, capacity): Copies a slice (or as much as will fit) into a buffer and adds a terminating 0 byte.
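The “as much as will fit” behavior of a slice-to-C-string copy can be sketched as follows. toCStringSketch and SliceSketch are hypothetical names for this illustration, and returning a bool to report whether the whole slice fit is an assumption of the sketch; check FLSlice.h for the actual signature and return value of FLSlice_ToCString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in with the same two fields as FLSlice.
struct SliceSketch {
    const void* buf;
    size_t      size;
};

// Copies up to capacity-1 bytes of the slice into buffer and always
// zero-terminates. Returns true only if the entire slice was copied
// (an assumption of this sketch).
inline bool toCStringSketch(SliceSketch s, char* buffer, size_t capacity) {
    if (capacity == 0)
        return false;                      // no room even for the terminator
    size_t n = s.size < capacity - 1 ? s.size : capacity - 1;
    if (n > 0)
        std::memcpy(buffer, s.buf, n);
    buffer[n] = '\0';
    return n == s.size;
}
```

Reserving one byte of the capacity for the terminator is what lets the result be passed to any API that expects a C string, even when the copy was truncated.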
FLSliceResult:
- FLSliceResult_New(size): Allocates a new slice of the given size. The contents are uninitialized (garbage). You’ll need to cast buf to a non-const pointer and write into them.
- FLSliceResult_CreateWith(bytes,size): Allocates a new slice of the given size and copies data from bytes into it.
- FLSlice_Copy(s): Same as the above, but the bytes & size are expressed as an FLSlice.
- FLSliceResult_Release(s): Decrements the ref-count; call this when you’re done with it. Note that it’s a safe no-op to call this on a null slice.
- FLSliceResult_Retain(s): Increments the ref-count of an FLSliceResult; not often needed. This call must be balanced with a call to FLSliceResult_Release.
The C++ API is in fleece/slice.hh, and is quite large! That means there are all kinds of helpful utility methods, comparable to what you’d find in a string or array (of bytes) class.
Common API:
Most of the common API of slice and alloc_slice is defined in their common base class pure_slice. (You never use pure_slice yourself, and you can forget it exists except when looking at the API. All of its public methods can be called on both slice and alloc_slice.)
- You can construct a slice in a number of ways: with a pointer and a size, with a begin and an end pointer, with an FLSlice or FLSliceResult.
- You can cast to/from a std::string or std::string_view, i.e. string(slc) or slice(str), and cast a C string to a slice, i.e. slice(cstr). (You can’t cast a slice to a C string because slices don’t have trailing zero bytes. Instead use toCString, which copies the slice’s contents to a buffer you provide and appends a zero byte, or else string(slc).c_str().)
- operator bool: Bool conversion lets you use a slice as a test in if or while. A slice evaluates as true unless it’s nullslice; so if(s) is the same as if(s.buf != nullptr).
- empty: Tests if a slice’s size is 0.
- begin and end: Their existence lets you use a for loop to iterate over a slice’s bytes, with for(uint8_t c : s).
- operator[] returns the byte at an offset, as though the slice were a uint8_t[].
- operator() takes an offset and a length and returns the equivalent slice. (It can’t use square brackets because, before C++23, C++ doesn’t allow operator[] to take more than one parameter!)
- upTo and from return prefixes and suffixes respectively.
- There are several find... methods that search for particular bytes or substrings.
- There are a number of comparison operators like ==, <, compare; these compare the contents, not addresses.
- caseEquivalent and caseEquivalentCompare are case-insensitive versions of == and compare, but they only know about the English (ASCII) alphabet.
- hasPrefix and hasSuffix compare the beginning or end of the slice.
- hash returns a 32-bit hash code of the contents.
- copyTo calls memcpy to copy the entire slice to a destination address.
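Since a slice is nearly identical to a std::string_view, the prefix, suffix and comparison methods listed above can be illustrated with string_view as a stand-in. The function names below (upToSketch, fromSketch, hasPrefixSketch) are hypothetical and only approximate the behavior described; they are not the slice methods themselves, and details such as bounds handling are assumptions. The sketch requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Prefix: the first `offset` bytes. No bytes are copied; the result
// points into the same memory as the input.
inline std::string_view upToSketch(std::string_view s, size_t offset) {
    return s.substr(0, offset);
}

// Suffix: everything from `offset` to the end.
// Note: substr throws std::out_of_range if offset > s.size().
inline std::string_view fromSketch(std::string_view s, size_t offset) {
    return s.substr(offset);
}

// Compares contents, not addresses, against the start of the range.
inline bool hasPrefixSketch(std::string_view s, std::string_view prefix) {
    return s.size() >= prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0;
}
```

Like a slice, each returned string_view is just a pointer and a length into the original memory, so these operations are cheap but inherit the same lifetime caveats.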
alloc_slice API:
- You can construct an alloc_slice in a number of ways: with a pointer and a size, with a begin and an end pointer, with an FLSlice or FLSliceResult. In all these cases it copies the contents of the source into a new ref-counted heap block.
- You can also construct an alloc_slice with a size alone, in which case it allocates a block of the given size with uninitialized (garbage) contents.
- nullPaddedString(slice) is a static factory method that creates an alloc_slice copying the given slice, and guarantees that there will be a zero byte just past its end. The zero byte is not part of the slice, but it means you can cast its buf to const char* and pass it to any API that takes a C string.
- resize changes the size of the slice by reallocating its buffer, similar to realloc. (If the new size is larger, new bytes at the end will be uninitialized garbage.) There’s a variant called shorten that first asserts that the new size is not larger.
- append grows the buffer and copies the given slice’s contents to its end. This is not super efficient: unlike std::string it does not pre-allocate spare capacity, so every call to append does a realloc. If you need to efficiently concatenate and produce an alloc_slice, look at the Writer class in Fleece.