Jens Alfke edited this page Apr 22, 2021 · 2 revisions

Slices

Jens Alfke — 22 April 2021

1. INTRODUCTION

The slice is a low-level, general-purpose data structure that’s used extensively by Fleece, Couchbase Lite Core and Couchbase Lite For C.

It’s actually really simple. A slice is a pointer to a range of memory. It is literally a struct consisting of a const void* and a size_t. Here’s the C declaration:

typedef struct FLSlice {
    const void* buf;
    size_t      size;
} FLSlice;

and a synopsis of the C++ declaration:

namespace fleece {
    struct slice {
        const void* const buf;
        size_t      const size;
    };
}

In C++, while you can use FLSlice, it’s preferable to use fleece::slice. They are equivalent data structures, and freely convertible; but slice has a rich API that makes it much more convenient to work with.

Note: FLString is simply a typedef for FLSlice. It’s just a mnemonic to make it clear that the given slice contains (UTF-8 encoded) text.

A slice points to the range of memory from address buf up to, but not including, address buf + size. You can think of it as a one-dimensional pointer, where a regular pointer is zero-dimensional.

  • A slice whose size is 0 is an empty slice.
  • The slice {NULL, 0} is the null slice, known as nullslice in C++ or kFLSliceNull in C.
  • A slice must never have buf == NULL with size > 0.

Slices are a convenient language-independent way to represent strings, since unlike a char* they don’t require a trailing zero byte. (In fact, a slice is nearly identical to a C++ string_view.)
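To make these definitions concrete, here is a minimal sketch of a slice-like struct. MiniSlice is a hypothetical stand-in written for this illustration, not Fleece’s slice.

It encodes three rules:

  • the null/empty distinction
  • the rule that a null buf requires size == 0
  • the half-open range [buf, buf + size)

It also shows how naturally such a struct maps onto std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a slice: a pointer plus a length, no ownership.
struct MiniSlice {
    const void* buf = nullptr;
    size_t      size = 0;

    MiniSlice() = default;                       // the null slice {NULL, 0}
    MiniSlice(const void* b, size_t s) : buf(b), size(s) {
        assert(buf != nullptr || size == 0);     // {NULL, n > 0} is never allowed
    }
    bool isNull() const  { return buf == nullptr; }
    bool isEmpty() const { return size == 0; }   // the null slice is empty too

    // The range is half-open: end() points one past the last byte.
    const char* begin() const { return static_cast<const char*>(buf); }
    const char* end() const   { return begin() + size; }

    std::string_view asStringView() const { return {begin(), size}; }
};

// Points into an existing buffer without copying and without needing a trailing NUL.
inline MiniSlice middleWord() {
    static const char text[] = "alpha beta gamma";
    return MiniSlice(text + 6, 4);               // the bytes "beta"
}
```

Because MiniSlice only borrows memory, middleWord() is safe only because text has static storage. The same ownership caveat applies to real slices.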

2. BASIC SLICES (slice, FLSlice)

Two important properties of slices:

  1. Just like a pointer, a basic slice does not imply any ownership of the memory it points to.
  2. As the const in its declaration implies, a slice is read-only.

Slices tend to be used in three ways:

As string constants

You can easily make a string literal slice. Since string literals are stored in the executable and have an unlimited lifespan, a slice pointing to one is always valid with no ownership issues.

In C++, you can just construct a slice from a string literal with zero runtime overhead:

slice("foo")

Or, if the fleece namespace is in scope, you can append _sl to a string literal to make it a slice:

"foo"_sl

In C, there’s a macro you wrap around a string literal:

FLSTR("foo")
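The "zero runtime overhead" claim rests on the compiler knowing a literal’s length. This sketch shows two common ways to exploit that. It assumes a hypothetical LitSlice type, a hypothetical _msl literal suffix and a hypothetical MINI_STR macro. None of these are Fleece’s actual definitions of _sl or FLSTR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constexpr slice type standing in for fleece::slice.
struct LitSlice {
    const char* buf;
    size_t      size;
};

// User-defined literal: the compiler passes the literal's length as `len`
// (excluding the terminating NUL), so no strlen() runs at runtime.
constexpr LitSlice operator""_msl(const char* str, size_t len) {
    return LitSlice{str, len};
}

// Macro alternative: sizeof a string literal includes its trailing NUL,
// so subtract one. The "" prefix makes non-literal arguments fail to compile.
#define MINI_STR(lit) (LitSlice{("" lit), sizeof(lit) - 1})

// Both forms can be evaluated entirely at compile time.
constexpr LitSlice kFoo = "foo"_msl;
static_assert(kFoo.size == 3, "size is known at compile time");
```

Both approaches also count embedded NUL bytes, which a strlen-based constructor would stop at.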

As function/method parameters

Slices are often passed to functions, as read-only references to a range of memory; most of the time these are interpreted as strings.

This is almost always safe. If you have a valid reference to some memory at the time you call a function, you still have that reference while the function is running, right? Usually, yes, but there are some tricky edge cases that tend to look like this:

  1. Function A() has a reference to object x.
  2. A calls B(x.s), passing a slice to memory that’s owned by x.
  3. Function B does something that causes object x to be freed.
  4. Function B accesses the slice it was passed … which now points to garbage ☠️

This is just an instance of a pretty general anti-pattern that as a C/C++ programmer you probably already know too well. In my experience it doesn’t come up that often, but be aware of it. (Hint: The Clang Address Sanitizer will catch such problems and make them much easier to debug.)
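One defensive pattern is for the callee to copy the borrowed bytes before doing anything that might free them. This sketch reproduces the A/B scenario above. std::string_view stands in for slice, and Registry and removeAndDescribe are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

// Hypothetical example; std::string_view plays the role of a slice.
struct Registry {
    std::map<std::string, int> entries;

    // Plays the role of B(): removes an entry, then reports what it removed.
    // `key` may point into the very map key being erased, so copy it first.
    std::string removeAndDescribe(std::string_view key) {
        std::string keyCopy(key);        // take ownership before mutating
        entries.erase(keyCopy);          // may free the memory `key` points to
        return "removed " + keyCopy;     // safe: uses the copy, never `key`
    }
};
```

The copy costs an allocation, so it is only worth making in functions that can actually free the memory they were handed.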

As function/method return values

A function can safely return a slice that points to memory that it owns, or that has global scope. This is very common in accessor methods that return constants or data members:

class foo {
private:
    std::string _name, _title;
public:
    virtual slice className()  {return "foo"_sl;}         // 👍🏻
    slice name() const         {return slice(_name);}     // 👍🏻
};

Note: A C++ slice constructed from a std::string points into the string’s data. slice(_name) is equivalent to slice(_name.data(), _name.size()).

Or in C:

FLSlice name(Person *p) {
    return (FLSlice){p->name, strlen(p->name)};            // 👍🏻
}

Be careful!

When implementing a function that returns slice / FLSlice, just watch out for the well-known problem of returning a pointer to a local variable, or to memory that gets freed when the function exits:

slice nameAndTitle() const { return slice(_title + " " + _name); }   // ☠️

The above example constructs a temporary std::string object and returns a slice pointing to its data. Unfortunately the temporary string is destructed when the function returns, so the function returns garbage. A similar example in C is:

FLSlice nameAndTitle(Person *p) {
    char str[100];
    snprintf(str, sizeof str, "%s %s", p->title, p->name);
    return (FLSlice){str, strlen(str)};           // ☠️
}

This returns a pointer to a local variable str on the stack, which of course becomes garbage as soon as the function returns.

So how do you use slices to return memory allocated by the called function? We cover that below…

3. HEAP-ALLOCATED SLICES

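alloc_slice (FLSliceResult in C) is a slice type that manages its own memory. The bytes pointed to by an alloc_slice stay valid for as long as that alloc_slice exists. In C++ it shares slice’s API through their common base class, pure_slice.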

Note: C doesn’t support subclassing, so FLSliceResult is a separate struct with the same layout. This means you can’t directly pass one to a parameter of type FLSlice. Instead use the inline function FLSliceResult_AsSlice, or simply use the buf and size values to populate an FLSlice yourself.

Under the hood, the memory is allocated on the heap with malloc. Each memory block uses reference-counting to keep track of how many alloc_slice objects point to it, and frees itself when the count reaches zero. This unavoidably makes alloc_slice more expensive to create, and to pass by value.
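The ref-counting scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions. The names BlockHeader, MiniResult, miniNew, miniRetain and miniRelease are hypothetical. Placing the count in a header just before the data is one plausible layout, not necessarily Fleece’s.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical layout: a ref-count header placed immediately before the
// bytes that `buf` points to.
struct BlockHeader {
    std::atomic<int> refCount;
    explicit BlockHeader(int n) : refCount(n) {}
};

// Same shape as a plain slice: a pointer and a size.
struct MiniResult {
    const void* buf;
    size_t      size;
};

inline BlockHeader* headerOf(const void* buf) {
    return static_cast<BlockHeader*>(const_cast<void*>(buf)) - 1;
}

// Allocates an uninitialized block holding one reference, owned by the caller.
inline MiniResult miniNew(size_t size) {
    void* mem = std::malloc(sizeof(BlockHeader) + size);
    if (!mem) return {nullptr, 0};
    BlockHeader* header = new (mem) BlockHeader(1);
    return {header + 1, size};
}

inline void miniRetain(MiniResult r) {
    if (r.buf) headerOf(r.buf)->refCount.fetch_add(1);
}

// Drops one reference; the last release frees the block. Null is a no-op.
inline void miniRelease(MiniResult r) {
    if (!r.buf) return;
    BlockHeader* header = headerOf(r.buf);
    if (header->refCount.fetch_sub(1) == 1) {    // we held the last reference
        header->~BlockHeader();
        std::free(header);
    }
}

inline int miniRefCount(MiniResult r) {
    return r.buf ? headerOf(r.buf)->refCount.load() : 0;
}
```

The atomic increment and decrement on every copy and destruction is where the extra cost of passing a ref-counted slice by value comes from.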

Example

…in C++

Let’s revisit the dangerous example from the previous section. The proper way to implement a function that returns a constructed slice is to return alloc_slice:

alloc_slice nameAndTitle() const { return alloc_slice(_title + " " + _name); }   // 👍🏻

Here the string concatenation creates a temporary std::string; an alloc_slice is constructed from this string, which copies the characters to a new reference-counted heap block; and that alloc_slice is returned.

{
    alloc_slice n = somebody.nameAndTitle();
    printf("Name: %.*s\n", (int)n.size, (const char*)n.buf);
}

After the printf, n goes out of scope and is destructed, and the heap block is freed since there are now zero references to it.

Note: This example also shows how to pass a slice to a printf-type function. Note the “%.*s” format specifier, which says that a length (as an int) and then a pointer will be passed, in that order. There’s a helper macro FMTSLICE that you can use to simplify the parameter list, e.g. printf("Name: %.*s\n", FMTSLICE(n));
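A macro like FMTSLICE only needs to expand to the two arguments “%.*s” consumes, in the right order. This sketch uses a hypothetical MINI_FMTSLICE macro and a hypothetical formatName function. It is not the actual FMTSLICE definition. It shows that the slice needs no trailing zero byte, because the precision stops printf from reading past size.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <string>

struct MiniSlice {
    const void* buf;
    size_t      size;
};

// Hypothetical macro: expands to an int precision, then a char pointer.
#define MINI_FMTSLICE(s) (int)(s).size, (const char*)(s).buf

inline std::string formatName(MiniSlice name) {
    assert(name.size <= static_cast<size_t>(INT_MAX));   // precision is an int
    char out[64];
    std::snprintf(out, sizeof out, "Name: %.*s", MINI_FMTSLICE(name));
    return out;   // snprintf truncates if the result doesn't fit in 64 bytes
}
```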

…in C

In C the function looks like this:

FLSliceResult nameAndTitle(Person *p) {
    FLSliceResult result = FLSliceResult_New(100);
    int n = snprintf((char*)result.buf, 100, "%s %s", p->title, p->name);
    result.size = (n < 0) ? 0 : (n < 100 ? (size_t)n : 99);   // snprintf truncates at 99 chars
    return result;           // 👍🏻
}

Note that the caller is responsible for releasing the result by calling FLSliceResult_Release when it’s done with it, else the memory will be leaked:

FLSliceResult n = nameAndTitle(somebody);
printf("Name: %.*s\n", (int)n.size, (const char*)n.buf);
FLSliceResult_Release(n);

Whenever a function returns FLSliceResult, that means it returns a reference that the caller must later release. (Unless the returned value happened to be null.)

FLHeapSlice

FLHeapSlice, in the C API, is a bit of an odd duck. It’s an alias for FLSlice, but indicates that the value stored therein is actually an FLSliceResult, i.e. its buf points to a managed block. This can allow the caller to “promote” it to an FLSliceResult without having to copy the data.

Our advice is to treat this just like FLSlice when you see it, and don’t use it yourself in your own APIs.

4. C / C++ INTEROPERABILITY

Sometimes you’re writing C++ code that calls a C API, or else that implements a C API. That means you end up using the C types, even though the C++ ones are so much more convenient to use.

Fortunately there is some special gunk in slice.hh to make them interoperable. You can:

  • Implicitly convert a slice to an FLSlice or vice versa
  • Implicitly convert alloc_slice to FLSlice or FLHeapSlice.
  • Implicitly convert FLSliceResult to alloc_slice or slice
  • Explicitly convert alloc_slice to FLSliceResult — you have to do it explicitly to avoid leaks, because it creates a reference that has to be manually released
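Two-way implicit conversion between a C struct and a C++ type can be built from a converting constructor and a conversion operator. The sketch below uses hypothetical CSlice, CppSlice, cCountByte and firstHalf types and functions, not the actual code in slice.hh. An owning type would follow the advice above and make its conversion to the C result type explicit, because that conversion hands out a reference someone must release.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical C-compatible struct with the same layout as FLSlice.
struct CSlice {
    const void* buf;
    size_t      size;
};

// A C-style function that only knows about CSlice: counts occurrences of a byte.
inline size_t cCountByte(CSlice s, unsigned char b) {
    size_t n = 0;
    for (size_t i = 0; i < s.size; ++i)
        if (static_cast<const unsigned char*>(s.buf)[i] == b) ++n;
    return n;
}

// Hypothetical C++ type that converts implicitly in both directions.
struct CppSlice {
    const void* buf = nullptr;
    size_t      size = 0;

    CppSlice() = default;
    CppSlice(const char* cstr) : buf(cstr), size(std::strlen(cstr)) {}
    CppSlice(CSlice s) : buf(s.buf), size(s.size) {}          // C to C++
    operator CSlice() const { return CSlice{buf, size}; }     // C++ to C
};

// Returning a CSlice where a CppSlice is expected uses the implicit constructor.
inline CppSlice firstHalf(CppSlice s) { return CSlice{s.buf, s.size / 2}; }
```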

Rvalue conversions

One conversion in particular is worth calling out: converting an rvalue of type FLSliceResult to alloc_slice acts like a “move” of the reference, meaning that you don’t have to release it manually. For example, we can call the C function from above like this:

auto name = alloc_slice(nameAndTitle(somebody));

nameAndTitle() returns an FLSliceResult, but since this value is never assigned to a variable but just passed directly into the alloc_slice constructor, the constructor is able to use the existing reference. When the alloc_slice destructs it will release that reference, freeing the memory.
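The “move” behavior can come from ordinary C++ overload resolution: a constructor taking an rvalue reference is chosen for temporaries, and one taking a const lvalue reference for named variables.

This sketch rests on several assumptions:

  • CResult, cResultCopy, cResultRetain, cResultRelease and OwnedSlice are all hypothetical names.
  • The ref count is a plain int, so the sketch is not thread-safe.
  • It is not alloc_slice’s real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical C-style result type: `buf` follows an int ref-count header.
// Simplified sketch: the count is a plain int, so this is not thread-safe.
struct CResult {
    const void* buf;
    size_t      size;
};

inline int* countOf(const void* buf) {
    return static_cast<int*>(const_cast<void*>(buf)) - 1;
}

// Returns a new reference (count = 1) that the caller must release.
inline CResult cResultCopy(const char* s) {
    size_t n = std::strlen(s);
    int* mem = static_cast<int*>(std::malloc(sizeof(int) + n));
    if (!mem) return {nullptr, 0};
    *mem = 1;
    std::memcpy(mem + 1, s, n);
    return {mem + 1, n};
}

inline void cResultRetain(CResult r) { if (r.buf) ++*countOf(r.buf); }

inline void cResultRelease(CResult r) {
    if (r.buf && --*countOf(r.buf) == 0) std::free(countOf(r.buf));
}

// Hypothetical owning wrapper in the spirit of alloc_slice.
class OwnedSlice {
public:
    // Lvalue source: the caller keeps its reference, so add one of our own.
    explicit OwnedSlice(const CResult& r) : _r(r) { cResultRetain(_r); }
    // Rvalue source (e.g. a return value nobody else holds): adopt its reference.
    explicit OwnedSlice(CResult&& r) : _r(r) { r = CResult{nullptr, 0}; }
    ~OwnedSlice() { cResultRelease(_r); }
    OwnedSlice(const OwnedSlice&) = delete;
    OwnedSlice& operator=(const OwnedSlice&) = delete;
    int refCount() const { return _r.buf ? *countOf(_r.buf) : 0; }
private:
    CResult _r;
};
```

With a named CResult variable the wrapper retains, so the caller still owes its own release. With a function’s return value the wrapper takes over the only reference.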

The opposite conversion works too, alloc_slice rvalue to FLSliceResult. This one happens when you’re implementing an extern "C" function in C++ that returns an FLSliceResult:

extern "C" FLSliceResult nameAndTitle(Person *p);

FLSliceResult nameAndTitle(Person *p) {
    return FLSliceResult( p->nameAndTitle() );
}

Here the C++ nameAndTitle() method returns an alloc_slice, whose reference gets moved to the constructed FLSliceResult. Of course the caller will be responsible for releasing that reference.

5. API

C API

The C API is in fleece/FLSlice.h. There’s not a whole lot to it, and some of that is for use by C++. You get the struct definitions, the constant kFLSliceNull, and some functions:

FLSlice:

  • FLStr(str) — Creates a slice that points to a C string. Takes O(n) time since it has to call strlen.
  • FLSTR("...") — Declares a slice that points to a string literal. This is more efficient since the size of the literal is known at compile time.
  • FLSlice_Equal(a,b) — Returns true if the two slices have equal contents.
  • FLSlice_Compare(a,b) — Compares the contents as strcmp would do, i.e. returning negative, zero or positive.
  • FLSlice_Hash(s) — Generates a 32-bit hash code from the contents of the slice.
  • FLSlice_ToCString(s, buf, capacity) — Copies a slice (or as much as will fit) into a buffer and adds a terminating 0 byte.
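The comparison and copy-out semantics listed above can be sketched in a few lines. miniCompare, miniEqual and miniToCString are hypothetical functions written for illustration; they are not the Fleece implementations. miniCompare orders slices the way strcmp orders strings, with a shorter slice sorting before a longer one that it prefixes. The bool return value of miniToCString is this sketch’s own choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

struct MiniSlice {
    const void* buf;
    size_t      size;
};

// strcmp-style ordering: compare the common prefix byte-wise; if it matches,
// the shorter slice sorts first. (memcmp is skipped for n == 0, since passing
// a null pointer to memcmp is undefined even with a zero length.)
inline int miniCompare(MiniSlice a, MiniSlice b) {
    size_t n = std::min(a.size, b.size);
    int cmp = (n == 0) ? 0 : std::memcmp(a.buf, b.buf, n);
    if (cmp != 0) return cmp;
    if (a.size == b.size) return 0;
    return a.size < b.size ? -1 : 1;
}

inline bool miniEqual(MiniSlice a, MiniSlice b) {
    return a.size == b.size && (a.size == 0 || std::memcmp(a.buf, b.buf, a.size) == 0);
}

// Copies as much as fits while leaving room for the terminator, and always
// NUL-terminates. Returns true if the whole slice fit.
inline bool miniToCString(MiniSlice s, char* out, size_t capacity) {
    if (capacity == 0) return false;
    size_t n = std::min(s.size, capacity - 1);
    if (n > 0) std::memcpy(out, s.buf, n);
    out[n] = '\0';
    return n == s.size;
}
```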

FLSliceResult:

  • FLSliceResult_New(size) — Allocates a new slice of the given size. The contents are uninitialized (garbage). You’ll need to cast buf to a non-const pointer and write into them.
  • FLSliceResult_CreateWith(bytes,size) — Allocates a new slice of the given size and copies data from bytes into it.
  • FLSlice_Copy(s) — Same as the above, but the bytes & size are expressed as an FLSlice.
  • FLSliceResult_Release(s) — Decrements the ref-count; call this when you’re done with it. Note that it’s a safe no-op to call this on a null slice.
  • FLSliceResult_Retain(s) — Increments the ref-count of an FLSliceResult; not often needed. This call must be balanced with a call to FLSliceResult_Release.

C++ API

The C++ API is in fleece/slice.hh, and is quite large! That means there are all kinds of helpful utility methods, comparable to what you’d find in a string or array (of bytes) class.

Common API:

Most of the common API of slice and alloc_slice is defined in their common base class pure_slice. (You never use pure_slice yourself, and you can forget it exists except when looking at the API. All of its public methods can be called on both slice and alloc_slice.)

  • You can construct a slice in a number of ways: with a pointer and a size, with a begin and an end pointer, with an FLSlice or FLSliceResult.
  • You can cast to/from a std::string or std::string_view, i.e. string(slc) or slice(str), and cast a C string to a slice, i.e. slice(cstr). (You can’t cast a slice to a C string because slices don’t have trailing zero bytes. Instead use toCString, which copies the slice’s contents to a buffer you provide and appends a zero byte, or else string(slc).c_str().)
  • operator bool — Bool conversion lets you use a slice as a test in if or while. A slice evaluates as true unless it’s nullslice; so if(s) is the same as if(s.buf != nullptr).
  • empty — Tests if a slice’s size is 0.
  • begin and end — Their existence lets you use a for loop to iterate over a slice’s bytes, with for(uint8_t c : s).
  • operator[] returns the byte at an offset, as though the slice were a uint8_t[].
  • operator() takes an offset and a length and returns the equivalent slice. (It can’t use square brackets because, before C++23, C++ doesn’t allow operator[] to take more than one parameter!)
  • upTo and from return prefixes and suffixes respectively.
  • There are several find... methods that search for particular bytes or substrings.
  • There are a number of comparison operators like ==, <, compare — these compare the contents, not addresses.
  • caseEquivalent and caseEquivalentCompare are case-insensitive versions of == and compare, but they only know about the English (ASCII) alphabet.
  • hasPrefix and hasSuffix compare the beginning or end of the slice.
  • hash returns a 32-bit hash code of the contents.
  • copyTo calls memcpy to copy the entire slice to a destination address.
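Several of the operations listed above reduce to simple pointer arithmetic. ViewSlice is a hypothetical mini type. Its member names mirror the list above, but its semantics belong to this sketch only. In particular, upTo and from take integer offsets here, which may not match Fleece’s signatures. The sketch shows:

  • the difference between operator bool (null-ness) and empty (zero length)
  • range-for iteration
  • sub-slicing without copying

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mini slice with a few of the common operations.
struct ViewSlice {
    const uint8_t* buf = nullptr;
    size_t         size = 0;

    ViewSlice() = default;
    ViewSlice(const void* b, size_t s) : buf(static_cast<const uint8_t*>(b)), size(s) {}
    ViewSlice(const char* cstr) : ViewSlice(cstr, std::strlen(cstr)) {}

    explicit operator bool() const { return buf != nullptr; }   // false only for null
    bool empty() const { return size == 0; }
    const uint8_t* begin() const { return buf; }
    const uint8_t* end() const   { return buf + size; }
    uint8_t operator[](size_t i) const { assert(i < size); return buf[i]; }

    // Sub-slice by offset and length; shares memory, copies nothing.
    ViewSlice operator()(size_t offset, size_t len) const {
        assert(offset + len <= size);
        return ViewSlice(buf + offset, len);
    }
    ViewSlice upTo(size_t offset) const { return (*this)(0, offset); }
    ViewSlice from(size_t offset) const { return (*this)(offset, size - offset); }

    bool hasPrefix(ViewSlice p) const {
        return p.size <= size && (p.size == 0 || std::memcmp(buf, p.buf, p.size) == 0);
    }
    bool operator==(ViewSlice o) const {    // compares contents, not addresses
        return size == o.size && (size == 0 || std::memcmp(buf, o.buf, size) == 0);
    }
    // Pointer to the first occurrence of byte b, or nullptr if absent.
    const uint8_t* findByte(uint8_t b) const {
        return size ? static_cast<const uint8_t*>(std::memchr(buf, b, size)) : nullptr;
    }
};
```

For example, splitting "key=value" at the '=' byte yields two sub-slices that point into the original buffer.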

alloc_slice API:

  • You can construct an alloc_slice in a number of ways: with a pointer and a size, with a begin and an end pointer, with an FLSlice or FLSliceResult. In all these cases it copies the contents of the source into a new ref-counted heap block.
  • You can also construct an alloc_slice with a size alone, in which case it allocates a block of the given size but uninitialized (garbage) contents.
  • nullPaddedString(slice) is a static factory method that creates an alloc_slice copying the given slice, and guarantees that there will be a zero byte just past its end. The zero byte is not part of the slice, but it means you can cast its buf to const char* and pass it to any API that takes a C string.
  • resize changes the size of the slice by reallocating its buffer, similar to realloc. (If the new size is larger, new bytes at the end will be uninitialized garbage.) There’s a variant called shorten that first asserts that the new size is not larger.
  • append grows the buffer and copies the given slice’s contents to its end. This is not super efficient: unlike std::string it does not pre-allocate spare capacity, so every call to append does a realloc. If you need to efficiently concatenate and produce an alloc_slice, look at the Writer class in Fleece.
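The null-padding idea behind nullPaddedString is to allocate one extra byte, zero it, and leave it out of the logical size. PaddedBuffer and nullPadded are hypothetical names. This sketch uses std::unique_ptr for brevity, whereas a real alloc_slice would use its ref-counted heap block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical owning buffer: `size` excludes the extra zero byte,
// but the byte is present so the data can be handed to C string APIs.
struct PaddedBuffer {
    std::unique_ptr<char[]> data;
    size_t size = 0;                         // logical length, without the pad
    const char* c_str() const { return data.get(); }
};

inline PaddedBuffer nullPadded(const void* src, size_t size) {
    PaddedBuffer out;
    out.data.reset(new char[size + 1]);      // one extra byte for the terminator
    if (size > 0) std::memcpy(out.data.get(), src, size);
    out.data[size] = '\0';                   // zero byte just past the end
    out.size = size;
    return out;
}
```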