C++ Small Containers
- Applications usually contain many auxiliary small data structures for each large collection of values. Container implementations often include several optimizations for the case when they are small.
- These optimizations cannot usually make it to the STL because of ABI compatibility issues. Users might need to reimplement these containers or rely on frameworks that include these implementations.
- Depending on large library collections for simple containers might impose a cost on the user that's higher than necessary and hinder collaboration on the evolution of these containers.
- This library includes independent implementations of the main STL containers optimized for the case when they are small.
Table of Contents
Integration:
=== "CMake"
=== "Add subdirectory"
```bash
git clone https://github.com/alandefreitas/small/
```
```cmake
add_subdirectory(small)
# ...
add_executable(your_target main.cpp)
target_link_libraries(your_target PUBLIC small::small)
```
=== "Fetch content"
```cmake
include(FetchContent)
FetchContent_Declare(small
GIT_REPOSITORY https://github.com/alandefreitas/small
GIT_TAG origin/master # or whatever tag you want
)
FetchContent_GetProperties(small)
if(NOT small_POPULATED)
FetchContent_Populate(small)
add_subdirectory(${small_SOURCE_DIR} ${small_BINARY_DIR} EXCLUDE_FROM_ALL)
endif()
# ...
add_executable(your_target main.cpp)
target_link_libraries(your_target PUBLIC small::small)
```
=== "External package"
```cmake
# Follow installation instructions and then...
find_package(small REQUIRED)
if(NOT small_FOUND)
# Throw or put your FetchContent script here
endif()
# ...
add_executable(your_target main.cpp)
target_link_libraries(your_target PUBLIC small::small)
```
=== "Install"
!!! note
Get the binary package from the [release section](https://github.com/alandefreitas/small/releases).
These binaries refer to the latest release version of small.
!!! hint
If you need a more recent version of `small`, you can download the binary packages from the CI artifacts or build the library from the source files.
=== "Build from source"
!!! note
Ensure your C++ compiler and CMake are up-to-date and then:
=== "Ubuntu + GCC"
```bash
# Create a new directory
mkdir build
cd build
# Configure
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
# Build
sudo cmake --build . --parallel 2 --config Release
# Install
sudo cmake --install .
# Create packages
sudo cpack .
```
=== "Mac Os + Clang"
```bash
# Create a new directory
mkdir build
cd build
# Configure
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
# Build
cmake --build . --parallel 2 --config Release
# Install
cmake --install .
# Create packages
cpack .
```
=== "Windows + MSVC"
```bash
# Create a new directory
mkdir build
cd build
# Configure
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="/O2"
# Build
cmake --build . --parallel 2 --config Release
# Install
cmake --install .
# Create packages
cpack .
```
!!! hint "Parallel Build"
Replace `--parallel 2` with `--parallel <number of cores in your machine>`
!!! note "Setting C++ Compiler"
If your C++ compiler that supports C++17 is not your default compiler, make sure you provide CMake with the compiler location with the DCMAKE_C_COMPILER and DCMAKE_CXX_COMPILER options. For instance:
```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2" -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8
```
=== "File amalgamation"
!!! note
Because containers are header-only, you can directly copy the contents from the `source` directory into your project.
!!! hint
In that case, you are responsible for setting the appropriate target include directories and any compile definitions you might require.
Once the library is properly integrated, you can create containers from the namespace small
like any other STL container:
--8<-- "examples/main.cpp"
All containers are optimized for the case when they're small but also efficient when they are large. The containers mix the common techniques found in other small container libraries:
- Inline allocation for small containers
- Custom expected sizes
- Identification of relocatable types
- Custom growth factors with better defaults
- Communication with system memory allocators
- Explicit consideration of CPU cache sizes and branch prediction
Most applications have many small lists and sets of elements. These containers avoid spending a lot of time with large containers that contain just a few elements. Small containers usually try to use the stack before dynamically allocating memory and try to represent associative containers with stack arrays, unless these sets are very large.
The following containers are available:
small::vector
small::max_size_vector
small::string
small::set
small::max_size_set
small::multiset
small::max_size_multiset
small::unordered_set
small::max_size_unordered_set
small::unordered_multiset
small::max_size_unordered_multiset
small::map
small::max_size_map
small::multimap
small::max_size_multimap
small::unordered_map
small::max_size_unordered_map
small::unordered_multimap
small::max_size_unordered_multimap
small::stack
small::queue
small::priority_queue
Although many compilers support small string optimization (SSO) already, this library will ensure all strings support SOO, custom inline sizes, relocation, and unicode.
This small vector implementation includes:
- Inline allocation for small vectors
- Custom expected size
- Special treatment of relocatable types
- Relocatable types can be moved with
memcpy
, bypassing destructors and constructors. - Relocatable types are defined by default for POD types and aggregate types of PODs
- The
small::is_relocatable
traits can be used as an extension point for custom types
- Relocatable types can be moved with
- Better growth factors
- Consider the cache line size in allocations
- Heap allocations can be disabled with
small::max_size_vector
When there are fewer elements than a given threshold, the elements are kept in a stack buffer for small vectors. Otherwise, the vector works as usual. However, if you are 100% sure you will never need more than N
elements, you can use a max_size_vector
, where elements are always inline.
The default number of elements in a small vector is usually the number of elements we can already fit inline in a vector. For larger data types, the default_inline_storage
trait can be used as an extension point where one can define how many elements a small vector of that type should contain by default.
--8<-- "examples/default_inline_storage.cpp"
When there's a reasonable default for the number of inline elements, this strategy avoids multiple vector type instantiations for different inline storage sizes.
This small vector implementation used folly, abseil, and LLVM as a reference.
The small string includes all the common optimizations for small vectors, and a custom template parameter to set how large we expect the string to be (in bytes).
However, when strings are representing text, if there's one thing that makes them not small is not supporting UTF8. In addition to the common interface for strings, small::string
includes extra functions to identify and work with UTF8 code points with random access.
--8<-- "examples/unicode_strings.cpp"
The problem of supporting UTF8 is easier to explain than it is to solve. Programming languages tend to solve this problem by (1) forbidding byte or substring access, and/or (2) allowing only access to code points with O(n)
cost, where n
is the number of code points. Because anything that forbids byte access would be incompatible with a C++ string, we allow direct byte access, and strings are allowed to be malformed unicode, which we can check with small::is_malformed
.
All capacity and access functions contain extra overloads that accept codepoint indexes, defined as a strong type, rather than byte indexes. By using these functions, one can ensure the string is never malformed. It's up to the user to decide whether these access functions are useful and worth it in a particular application.
Access to codepoints is provided with an inline lookup-table trick that allows us to access codepoints in O(log m)
time, where m
is the number of multibyte code points in the strings. When there are no multibyte codepoints in the string, the string works as usual and no extra memory is required for the table.
The small set/map classes use a more cache-friendly flat set/map and all other optimizations mentioned above for internal algorithms. As with other small containers, a custom template parameter can be used to define the number of inline elements in the container.
The small::default_inline_storage
and small::is_relocatable
trait can also be defined for custom types, and all the usual set/map, ordered/unordered, uni/multi variants are also provided:
--8<-- "examples/associative.cpp"
Unlike a small::vector
or small::string
, the asymptotic time complexities of flat sets/maps are very different from their std::
counterparts and should only be used when they are small. Because they are internally implemented as arrays, manipulating these containers costs O(n)
.
For large containers, you can use std
containers with custom allocators. Or for efficient large containers, you can use the abseil containers, implemented as B+-trees.
- Discussions are concentrated on our GitHub discussions page. Don't refrain from asking questions and proposing ideas. If this library helps you create something interesting, please divulge it with the community.
- If you are a programmer with good ideas, please share these ideas with us.
- Academic collaboration is more than welcome. It'd be great to see this library help people write papers.
Feel free to contribute new features to this library. For complex features and changes, consider getting feedback from the community first. Contributing to an existing code base with its conventions might seem obscure at first but please don't let that discourage you from sharing your ideas.
There are many ways in which you can contribute to this library:
- Testing the library in new environments see 1, 2, 3
- Contributing with interesting examples see 1
- Finding problems in this documentation see 1
- Finding bugs in general see 1, 2, 3, 4
- Whatever idea seems interesting to you
The only thing we ask you is to make sure your contribution is not destructive. Some contributions in which we are not interested are:
- "I don't like this optional feature, so I removed/deprecated it"
- "I removed this feature to support older versions of C++" but have not provided an equivalent alternative
- "I removed this feature, so I don't have to install/update ______" but have not provided an equivalent alternative
- "I'm creating this high-cost promise that we'll support ________ forever" but I'm not sticking around to keep that promise
In doubt, please open a discussion first
If contributing with code, please leave all warnings ON (-DSMALL_BUILD_WITH_PEDANTIC_WARNINGS=ON
), use cppcheck, and clang-format.
If contributing to the documentation, please edit README.md
directly, as the files in ./docs
are automatically generated with mdsplit.
Alan De Freitas |
Yat Ho |
Tomasz-szyszko |
Marcos Pontes |
These are some references we used for this work: