Nvwa ("v" is pronounced like the French "u") is one of the most ancient Chinese goddesses. She was said to have created human beings, and, when the evil god Gong-gong crashed himself upon one of the pillars that supported the sky and made a hole in it, she mended the sky with five-coloured stones.
I thought of the name Nvwa by analogy with Loki. Since it is so small a project and it contains utilities instead of a complete framework, I think "stones" a good metaphor.
Nvwa versions prior to 1.0 did not use a namespace. It looked to me
overkill to use a namespace in such a small project. However, having
had more project experience and seen more good examples, I changed my
mind. All nvwa functions and global variables were moved into the
namespace nvwa
.
The changes do not mean anything to new users. You just need to follow
the modern C++ way. You should add the root directory of this project
to your include directory (there should be a nvwa directory inside it
with the source files). To include the header file, use
#include <nvwa/...>
.
A brief introduction follows. Check the Doxygen documentation for more (technical) details.
aligned_memory.cpp
aligned_memory.h
Two function for cross-platform aligned memory allocation and
deallocation. This is a thin layer, and it is needed only because the
C++17/C11 aligned_alloc
pairs with free
, which does not work on
Microsoft Windows.
bool_array.cpp
bool_array.h
A replacement of std::vector<bool>
. I wrote it before I knew of
vector<bool>
, or I might not have written it at all. However, it is
faster than many vector<bool>
implementations (your mileage may vary),
and it has members like at
, set
, reset
, flip
, and count
. I
personally find count
very useful.
c++_features.h
Detection macros for certain modern C++ features that might be of interest to code/library writers. I have used existing online resources, and I have tested them on popular platforms and compilers that I have access to (Windows, Linux, and macOS; MSVC, GCC, and Clang), but of course not all versions or all combinations. Patches are welcome for corrections and amendments.
class_level_lock.h
The Loki ClassLevelLockable
adapted to use the fast_mutex
layer.
One minor divergence from Loki is that the template has an additional
template parameter _RealLock
to boost the performance in non-locking
scenarios. Cf. object_level_lock.h.
cont_ptr_utils.h
Utility functors for containers of pointers adapted from Scott Meyers' Effective STL.
debug_new.cpp
debug_new.h
A cross-platform, thread-safe memory leak detector. It is a light-weight one designed only to catch unmatched pairs of new/delete. I know there are already many existing memory leak detectors, but as far as I know, free solutions are generally slow, memory-consuming, and quite complicated to use. This solution is very easy to use, and has very low space/time overheads. Just link in debug_new.cpp for leakage report, and include debug_new.h for additional file/line information. It will automatically switch on multi-threading when the appropriate option of a recognized compiler is specified. Check fast_mutex.h for more threading details.
Special support for gcc/binutils has been added to debug_new lately.
Even if the header file debug_new.h is not included, or
_DEBUG_NEW_REDEFINE_NEW
is defined to 0 when it is included, file/line
information can be displayed if debugging symbols are present in the
executable, since debug_new stores the caller addresses of memory
allocation/deallocation routines and they will be converted with
addr2line
(or atos
on macOS) on the fly. This makes memory tracing
much easier.
Note for Linux/macOS users: Nowadays GCC and Clang create
position-independent executables (PIEs) by default so that the OS can
apply address space layout randomization (ASLR), loading a PIE at a
random address each time it is executed and making it difficult to
convert the address to symbols. The simple solution is to use the
command-line option -no-pie
on Linux, or -Wl,-no_pie
on macOS.
With an idea from Greg Herlihy's post in comp.lang.c++.moderated, the
implementation was much improved in 2007. The most significant result
is that placement new
can be used with debug_new now! Full support
for new(std::nothrow)
is provided, with its null-returning error
semantics (by default). Memory corruption will be checked on freeing
the pointers and checking the leaks, and a new function
check_mem_corruption
is added for your on-demand use in debugging.
You may also want to define _DEBUG_NEW_TAILCHECK
to something like 4
for past-end memory corruption check, which is off by default to ensure
performance is not affected.
An article on its design and implementation is available at
A Cross-Platform Memory Leak Detector
Cf. memory_trace.cpp and memory_trace.h.
fast_mutex.h
The threading transparent layer simulating a non-recursive mutex. It
supports C++11 mutex, POSIX mutex, and Win32 critical section, as well
as a no-threads mode. Unlike Loki and some other libraries, threading
mode is not to be specified in code, but detected from the environment.
It will automatically switch on multi-threading mode (inter-thread
exclusion) when the -MT
/-MD
option of MSVC, the -mthreads
option
of MinGW GCC, or the -pthread
option of GCC under POSIX environments,
is used. One advantage of the current implementation is that the
construction and destruction of a static object using a static
fast_mutex
not yet constructed or already destroyed are allowed to
work (with lock/unlock operations ignored), and there are re-entry
checks for lock/unlock operations when the preprocessing symbol _DEBUG
is defined.
fc_queue.h
A queue that has a fixed capacity (maximum number of allowed items). All memory is pre-allocated when the queue is initialized, so memory consumption is well controlled. When the queue is full, inserting a new item will discard the oldest item in the queue. Care is taken to ensure this class template provides the strong exception safety guarantee.
This class template is a good exercise for me to make a STL-type
container. Depending on your needs, you may find circular_buffer
in
Boost more useful, which provides similar functionalities and a richer
API. However, the design philosophies behind the two implementations
are quite different. fc_queue
is modelled closely on std::queue
,
and I optimize mainly for a lockless producer-consumer pattern—when
there is one producer and one consumer, and the producer does not try to
queue an element when the queue is already full, no lock is needed for
the queue operations.
I have recently also switched to using C++11 atomic variables in
fc_queue
. Previously the code worked for me, but at least
theoretically it was not safe: compilers may optimize the code in
unexpected ways, and multiple processors may cause surprises too. Using
the old interface is not optimal now in the producer-consumer scenario,
though, and two new read/write functions are provided to make it work
more efficiently.
file_line_reader.cpp
file_line_reader.h
This is one of the line reading classes I implemented modelling the
Python approach. They make reading lines from a file a simple loop.
This implementation allows reading from a traditional FILE*
. Cf.
istream_line_reader.h, mmap_byte_reader.h, and
mmap_line_reader.h.
See the following blog for the motivation and example code:
Performance of My Line Readers
fixed_mem_pool.h
A memory pool implementation that requires initialization (allocates a
fixed-size chunk) prior to its use. It is simple and makes no memory
fragmentation, but the memory pool size cannot be changed after
initialization. Macros are provided to easily make a class use pooled
new
/delete
.
functional.h
This is my test bed for functional programming. If you are into functional programming, check it out. It includes functional programming patterns like:
- map
- reduce
- compose
- fixed-point combinator
- curry
- optional
My blogs on functional programming may be helpful:
Study Notes: Functional Programming with C++
Y Combinator and C++
Type Deduction and My Reference Mistakes
Generic Lambdas and the compose
Function
istream_line_reader.h
This is one of the line reading classes I implemented modelling the Python approach. They make reading lines from a file a simple loop. This implementation allows reading from an input stream. Cf. file_line_reader.h and mmap_line_reader.h.
See the following blogs for the motivation and example code:
Python yield
and C++ Coroutines
Performance of My Line Readers
malloc_allocator.h
An allocator that invokes malloc
/free
instead of operator
new
/delete
. It is necessary in the global operator new
/delete
replacement code, like memory_trace.cpp.
mem_pool_base.cpp
mem_pool_base.h
A class solely to be inherited by memory pool implementations. It is used by static_mem_pool and fixed_mem_pool.
memory_trace.cpp
memory_trace.h
This is a new version of the memory leak detector, designed to use
stackable memory checkpoints, instead of redefined new
to record the
context information. Like debug_new, it is quite easy to use, and
has very low space/time overheads. One needs to link in
memory_trace.cpp and aligned_memory.cpp for leakage report, and
include memory_trace.h for adding a new checkpoint with the macro
NVWA_MEMORY_CHECKPOINT()
.
See the following blog for its design:
mmap_byte_reader.h
This file contains the byte reading class template I implemented modelling the Python approach. It makes reading characters from a file a simple loop. This implementation uses memory-mapped file I/O. Cf. mmap_line_reader.h.
mmap_line_reader.h
This file contains the line reading class template I implemented modelling the Python approach. It makes reading lines from a file a simple loop. This implementation uses memory-mapped file I/O. Cf. istream_line_reader.h, file_line_reader.h, and mmap_byte_reader.h.
See the following blogs for the motivation and example code:
Performance of My Line Readers
C/C++ Performance, mmap
, and string_view
mmap_line_view.h
A mmap_line_view
is similar to mmap_line_reader_sv
defined in
mmap_line_reader.h, but it has gone an extra mile to satisfy the
view
concept (to be introduced in C++20). It can be trivially copied.
mmap_reader_base.cpp
mmap_reader_base.h
A class that wraps the difference of memory-mapped file I/O between Unix
and Windows. It is used by mmap_byte_reader
and mmap_line_reader
.
object_level_lock.h
The Loki ObjectLevelLockable
adapted to use the fast_mutex
layer.
The member function get_locked_object
does not exist in Loki, but is
also taken from Mr Alexandrescu's article. Cf. class_level_lock.h.
pctimer.h
A function to get a high-resolution timer for Win32/Cygwin/Unix. It is
useful for measurement and optimization, and can be easier to use than
std::chrono::high_resolution_clock
after the advent of C++11.
set_assign.h
Utility routines to make up for the fact that STL only has set_union
(+) and set_difference
(-) algorithms but no corresponding += and -=
operations available.
split.h
Implementation of a split routine that allows efficient and lazy split
of a string
(or string_view
). It takes advantage of the C++17
string_view
, and the split result can be either used on the fly or
returned as a vector.
While the ranges library provides a similar function, compiling with ranges is typically much slower—of course, it is much more powerful as well.
static_mem_pool.cpp
static_mem_pool.h
A memory pool implementation to pool memory blocks according to compile-time block sizes. Macros are provided to easily make a class use pooled new/delete.
An article on its design and implementation is available at
Design and Implementation of a Static Memory Pool
tree.h
A generic tree class template along with traversal utilities. Besides the usual template argument of value type, it has an additional argument of storage policy, which can be either unique or shared. Traversal utility classes are provided so that traversing a tree can be simply done in a range-based for loop. The test code, test/test_tree.cpp, shows its basic usage.