A small C++14 library for reading SQL columns within persisted MonetDB databases.
MonetDB is the de-facto standard analytics-oriented, columnar, relational, Free Software DBMS. MonetDB's persistence format is, admittedly, a bit contrived (since it is also used in-memory - most of it is mmap()ed as-is); but it's a neat little semi-standard storage format - for non-textual columnar relational data, unless you're living in the Hadoop-and-Spark world. you want to experiment with columnar data and not be laden with the entire DBMS (or with other large software systems like R etc.)
A small program is worth a thousand words... :-)
#include "monetdb-buffer-pool/buffer-pool.h"
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include <exception>
#include <iostream>
#include <string>
using buffer_pool_t = monetdb::gdk::buffer_pool_t;
using sql_column_name_t = monetdb::sql_column_name_t;
using column_name_kind_t = monetdb::column_name_kind_t;
using namespace std::string_literals;
enum { max_number_of_elements_to_print = 20 };
[[noreturn]] bool die(const std::string& message)
{
std::cerr << message << std::endl;
exit(EXIT_FAILURE);
}
int main(int argc, char** argv)
{
(argc == 4) or die("Usage: "s + argv[0] + " <db path> <table name> <column name>");
auto db_path = argv[1];
auto table_name = argv[2];
auto column_name = argv[3];
auto full_column_name = sql_column_name_t(table_name, column_name);
auto buffer_pool = buffer_pool_t(db_path);
std::cout
<< "DB persistence layout version: 0" << std::oct << buffer_pool.version()
<< " (decimal " << std::dec << buffer_pool.version() << ")\n"
<< "This BAT Buffer pool contains " << buffer_pool.size() << " columns.\n";
auto index_in_pool = buffer_pool.find_column(full_column_name);
index_in_pool or die ("Couldn't find column \"" + std::string(full_column_name) + "\"");
auto column = buffer_pool[index_in_pool.value()];
auto type_name = monetdb::gdk::type_name(column.type());
std::cout
<< "Column index in pool: " << index_in_pool << '\n'
<< "Physical name: " << column.name<column_name_kind_t::physical>() << '\n'
<< "Logical name: " << column.name<column_name_kind_t::logical>() << '\n'
<< "SQL name: " << column.name<column_name_kind_t::sql>() << '\n'
<< "Element count: " << column.length() << '\n'
<< "Type: " << type_name << '\n';
if (type_name == "int"s) {
auto as_typed_span = column.as_span<int>();
auto num_elements_to_print = std::min<size_t>(
as_typed_span.size(), max_number_of_elements_to_print );
std::cout << "\nFirst " << num_elements_to_print << " elements: ";
for(auto x : column.as_span<int>()) {
std::cout << x << ' ';
if (--num_elements_to_print == 0) break;
}
std::cout << '\n';
}
return EXIT_SUCCESS;
}
Notes:
- The MonetDB server process for the DB you're reading most not be running. If it is running, the library will fail to obtain a lock and throw an exception.
- Only one version of MonetDB is supported at any given time (or rather, one version of MonetDB's GDK library; occasionally, subsequent MonetDB versions are compatible enough not to bump the GDK library's version number).
- Not all versions of MonetDB (/GDK) are supported; but the latest ones are, at the time of writing.
- There's not much support in the library for variable-width types other than by using the lower-level, MonetDB-specific data structures (through the
.record()
method of column proxies); but you can have a look atsrc/examples/monetdb-bp-reader.cpp
which prints the contents of string columns to the console - to see how it's done.
- An x86_64 machine capable of running GNU/Linux - hopefully this restriction can be lifted soon.
- CMake 3.1
- GCC 5.4.0 or up (probably any version starting from 5.0 should work really, and clang or MSVC may also work - but I haven't tested)
- A GNU/Linux x86_64 operating system - hopefully a temporary restriction.
- A recent MonetDB version, e.g. 11.33.3. Download it:
- as compiled binaries from the Download section, or
- as a source tarball from here, then build from sources
Just use CMake to configure, generate build files and build. Both in-source and out-of-source building is supported.
The library will be built as $BUILD_DIRECTORY/lib/monetdb-bbp-reader.a
, with some generated include files under $BUILD_DIRECTORY/include
in addition to those under src/
Installation is currently not supported, so that part is on you for now.
If you have...
- A bug / feature request -> File a new issue in the Issues section.
- A question about MonetDB itself -> Try one of the MonetDB mailing lists.
- Feedback questions or other interest in the reader library -> Write Eyal, the developer and maintainer.
This library as a whole is available under a BSD 3-clause license. However, some code was obtained from other sources, and is covered by other licenses - mostly (or solely) the MonetDB code. The original MonetDB code is available from the MonetDB website in its original form; and the modified form here is made available under MPL 2.0. Please contact Eyal if further clarification is necessary or for the possibility of use under a different license.