SkyhookDM extends Ceph distributed object storage with data management functionality for tabular data. Data is partitioned, and partitions are stored in Ceph objects. Data management methods are applied directly within the storage system via the object-level ('cls') interface. Methods include typical query-processing push-downs (operations offloaded to storage) such as SELECT, PROJECT, and AGGREGATE, as well as more general data management techniques such as indexing and physical design, including data layouts and formats. Data partitions are currently stored within objects in either Apache Arrow or Google FlatBuffers format.
TODO
On Kubernetes (via Rook)
TODO
For questions, please ask about SkyhookDM on StackOverflow using the tag [skyhook-ceph].
These instructions explain the container-native development setup for SkyhookDM. With this approach, there is no need to install build dependencies; instead, we download a Docker image containing the toolchain that builds, tests, and packages the library. For information on how to install Docker, take a look at the official documentation.
In addition, instead of making use of docker commands directly, we automate and document these tasks using Popper. You can think of Popper as make for containers. The Popper tasks defined for this project (dev-init, build, test, and build-rook-img) are defined in the .popper.yml file. For information on how to install Popper, take a look at the official documentation.
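Since the setup relies on both Docker and Popper being installed, a quick check before running any task can save time. The following is a minimal sketch, not part of the project: the helper name require_tools is hypothetical, and it assumes the two tools are installed under the command names docker and popper.

```shell
# require_tools TOOL...: report every tool that is not found on PATH.
# Hypothetical helper for illustration; it only checks that the commands
# exist, not that the Docker daemon is running.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "missing: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, running `require_tools docker popper` returns a non-zero status and names whichever tool is missing.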
Any of the tasks defined in the .popper.yml file can be executed in interactive mode. For example, to open a shell on the build step:
popper sh build
The above opens an interactive shell inside an instance of the builder image, which is a pre-built image with all the dependencies needed to build the CLS.
Note that each task depends on having executed the previous one at least once. In the example above, we must first have executed the dev-init task:
popper run dev-init
The above clones ceph and creates symlinks to this project within the ceph tree so that the cls folder is referenced to the right place (ceph/src/cls/tabular within the cloned ceph/ tree is a symlink to the root of this project).
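To confirm that dev-init left the tree in the expected state, one could check that the symlink resolves back to the project root. This is a sketch under assumptions: the function name check_tabular_link is hypothetical, and it assumes ceph/ was cloned directly under the project root, as described above.

```shell
# check_tabular_link [PROJECT_ROOT]: verify that ceph/src/cls/tabular is a
# symlink that resolves to the project root. Hypothetical helper.
check_tabular_link() {
  local root="${1:-.}"
  local link="${root}/ceph/src/cls/tabular"
  local target expected
  if [ ! -L "${link}" ]; then
    echo "not a symlink: ${link} (has dev-init been run?)" >&2
    return 1
  fi
  # Resolve both locations to physical paths so they can be compared.
  target="$(cd "${link}" && pwd -P)" || return 1
  expected="$(cd "${root}" && pwd -P)" || return 1
  if [ "${target}" != "${expected}" ]; then
    echo "unexpected target: ${target} (expected ${expected})" >&2
    return 1
  fi
  echo "ok: ${link} -> ${target}"
}
```

Called from the project root as `check_tabular_link .`, it prints the resolved target on success and returns a non-zero status otherwise.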
Build the skyhookdm library:
popper run build
The above builds the code inside the build/ directory generated by CMake inside the ceph/ folder, as one would normally expect.
To interactively build the RADOS class or run tests within this dev environment, we can open a shell in the build container for this step:
popper sh build
Then, inside the container, we can run tests by doing:
cd ceph/build
# build
make -j4
vstart.sh
# TODO: complete it
# - create pools
# - run bin/ceph_test_skyhook_query
# - etc
Once the build step has been executed, whether interactively (popper sh) or non-interactively (popper run), the next step generates a Rook-compatible Docker image:
popper run build-rook-img
The image contains, in addition to the upstream ceph packages, the libcls_tabular.so library and auxiliary binaries, so that container instances of this image can be deployed as OSDs and run the tabular class methods.
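One way to sanity-check the resulting image is to search it for the shared library. The sketch below is illustrative only: the function name image_has_tabular_class is hypothetical, the image name must be supplied by the caller (this document does not state it), and it assumes the find utility is available inside the image.

```shell
# DOCKER can be overridden, e.g. to use a different container CLI.
DOCKER="${DOCKER:-docker}"

# image_has_tabular_class IMAGE: print the location of libcls_tabular.so
# inside IMAGE and succeed, or fail if the library is not found.
# Hypothetical helper; assumes find exists in the image.
image_has_tabular_class() {
  local image="$1" found
  # Override the entrypoint so that find runs instead of the image default.
  # find may exit non-zero on unreadable paths, so only its output is used.
  found="$("${DOCKER}" run --rm --entrypoint find "${image}" / -name libcls_tabular.so 2>/dev/null)" || true
  [ -n "${found}" ] && echo "${found}"
}
```

It would be invoked as `image_has_tabular_class IMAGE`, where IMAGE is a placeholder for whatever name build-rook-img assigns to the image.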
The tests execute on the image built previously (the rook image). This is done so that we can test that the SkyhookDM RADOS class can be loaded in an upstream installation. This step also produces an image that can be uploaded to a container image registry that Rook can pull from.
To run tests on the ceph/ folder, you can run the build step in interactive mode as described previously.
popper run test
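Because each task depends on the previous one having run at least once, the whole workflow can be chained in a small wrapper. The following is a minimal sketch: the function name run_all_tasks is hypothetical, and the task order (dev-init, build, build-rook-img, test) is an assumption based on the statement that the tests execute on the rook image.

```shell
# POPPER can be overridden, e.g. with a stub for a dry run.
POPPER="${POPPER:-popper}"

# run_all_tasks: run the Popper tasks one after another and stop at the
# first failure. The task order below is an assumption, not taken from
# the .popper.yml file.
run_all_tasks() {
  local task
  for task in dev-init build build-rook-img test; do
    echo "==> ${POPPER} run ${task}"
    "${POPPER}" run "${task}" || {
      echo "task '${task}' failed" >&2
      return 1
    }
  done
}
```

Calling `run_all_tasks` from the project root then runs the four tasks in sequence, which is equivalent to typing the individual popper run commands by hand.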