RDF Fusion is an experimental columnar SPARQL engine. It is based on Apache DataFusion, an extensible query engine based on Apache Arrow.
A primary goal of RDF Fusion is to preserve the strengths of DataFusion and make them available to the Semantic Web community. These strengths include:
- Extensibility: DataFusion features many extension points that we use to implement SPARQL. We expose these extension points to RDF Fusion users for developing customized SPARQL dialects. In the future, we would like to provide further extension points.
- Performance: DataFusion features a vectorized execution engine that can leverage the capabilities of modern CPUs. We track RDF Fusion's performance on CodSpeed and will provide comparisons to other query engines soon.
- Boring Architecture: DataFusion implements an "industry-proven" architecture for query planning and query execution. See DataFusion's documentation for details.
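To make the "vectorized" and "columnar" terms above more concrete, here is a minimal, self-contained sketch that evaluates the triple pattern `?s <predicate> ?o` once row by row and once over columns. It is not RDF Fusion's or DataFusion's actual implementation. All names (`TripleRow`, `TripleColumns`, `match_predicate_rows`, `match_predicate_columns`) and the use of `u32` dictionary ids for RDF terms are assumptions made for illustration only.

```rust
/// Row-oriented layout: one struct per triple.
/// Terms are represented by hypothetical dictionary ids.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TripleRow {
    pub subject: u32,
    pub predicate: u32,
    pub object: u32,
}

/// Column-oriented layout: one contiguous vector per triple position.
#[derive(Debug, Default)]
pub struct TripleColumns {
    pub subjects: Vec<u32>,
    pub predicates: Vec<u32>,
    pub objects: Vec<u32>,
}

impl TripleColumns {
    /// Transposes a slice of rows into three columns.
    pub fn from_rows(rows: &[TripleRow]) -> Self {
        let mut cols = TripleColumns::default();
        for row in rows {
            cols.subjects.push(row.subject);
            cols.predicates.push(row.predicate);
            cols.objects.push(row.object);
        }
        cols
    }
}

/// Row-at-a-time evaluation: every triple is inspected as a whole.
pub fn match_predicate_rows(rows: &[TripleRow], predicate: u32) -> Vec<(u32, u32)> {
    rows.iter()
        .filter(|row| row.predicate == predicate)
        .map(|row| (row.subject, row.object))
        .collect()
}

/// Batch-at-a-time evaluation of the same pattern.
/// Step 1 scans only the predicate column and builds a selection mask.
/// Step 2 uses the mask to gather the matching subjects and objects.
pub fn match_predicate_columns(cols: &TripleColumns, predicate: u32) -> Vec<(u32, u32)> {
    // Tight loop over a single contiguous column.
    let mask: Vec<bool> = cols.predicates.iter().map(|&p| p == predicate).collect();

    mask.iter()
        .zip(cols.subjects.iter().zip(cols.objects.iter()))
        .filter_map(|(keep, (s, o))| if *keep { Some((*s, *o)) } else { None })
        .collect()
}
```

Both functions return the same bindings. The columnar variant separates the comparison from the gathering step, so the comparison touches only one contiguous array. This is the kind of loop that a vectorized engine can process in batches.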
RDF Fusion can currently be used in two modes: as a "library" for DataFusion or via a convenient Store API.
The Store API provides high-level methods for interacting with a triple store (e.g., inserting, querying).
Users who simply want to use RDF Fusion's capabilities are advised to use this API.
While the Store API is similar to Oxigraph's Store (remember, it started as a fork), the two are not fully compatible.
TODO point to example
Alternatively, you can use RDF Fusion as a "library" for DataFusion and interact directly with DataFusion's APIs.
Users who want to significantly extend RDF Fusion's capabilities are advised to use this approach.
Note that a limited set of extension points (e.g., those that do not alter SPARQL syntax) can also be used via the Store API.
Users can use RDF Fusion's implementation of SPARQL operators directly via DataFusion. They have full control over how the query is processed and can pick and choose only the parts of RDF Fusion they need.
TODO point to example
Here is a short comparison with other open-source SPARQL databases.
- Oxigraph was a major inspiration for this project, as RDF Fusion started as a fork of it. Oxigraph builds on a row-based query engine and cannot be extended with DataFusion's extension points. On the other hand, Oxigraph is expected to have less overhead for non-CPU-bound queries and is more "battle-tested".
- Apache Jena is probably the de facto standard for experimenting with custom SPARQL dialects. It is implemented in Java and has a row-based query engine.
You can use cargo to interact with the codebase or use Just to run the pre-defined commands, which are also used for continuous integration builds.
git clone --recursive https://github.com/tobixdev/graphfusion.git # Clone Repository
git submodule update --init # Initialize submodules
just test # Run tests
Feel free to use GitHub discussions to ask questions or talk about RDF Fusion. Bug reports are also very welcome.
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in RDF Fusion by you, as defined in the Apache-2.0 license, shall be dually licensed as above, without any additional terms or conditions.
The project started as a fork from Oxigraph, a graph database written in Rust with a custom SPARQL query engine. While large portions of the codebase have been written from scratch, there is still code from Oxigraph in this repository.