This document describes how to reproduce the results of the paper "Grafite: Taming Adversarial Queries with Optimal Range Filters".
The following software is required to reproduce the experiments of the paper:
- gcc-11 (or later)
- Boost 1.67.0 (or later)
- Git 2.13 (or later)
- Python 3.8 (or later)
- Jupyter Notebook 6.0.3 (or later)
- Matplotlib 3.2.1 (or later)
- Numpy 1.18.2 (or later)
- Pandas 1.0.3 (or later)
- Bash 4.4 (or later)
- realpath 8.28 (or later)
- wget 1.19.4 (or later)
- zstd 0.0.1 (or later)
- md5sum 8.28 (or later)
- A full LaTeX distribution (for generating the plots)
We recommend using a Linux machine with at least 64 GB of RAM.
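Before building, it can help to confirm that the command-line tools listed above are on the PATH. The following is a minimal sketch, not part of the repository: check_tools is a hypothetical helper, and the command names passed to it (for example python3 and jupyter) are assumptions about how the tools are installed on your machine. It only checks presence, not versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository):
# print every given command that is not found on PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The command names below are assumptions; adjust them to your system.
# cmake is included because the build step invokes it.
check_tools gcc git python3 jupyter bash realpath wget zstd md5sum cmake \
    || echo "Install the missing tools before continuing."
```

The Python packages (Matplotlib, Numpy, Pandas) and the LaTeX distribution are not covered by this check.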
To compile the files needed to run the experiments, do the following. First, clone the repository including all submodules and compile the code:
git clone --recurse-submodules -j8 https://github.com/marcocosta97/grafite.git
cd grafite
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j8
Create a folder wherever desired (we assume it to be alongside the grafite
path) to contain the artifacts of the experiments and move into it:
mkdir ../../paper_results && cd ../../paper_results
Then, download the books_200M_uint64, fb_200M_uint64 and osm_cellids_200M_uint64 datasets
(they will be placed in the real_datasets subfolder) and generate the workloads:
bash ../grafite/bench/scripts/download_datasets.sh
bash ../grafite/bench/scripts/generate_datasets.sh ../grafite/build real_datasets
The generated workloads will be in the workloads subfolder.
Now you can execute the tests:
bash ../grafite/bench/scripts/execute_tests.sh ../grafite/build workloads
The results will be in the results subfolder.
Note that the experiments will take a long time to complete.
Finally, to generate the figures, copy the grafite/bench/scripts/graphs.ipynb
notebook into the paper_results folder and run it.
The figures will be saved in the figures subfolder as PDF files.
At the end of the process, the paper_results folder will have the following structure:
- real_datasets: contains the original books_200M_uint64, fb_200M_uint64 and osm_cellids_200M_uint64 datasets
- workloads: contains the generated datasets and workloads for each test
- results: contains the results of the experiments in csv format
- figures: contains the figures of the paper
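The layout described above can be checked with a short script. This is only a sketch: check_layout is a hypothetical helper that is not part of the repository, and it verifies that the four subfolders exist without inspecting their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository):
# verify that the given folder contains the four expected subfolders.
# Returns 0 if all are present, 1 otherwise.
check_layout() {
    local root="$1"
    local missing=0
    local dir
    for dir in real_datasets workloads results figures; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing folder: $root/$dir"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running check_layout . from inside paper_results prints nothing and returns 0 once every step has completed.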
The following figures of the paper are generated by the graphs.ipynb notebook
in the paper_results/figures folder:
- Figure 1: corr_test_small.pdf
- Figure 3: corr_test_twolines.pdf
- Figure 4: fpr_test_heuristics.pdf
- Figure 5: fpr_test_bounded.pdf
- Figure 6: true_queries_test.pdf
- Figure 7: constr_time_test.pdf
The following tables are also generated:
- Table of Fig. 4: table_heuristics.tex
- Table of Fig. 5: table_bounded.tex
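Once the notebook has finished, a quick way to see which outputs were produced is to look for each expected file. The sketch below uses a hypothetical helper, report_outputs, that is not part of the repository. It assumes the .tex tables are written to the same folder as the PDF figures, which the text above does not state; if they are written elsewhere, run the check against that folder as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository):
# report, for each expected figure or table, whether its file exists
# in the given folder. Returns 0 if all are present, 1 otherwise.
report_outputs() {
    local dir="$1"
    local missing=0
    local entry label file
    # Each entry is "label:file name", taken from the lists above.
    for entry in \
        "Figure 1:corr_test_small.pdf" \
        "Figure 3:corr_test_twolines.pdf" \
        "Figure 4:fpr_test_heuristics.pdf" \
        "Figure 5:fpr_test_bounded.pdf" \
        "Figure 6:true_queries_test.pdf" \
        "Figure 7:constr_time_test.pdf" \
        "Table of Fig. 4:table_heuristics.tex" \
        "Table of Fig. 5:table_bounded.tex"; do
        label="${entry%%:*}"   # text before the first colon
        file="${entry#*:}"     # text after the first colon
        if [ -f "$dir/$file" ]; then
            echo "ok       $label  $file"
        else
            echo "MISSING  $label  $file"
            missing=1
        fi
    done
    return "$missing"
}
```

From inside paper_results, report_outputs figures lists one line per expected file.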