Reproducibility

Introduction

This document describes how to reproduce the results of the paper "Grafite: Taming Adversarial Queries with Optimal Range Filters".

Requirements

The following software is required to reproduce the experiments of the paper:

  • gcc-11 (or later)
  • Boost 1.67.0 (or later)
  • Git 2.13 (or later)
  • Python 3.8 (or later)
    • Jupyter Notebook 6.0.3 (or later)
    • Matplotlib 3.2.1 (or later)
    • Numpy 1.18.2 (or later)
    • Pandas 1.0.3 (or later)
  • Bash 4.4 (or later)
    • realpath 8.28 (or later)
    • wget 1.19.4 (or later)
    • zstd 0.0.1 (or later)
    • md5sum 8.28 (or later)
  • A full LaTeX installation (for generating the plots)

We recommend using a Linux machine with at least 64 GB of RAM.
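
Before building, it can help to confirm that the listed tools are on the PATH. The following is a minimal sketch, not part of the repository: the function name missing_tools and the command names in the list are assumptions (for example, the compiler may be installed as gcc-11, and pdflatex stands in for the LaTeX installation). cmake is included because the build step invokes it. Versions are not checked.

```bash
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names; adjust them to your system (e.g. gcc-11).
required=(gcc cmake git python3 jupyter realpath wget zstd md5sum pdflatex)
missing=$(missing_tools "${required[@]}")
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "All listed tools were found."
fi
```

An empty "Missing tools" report only means the commands exist; the minimum versions above still need to be checked by hand.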

Building

To compile the code needed to run the experiments, clone the repository, including all submodules, and build it:

git clone --recurse-submodules -j8 https://github.com/marcocosta97/grafite.git
cd grafite
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j8

Setting the environment

Create a folder to hold the artifacts of the experiments and move into it. The folder can live anywhere; the commands below assume it sits alongside the grafite folder:

mkdir ../../paper_results && cd ../../paper_results

Then, download the books_200M_uint64, fb_200M_uint64 and osm_cellids_200M_uint64 datasets (they will be placed in the real_datasets subfolder) and generate the workloads:

bash ../grafite/bench/scripts/download_datasets.sh
bash ../grafite/bench/scripts/generate_datasets.sh ../grafite/build real_datasets

The generated workloads will be in the workloads subfolder.
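
Since the downloads are large, a quick check that all three datasets arrived can save a failed run later. This is a sketch under the assumption that each file in real_datasets is named exactly after its dataset; the function name missing_datasets is hypothetical and not part of the repository scripts.

```bash
# Print each expected dataset file that is missing from the given folder.
# Assumption: files are named exactly after the datasets listed above.
missing_datasets() {
    local dir=$1 name
    for name in books_200M_uint64 fb_200M_uint64 osm_cellids_200M_uint64; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Run from inside paper_results; no output means all three files are present.
missing_datasets real_datasets
```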

Running the experiments

Now you can execute the tests:

bash ../grafite/bench/scripts/execute_tests.sh ../grafite/build workloads

The results will be in the results subfolder.

Note that the experiments will take a long time to complete.
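
Because the run is long, you may prefer to detach it from the terminal so that a dropped SSH session does not interrupt it. The helper below is one possible sketch using nohup; the function name run_detached, the variable DETACHED_PID and the log file name are assumptions, not something the repository provides.

```bash
# Start a command detached from the terminal, sending all output to a log file.
# The PID of the background job is stored in DETACHED_PID.
run_detached() {
    local log=$1
    shift
    nohup "$@" >"$log" 2>&1 &
    DETACHED_PID=$!
}

# Example (run from inside paper_results):
# run_detached execute_tests.log bash ../grafite/bench/scripts/execute_tests.sh ../grafite/build workloads
# tail -f execute_tests.log
```

A terminal multiplexer such as tmux or screen would serve the same purpose.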

Finally, to generate the figures, copy the grafite/bench/scripts/graphs.ipynb notebook into the paper_results folder and run it. The figures will be saved in the figures subfolder as PDF files.
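
On a headless machine, the notebook can also be executed from the command line instead of the browser. The sketch below uses jupyter nbconvert and assumes that nbconvert is installed alongside Jupyter Notebook and that the notebook runs unattended without manual edits; the function name run_graphs_notebook is hypothetical.

```bash
# Copy the notebook into the results folder and execute it without a browser.
# Arguments: path to the grafite checkout, path to the paper_results folder.
run_graphs_notebook() {
    local grafite_dir=$1 results_dir=$2
    cp "$grafite_dir/bench/scripts/graphs.ipynb" "$results_dir/" || return 1
    # Run in a subshell so the caller's working directory is left unchanged.
    (cd "$results_dir" && jupyter nbconvert --to notebook --execute --inplace graphs.ipynb)
}

# Example (run from inside paper_results):
# run_graphs_notebook ../grafite .
```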

At the end of the process, the paper_results folder will have the following structure:

  • real_datasets: contains the original books_200M_uint64, fb_200M_uint64 and osm_cellids_200M_uint64 datasets
  • workloads: contains the generated datasets and workloads for each test
  • results: contains the results of the experiments in csv format
  • figures: contains the figures of the paper

Figures and Tables

The graphs.ipynb notebook generates the following figures of the paper in the paper_results/figures folder:

  • Figure 1. corr_test_small.pdf
  • Figure 3. corr_test_twolines.pdf
  • Figure 4. fpr_test_heuristics.pdf
  • Figure 5. fpr_test_bounded.pdf
  • Figure 6. true_queries_test.pdf
  • Figure 7. constr_time_test.pdf

The following tables are also generated:

  • Table of Figure 4. table_heuristics.tex
  • Table of Figure 5. table_bounded.tex
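
To confirm that the notebook produced every figure listed above, a check along the following lines may be useful. It is only a sketch: the function name missing_figures is hypothetical, and the .tex tables are left out because this document does not state which folder they are written to.

```bash
# Print each expected figure PDF that is missing from the given folder.
missing_figures() {
    local dir=$1 name
    for name in corr_test_small corr_test_twolines fpr_test_heuristics \
                fpr_test_bounded true_queries_test constr_time_test; do
        [ -f "$dir/$name.pdf" ] || printf '%s\n' "$name.pdf"
    done
}

# Run from inside paper_results; no output means all six PDFs exist.
missing_figures figures
```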