A set of scripts to download, process, load and run the Public BI Benchmark data and queries on MonetDB


peterboncz/public_bi_benchmark-1

 
 


BI benchmark

A set of scripts to download, process, load and run the BI Benchmark data and queries on MonetDB.

Scripts

0-get-raw-dataset.sh <destination-dir>
destination-dir: path to directory where the dataset will be stored

  • downloads and extracts the data
  • in this raw form, part of the data will not load and the queries will not run; run script 1 first
  • run time estimate: 30 minutes

1-MonetDBify-raw-dataset.sh <dataset-dir>
dataset-dir: path to dataset root directory

  • fixes csv files
  • alters queries to work on MonetDB
  • disables queries that cannot be fixed
  • final queries are in the <workbook>/queries/ directory (one query per file)
  • run time estimate: 90 minutes

2-MonetDB-load-data.sh <dataset-dir>
dataset-dir: path to dataset root directory

  • creates tables in MonetDB
  • loads data into MonetDB
  • output files (*.out and *.err) in each workbook directory contain the result of the load operation
  • see Notes section for requirements
  • run time estimate: 90 minutes

3-MonetDB-run-queries.sh <dataset-dir>
dataset-dir: path to dataset root directory

  • runs the queries on MonetDB
  • expects the data to be loaded
  • output files (*.out and *.err) in <workbook>/queries/ contain the result of the queries
  • see Notes section for requirements
  • run time estimate: 5 minutes
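
To get a quick view of which queries failed before running script 4, one option is to look for non-empty error files. The sketch below assumes that a non-empty *.err file in <workbook>/queries/ means the query reported an error; the function name list_failed_queries is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: list query error files that are not empty.
# Assumption: a non-empty *.err file under <workbook>/queries/ signals a failed query.
list_failed_queries() {
    local dataset_dir="$1"
    # -size +0 matches files that contain at least one byte.
    find "$dataset_dir" -type f -path '*/queries/*.err' -size +0 | sort
}
```

For example, `list_failed_queries /path/to/dataset | wc -l` would count the queries that wrote something to their error file.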

4-process-results.sh <dataset-dir>
dataset-dir: path to dataset root directory

  • processes the output files of scripts 2 and 3
  • generates a report from them: two JSON files in the current directory (workbooks.json, report.json)
  • run time estimate: 2 seconds
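
The five scripts are meant to be run in order against the same directory. A minimal sketch of a wrapper is shown below. SCRIPTS_DIR, DRY_RUN and run_pipeline are hypothetical names introduced for illustration. The sketch also assumes the scripts can be invoked with bash and return a non-zero exit code on failure, which the repository does not state.

```shell
#!/usr/bin/env bash
# Sketch: run the five benchmark scripts in order for one dataset directory.
# SCRIPTS_DIR (where the scripts live) and DRY_RUN are hypothetical helpers.
run_pipeline() {
    local dataset_dir="$1"
    local scripts_dir="${SCRIPTS_DIR:-.}"
    local step
    for step in 0-get-raw-dataset.sh \
                1-MonetDBify-raw-dataset.sh \
                2-MonetDB-load-data.sh \
                3-MonetDB-run-queries.sh \
                4-process-results.sh; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            echo "$scripts_dir/$step $dataset_dir"
        else
            # Assumption: a failing step exits non-zero; stop at the first failure.
            bash "$scripts_dir/$step" "$dataset_dir" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 run_pipeline /path/to/dataset` prints the five commands without executing them, which is a cheap way to check the paths before committing to several hours of processing.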

Notes

Dataset size:

  • compressed: ~45GB
  • uncompressed: ~422GB
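
Given these sizes, it may be worth checking free disk space before starting the download. The sketch below is one way to do that. The function name has_free_space is hypothetical, and the figure of 467 GB in the usage note assumes the compressed archives are kept next to the extracted data (45 + 422).

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the filesystem holding DIR has at least REQUIRED_GB free.
has_free_space() {
    local dir="$1" required_gb="$2"
    local free_kb
    # df -Pk prints one POSIX-format line per filesystem, in 1024-byte blocks;
    # the fourth column is the available space.
    free_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$free_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example: `has_free_space /path/to/destination 467 || echo "not enough space"`.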

Requirements for scripts 2 and 3:

  • a running MonetDB instance
  • the mclient command, available and configured
  • example of mclient configuration:
[user@machine ~]$ cat ~/.monetdb
user=monetdb
password=monetdb
language=sql
database=tpb
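
Before running scripts 2 and 3, the configuration can be sanity-checked. The sketch below only verifies that the file defines the four keys from the example above; check_monetdb_config is a hypothetical helper, not part of the repository, and it does not contact the server.

```shell
#!/usr/bin/env bash
# Sketch: check that an mclient configuration file defines the keys
# used in the example (user, password, language, database).
check_monetdb_config() {
    local config="${1:-$HOME/.monetdb}"
    local key missing=0
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 1; }
    for key in user password language database; do
        if ! grep -q "^${key}=" "$config"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

If the check passes, a statement such as `mclient -s "SELECT 1;"` can then confirm that the MonetDB instance is reachable with those settings.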

Status

[04-02-2019]

212 (422 GB)    total tables
=
198 (405 GB)    load successfully
- 168 (396 GB)  used in successful queries
14 (17 GB)      error on loading

912         total queries
=
622 (68%)   queries work AND produce results (nb_rows > 0)
38  (4%)    queries work BUT do not produce results (nb_rows = 0)
156 (17%)   impossible to fix (129 + 3 + 16 + 8)
               - 137 queries using non-existing columns (122+7+8)
               - 3 queries on non-existing tables
               - 16 incomplete queries
96  (10%)   queries using functions not supported by MonetDB
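
The query counts above can be cross-checked with a few lines of shell arithmetic. This sketch only recomputes the sums from the figures listed; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: recompute the query totals from the status figures.
working=622        # queries that work and return rows
empty=38           # queries that work but return no rows
unfixable=156      # queries that are impossible to fix
unsupported=96     # queries using functions not supported by MonetDB

total=$((working + empty + unfixable + unsupported))
echo "total queries: $total"

# Breakdown of the unfixable queries: missing columns, missing tables, incomplete.
missing_columns=$((122 + 7 + 8))
unfixable_check=$((missing_columns + 3 + 16))
echo "unfixable queries: $unfixable_check"
```

The four categories add up to the 912 total queries, and the three sub-categories add up to the 156 unfixable ones.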
