Add datagen tools for TPC-DS benchmarks #158

Open
wants to merge 1 commit into
base: main
37 changes: 37 additions & 0 deletions tpcds/README.md
@@ -0,0 +1,37 @@
# Polars Decision Support DS (PDS-DS) Benchmark

The official TPC-DS tools are available on the TPC website (https://www.tpc.org/tpcds/default5.asp). The tooling in this repository is based on `DSGen-software-code-4.0.0`, with changes that allow it to run on macOS.

**NOTE**

This benchmark is under active development and is not yet ready for benchmarking.

---

## Build the required datagen tooling

```bash
git clone https://github.com/pola-rs/polars-benchmark/
cd polars-benchmark/tpcds/tools
```

### Build on Linux

```bash
make dsdgen
```

### Build on macOS

```bash
make OS=MACOS dsdgen
```

## Generate TPC-DS datasets

```bash
# Create a folder for the data, then generate it; -scale sets the scale factor
mkdir -p ../data/
./dsdgen -scale 1 -dir ../data/
```
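
To check that generation produced something, a small helper along the following lines could list each generated file with its row count. This is only a sketch: the function name `summarise_dat_files` is hypothetical, and it assumes `dsdgen` wrote one `.dat` file per table, one row per line, into the directory passed with `-dir`.

```bash
#!/usr/bin/env bash
# Sketch: print "<file> <row count>" for every .dat file in a directory.
# Assumption: dsdgen wrote one <table>.dat file per table, one row per line.

summarise_dat_files() {
    local dir="${1:-../data}"
    local f
    for f in "$dir"/*.dat; do
        # Skip the unexpanded glob pattern when no .dat files exist.
        [ -e "$f" ] || continue
        # tr strips the padding some wc implementations put before the count.
        printf '%s %s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
}
```

Calling `summarise_dat_files ../data` after the `dsdgen` run prints one line per table. Empty output suggests the files were written elsewhere or generation failed.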
105 changes: 105 additions & 0 deletions tpcds/answer_sets/1.ans
@@ -0,0 +1,105 @@
C_CUSTOMER_ID
----------------
AAAAAAAAAAABBAAA
AAAAAAAAAAADBAAA
AAAAAAAAAAADBAAA
AAAAAAAAAAAKAAAA
AAAAAAAAAABDAAAA
AAAAAAAAAABHBAAA
AAAAAAAAAABLAAAA
AAAAAAAAAABMAAAA
AAAAAAAAAACHAAAA
AAAAAAAAAACMAAAA
AAAAAAAAAADDAAAA
AAAAAAAAAADGAAAA
AAAAAAAAAADGBAAA
AAAAAAAAAADGBAAA
AAAAAAAAAADPAAAA
AAAAAAAAAAEBAAAA
AAAAAAAAAAEFBAAA
AAAAAAAAAAEGBAAA
AAAAAAAAAAEIAAAA
AAAAAAAAAAEMAAAA
AAAAAAAAAAFAAAAA
AAAAAAAAAAFPAAAA
AAAAAAAAAAGGBAAA
AAAAAAAAAAGHBAAA
AAAAAAAAAAGJAAAA
AAAAAAAAAAGMAAAA
AAAAAAAAAAHEBAAA
AAAAAAAAAAHFBAAA
AAAAAAAAAAIEBAAA
AAAAAAAAAAJGBAAA
AAAAAAAAAAJHBAAA
AAAAAAAAAAKCAAAA
AAAAAAAAAAKCAAAA
AAAAAAAAAAKJAAAA
AAAAAAAAAAKMAAAA
AAAAAAAAAAKMAAAA
AAAAAAAAAALAAAAA
AAAAAAAAAALABAAA
AAAAAAAAAALGAAAA
AAAAAAAAAALHBAAA
AAAAAAAAAALJAAAA
AAAAAAAAAANHAAAA
AAAAAAAAAANHBAAA
AAAAAAAAAANJAAAA
AAAAAAAAAANMAAAA
AAAAAAAAAANMAAAA
AAAAAAAAAANNAAAA
AAAAAAAAAAOBBAAA
AAAAAAAAAAODBAAA
AAAAAAAAAAOLAAAA
AAAAAAAAAAPGBAAA
AAAAAAAAABAAAAAA
AAAAAAAAABAEAAAA
AAAAAAAAABAEBAAA
AAAAAAAAABAFBAAA
AAAAAAAAABAIAAAA
AAAAAAAAABAOAAAA
AAAAAAAAABBDBAAA
AAAAAAAAABCFAAAA
AAAAAAAAABCHBAAA
AAAAAAAAABDHAAAA
AAAAAAAAABENAAAA
AAAAAAAAABFEBAAA
AAAAAAAAABFGAAAA
AAAAAAAAABFMAAAA
AAAAAAAAABFPAAAA
AAAAAAAAABGFAAAA
AAAAAAAAABGFBAAA
AAAAAAAAABGJAAAA
AAAAAAAAABIBBAAA
AAAAAAAAABICBAAA
AAAAAAAAABIIAAAA
AAAAAAAAABJNAAAA
AAAAAAAAABKGBAAA
AAAAAAAAABLOAAAA
AAAAAAAAABLPAAAA
AAAAAAAAABMABAAA
AAAAAAAAABMPAAAA
AAAAAAAAABNAAAAA
AAAAAAAAABNCBAAA
AAAAAAAAABNEBAAA
AAAAAAAAABNLAAAA
AAAAAAAAABNOAAAA
AAAAAAAAABNPAAAA
AAAAAAAAABOAAAAA
AAAAAAAAABOFBAAA
AAAAAAAAABOOAAAA
AAAAAAAAABOPAAAA
AAAAAAAAABPEAAAA
AAAAAAAAACADAAAA
AAAAAAAAACAFAAAA
AAAAAAAAACAFAAAA
AAAAAAAAACAHBAAA
AAAAAAAAACAJAAAA
AAAAAAAAACBDAAAA
AAAAAAAAACBDAAAA
AAAAAAAAACBEBAAA
AAAAAAAAACBNAAAA
AAAAAAAAACBPAAAA
AAAAAAAAACCHAAAA

100 rows selected.

10 changes: 10 additions & 0 deletions tpcds/answer_sets/10.ans
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
C C CD_EDUCATION_STATUS        CNT1 CD_PURCHASE_ESTIMATE       CNT2 CD_CREDIT_       CNT3 CD_DEP_COUNT       CNT4 CD_DEP_EMPLOYED_COUNT       CNT5 CD_DEP_COLLEGE_COUNT       CNT6
- - -------------------- ---------- -------------------- ---------- ---------- ---------- ------------ ---------- --------------------- ---------- -------------------- ----------
F D Advanced Degree               1                 3000          1 High Risk           1            2          1                     4          1                    5          1
F D Unknown                       1                 1500          1 Good                1            6          1                     5          1                    4          1
M D College                       1                 8500          1 Low Risk            1            3          1                     0          1                    1          1
M D Primary                       1                 7000          1 Unknown             1            2          1                     1          1                    1          1
M W Unknown                       1                 4500          1 Good                1            5          1                     0          1                    1          1



90 changes: 90 additions & 0 deletions tpcds/answer_sets/11.ans
@@ -0,0 +1,90 @@
CUSTOMER_ID CUSTOMER_FIRST_NAME CUSTOMER_LAST_NAME C
---------------- -------------------- ------------------------------ -
AAAAAAAAAFGBBAAA Howard Major Y
AAAAAAAAAMGDAAAA Kenneth Harlan Y
AAAAAAAAAOPFBAAA Jerry Fields Y
AAAAAAAABLEIBAAA Paula Wakefield Y
AAAAAAAABNBBAAAA Irma Smith Y
AAAAAAAACADPAAAA Cristobal Thomas Y
AAAAAAAACFENAAAA Christopher Dawson
AAAAAAAACIJMAAAA Elizabeth Thomas Y
AAAAAAAACJDIAAAA James Kerr N
AAAAAAAACNAGBAAA Virginia May N
AAAAAAAADBEFBAAA Bennie Bowers N
AAAAAAAADCKOAAAA Robert Gonzalez N
AAAAAAAADFKABAAA Latoya Craft N
AAAAAAAADIIOAAAA David Carroll Y
AAAAAAAADIJGBAAA Ruth Sanders N
AAAAAAAADLHBBAAA Henry Bertrand N
AAAAAAAAEADJAAAA Ruth Carroll N
AAAAAAAAEJDLAAAA Alice Wright N
AAAAAAAAEKFPAAAA Annika Chin N
AAAAAAAAEKJLAAAA Aisha Carlson Y
AAAAAAAAEPOGAAAA Felisha Mendes Y
AAAAAAAAFACEAAAA Priscilla Miller N
AAAAAAAAFBAHAAAA Michael Williams N
AAAAAAAAFGIGAAAA Eduardo Miller Y
AAAAAAAAFGPGAAAA Albert Wadsworth Y
AAAAAAAAFMHIAAAA Emilio Darling Y
AAAAAAAAFOGIAAAA Michelle Greene N
AAAAAAAAFOJAAAAA Don Castillo Y
AAAAAAAAGEHIAAAA Tyler Miller N
AAAAAAAAGHPBBAAA Nick Mendez Y
AAAAAAAAGNDAAAAA Terry Mcdowell N
AAAAAAAAHGOABAAA Sonia White N
AAAAAAAAHHCABAAA William Stewart Y
AAAAAAAAHJLAAAAA Audrey Beltran Y
AAAAAAAAHMJNAAAA Ryan Baptiste Y
AAAAAAAAHMOIAAAA Grace Henderson N
AAAAAAAAIADEBAAA Diane Aldridge N
AAAAAAAAIBAEBAAA Sandra Wilson N
AAAAAAAAIBFCBAAA Ruth Grantham N
AAAAAAAAIBHHAAAA Jennifer Ballard Y
AAAAAAAAICHFAAAA Linda Mccoy N
AAAAAAAAIDKFAAAA Michael Mack N
AAAAAAAAIJEMAAAA Charlie Cummings Y
AAAAAAAAIMHBAAAA Kathy Knowles N
AAAAAAAAIMHHBAAA Lillian Davidson Y
AAAAAAAAJDBLAAAA Melvin Taylor Y
AAAAAAAAJEKFBAAA Norma Burkholder N
AAAAAAAAJGMMAAAA Richard Larson Y
AAAAAAAAJIALAAAA Santos Gutierrez N
AAAAAAAAJKBNAAAA Julie Kern N
AAAAAAAAJMHLAAAA Wanda Ryan Y
AAAAAAAAJONHBAAA Warren Orozco N
AAAAAAAAJPINAAAA Rose Waite Y
AAAAAAAAKAECAAAA Milton Mackey N
AAAAAAAAKAPPAAAA Karen Parker Y
AAAAAAAAKJBKAAAA Georgia Scott N
AAAAAAAAKJBLAAAA Kerry Davis Y
AAAAAAAAKKGEAAAA Katie Dunbar N
AAAAAAAAKLHHBAAA Manuel Castaneda N
AAAAAAAAKNAKAAAA Gladys Banks N
AAAAAAAALFKKAAAA Ignacio Miller Y
AAAAAAAALHMCAAAA Brooke Nelson Y
AAAAAAAALIOPAAAA Derek Allen Y
AAAAAAAALJNCBAAA George Gamez Y
AAAAAAAAMDCAAAAA Louann Hamel Y
AAAAAAAAMFFLAAAA Margret Gray Y
AAAAAAAAMMOBBAAA Margaret Smith N
AAAAAAAANFBDBAAA Vernice Fernandez Y
AAAAAAAANGDBBAAA Carlos Jewell N
AAAAAAAANIPLAAAA Eric Lawrence Y
AAAAAAAANJAGAAAA Allen Hood Y
AAAAAAAANJHCBAAA Christopher Schreiber N
AAAAAAAANJOLAAAA Debra Underwood Y
AAAAAAAAOBADBAAA Elizabeth Burnham N
AAAAAAAAOCAJAAAA Jenna Staton N
AAAAAAAAOCLBBAAA
AAAAAAAAODMMAAAA Gayla Cline N
AAAAAAAAOFLCAAAA James Taylor N
AAAAAAAAOPDLAAAA Ann Pence N
AAAAAAAAPDFBAAAA Terrance Banks Y
AAAAAAAAPEHEBAAA Edith Molina Y
AAAAAAAAPFCLAAAA Felicia Neville N
AAAAAAAAPICEAAAA Jennifer Cortez Y
AAAAAAAAPJENAAAA Ashley Norton Y
AAAAAAAAPKBCBAAA Andrea White N
AAAAAAAAPKIKAAAA Wendy Horvath Y
AAAAAAAAPMMBBAAA Paul Jordan N
AAAAAAAAPPIBBAAA Candice Lee Y