Benchmark tool to test StarRocks using several benchmarks.
- python3
- python libraries: pymysql
  Use the command `pip3 install pymysql` to install it. If the machine does not have pip3, use the command `yum install python-pip` to install it.
- mysqlslap: this benchmark tool uses mysqlslap to test StarRocks's performance.
  Use the command `yum install mysql` to install mysqlslap.
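The requirements above can be checked up front. The following is a minimal sketch, not part of the tool itself: the function name `check_requirements` is hypothetical, and it only reports whether `pymysql` is importable and whether `mysqlslap` is on the `PATH`, without installing anything.

```python
import importlib.util
import shutil


def check_requirements(modules=("pymysql",), binaries=("mysqlslap",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # find_spec returns None when a top-level module cannot be found.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing python library: {name} (try: pip3 install {name})")
    # shutil.which returns None when the executable is not on PATH.
    for name in binaries:
        if shutil.which(name) is None:
            problems.append(f"missing binary: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the script prints one line per missing requirement and nothing when everything is available.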
- bin: directory for scripts
- conf: directory for conf files
- result: directory to store query results
- sql: directory for all SQL files, with sub-directories for the different benchmarks
  - tpch: tpch benchmark SQL files, including create, load and query
  - ssb: ssb benchmark SQL files, including create, load and query
- src: directory for the tool's code
- thirdparty: directory to store third-party modules, such as dbgen for tpch, ssb
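As a quick sanity check of a checkout, the layout above can be verified with a few lines of Python. This is only a sketch: `missing_dirs` is a hypothetical helper, and the directory names are taken from the list above.

```python
from pathlib import Path

# Top-level directories named in the layout above.
EXPECTED_DIRS = ("bin", "conf", "result", "sql", "src", "thirdparty")


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(missing_dirs("."))
```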
All the scripts under the bin directory:
- gen_data: tools to generate data for tpch, ssb, ...
  - gen-tpch.sh: script to generate tpch data
  - gen-ssb.sh: script to generate ssb data
- create_db_table.sh: script to create tables
- stream_load.sh: script to load data into StarRocks using stream load
- broker_load.sh: script to load data into StarRocks using broker load (not finished yet)
- flat_insert.sh: script to load data into StarRocks using insert into (not finished yet)
- benchmark.sh: script to test the performance or check the result correctness
- Make sure the Requirements are satisfied.
- Compile the dbgen tool that you want under the thirdparty directory.
  - tpch's dbgen binary is directly provided; we will add a Makefile later.
- Make sure a StarRocks cluster is ready, and that you know the configuration that will be used in the conf/starrocks.conf file.
- Choose the benchmark you want and follow the specified steps below.
not finished yet
- Configure the StarRocks cluster info in the file conf/starrocks.conf

  You should check and modify the IP, port, and database info if needed.
  You can change other parameters if you know them well.

- Create tables

      # create tables for 100GB data
      ./bin/create_db_table.sh ddl_100

  You can specify another directory name (under the sql/tpch directory) that contains create table SQL files. There are some subtle differences between the same table's SQL files under different directories, such as different bucket sizes and different column orders, which exist for performance reasons only. You can directly use the create table SQL files under ddl_100 for smaller data, such as 1GB.

- Generate data

      # generate 100GB data under the `data_100` directory
      ./bin/gen_data/gen-tpch.sh 100 data_100
      # generate 1TB data under the `data_1T` directory
      ./bin/gen_data/gen-tpch.sh 1000 data_1T

  You can change 100 to 1 to generate 1GB of data quickly for testing, such as: `./bin/gen_data/gen-tpch.sh 1 data_1G`

  You can use either an absolute or a relative directory path to store the generated data, such as: `./bin/gen_data/gen-tpch.sh 1 data/data_1G-2`

  The gen-tpch.sh script just wraps the tpch-dbgen tool for convenience. You can run the command `make` under the thirdparty/tpch-dbgen directory to build the dbgen binary, where the dbgen source version is 3.0.0, downloaded from tpc.org. You can also download the latest version of the tpch-dbgen tool from tpc.org yourself, or find more information on other web pages, such as Data generation tool.

- Load data using stream load

      # load 100GB data into StarRocks
      ./bin/stream_load.sh data_100

  `data_100` is the path of the directory containing the data you generated. You can specify either an absolute path or a relative path.

- Test the performance

      ./bin/benchmark.sh -v -p -d tpch

  See more information with `./bin/benchmark.sh -h`

- Check the result

      ./bin/benchmark.sh -v -c -d tpch

  Currently, you can check the result in the logs. (The expected result hasn't been put in the result directory yet.)

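The tpch steps above can be strung together in a small driver. This is a sketch only: `tpch_steps` and `run_steps` are hypothetical helpers, the commands and arguments are exactly the ones shown in the steps, and it assumes it is run from the repository root with conf/starrocks.conf already configured.

```python
import subprocess


def tpch_steps(scale_gb=100, ddl_dir="ddl_100", data_dir="data_100"):
    """Return the tpch commands from the steps above, in order."""
    return [
        ["./bin/create_db_table.sh", ddl_dir],                    # create tables
        ["./bin/gen_data/gen-tpch.sh", str(scale_gb), data_dir],  # generate data
        ["./bin/stream_load.sh", data_dir],                       # load via stream load
        ["./bin/benchmark.sh", "-v", "-p", "-d", "tpch"],         # test the performance
        ["./bin/benchmark.sh", "-v", "-c", "-d", "tpch"],         # check the result
    ]


def run_steps(steps, dry_run=True):
    """Print each command; run it only when dry_run is False.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, so the sequence stops at the first failure.
    """
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands for a quick 1GB test.
    run_steps(tpch_steps(scale_gb=1, data_dir="data_1G"), dry_run=True)
```

Keeping `dry_run=True` as the default makes it easy to review the command sequence before anything touches the cluster.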
This part is for developers or testers. You can add more benchmarks, including data generation tools, SQL query files, etc.
All SQL files are under the sql directory.
There are several sub-directories for different benchmarks, one directory per benchmark, such as ssb, tpch, tpcds, etc.
Under each benchmark directory (taking the tpch directory as an example), there are several kinds of directories:
- ddl*: There is usually a ***_create.sql file to create all the tables. Different directories are for different data sizes, with some different create table properties. See detailed info in tpch-README.
- query: There may be several sub-directories for different query purposes. Taking the ssb benchmark as an example, there are ssb, ssb-flat, and ssb-low_cardinality sub-directories, where ssb-flat is for queries on the flattened table lineorder_flat, and ssb-low_cardinality is for queries in low-cardinality situations.
- insert: We can insert data into a flattened wide table from other tables; currently this is mainly for the ssb benchmark.
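The layout above lends itself to simple discovery of query files. The sketch below is illustrative only: `find_query_files` is a hypothetical helper, and it assumes query files end in `.sql` and live under `sql/<benchmark>/query/`, possibly in sub-directories such as `ssb-flat`. The tool's own code may organize this differently.

```python
from pathlib import Path


def find_query_files(sql_root, benchmark):
    """Map each query sub-directory name to its sorted list of .sql file names.

    Files directly under query/ are grouped under the key ".".
    """
    query_dir = Path(sql_root) / benchmark / "query"
    groups = {}
    # rglob walks query/ recursively; sorted() keeps the order deterministic.
    for path in sorted(query_dir.rglob("*.sql")):
        rel_parent = path.parent.relative_to(query_dir)
        groups.setdefault(str(rel_parent), []).append(path.name)
    return groups


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for group, files in find_query_files("sql", "ssb").items():
        print(group, files)
```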
Tools to generate data for different benchmarks. A simple copy for each.
Add links here (TODO)