echo "Deploy Clickhouse"
read -s pswrd
yc managed-clickhouse cluster create \
--name otus-clickhouse- \
--shard-name shard1 \
--environment production \
--network-name my-yc-network \
--host type=clickhouse,zone-id=ru-central1-b,subnet-name=my-yc-subnet-b \
--clickhouse-resource-preset s2.small \
--clickhouse-disk-size 20 \
--clickhouse-disk-type network-ssd \
--database name=db1 \
--datalens-access \
--version 22.5 \
--enable-sql-user-management \
--service-account <account_id> \
--admin-password "$pswrd"
dbt init clickhouse_starschema
[packages]
dbt-core = "==1.0.4"
dbt-clickhouse = "==1.0.4"
[requires]
python_version = "3.9"
dbt_project.yml
require-dbt-version: ">=1.0.0"
packages.yml
packages:
  - package: dbt-labs/codegen
    version: 0.6.0
  - package: dbt-labs/dbt_utils
    version: 0.8.5
profiles.yml
clickhouse_starschema:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: db1
      host: c-<ID>.rw.mdb.yandexcloud.net
      port: 9440
      user: <user>
      password: <password>
      secure: True
The data set was generated with scale factor 2 (dbgen -s 2).
ssh yc-user@<host>
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
sudo apt install gcc
sudo make
./dbgen -s 2 -T a
ls -lh | grep '\.tbl$'
sudo apt install awscli
aws configure
aws --endpoint-url=https://storage.yandexcloud.net s3 ls
aws --endpoint-url=https://storage.yandexcloud.net s3 sync . s3://ssb-dbgen/ --exclude "*" --include "*.tbl" --acl public-read
aws --endpoint-url=https://storage.yandexcloud.net s3 ls
aws --endpoint-url=https://storage.yandexcloud.net s3 ls ssb-dbgen
I used a project variable to define the custom schema referenced in the macro:
dbt_project.yml
vars:
  schema: 'db1'
I used
dbt run-operation generate_model_yaml --args '{"model_name": "customers"}'
to generate the .yml files, and prepared an additional model using the dbt-utils pivot macro.
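For orientation, the output of generate_model_yaml has roughly the shape below. This is only a sketch: the column names are assumptions taken from the standard SSB customer table, not the actual columns of the customers model in this project, and the descriptions are left empty to be filled in by hand.

```yaml
# Sketch of codegen output for the "customers" model.
# Column names are assumed (SSB customer table), not taken from this project.
version: 2

models:
  - name: customers
    description: ""
    columns:
      - name: c_custkey
        description: ""
      - name: c_name
        description: ""
      - name: c_city
        description: ""
```

The generated text is printed to the console, so it has to be copied into a .yml file next to the model.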
dbt compile
dbt debug
I also wrote an additional macro, based on the dbt-utils surrogate_key macro, to generate surrogate and hash keys.
dbt test
dbt docs generate
dbt docs serve
Added a dbt_utils.unique_combination_of_columns test for the (LO_ORDERKEY, LO_LINENUMBER) combination.
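A minimal sketch of how such a test can be declared in a model's .yml file. The model name lineorder is an assumption (it matches the SSB table that contains these columns); substitute the name of the actual model.

```yaml
# Sketch only: "lineorder" is an assumed model name.
version: 2

models:
  - name: lineorder
    tests:
      # Fails if any (LO_ORDERKEY, LO_LINENUMBER) pair occurs more than once.
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - LO_ORDERKEY
            - LO_LINENUMBER
```

The test is declared at the model level rather than on a single column because uniqueness holds only for the pair, not for either column alone. It runs together with the other tests on dbt test.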