Welcome to your new dbt project

Steps to recreate

Note: Notice the variable settings in the .env file.

Start the PostGres database and PostGresAdmin (webinterface to the database - see images further below):

docker-compose up

In another shell we start the python dev environment container (Python 3.9 on Debian 9 with pandas and dbt-postgres installed):

docker run -it --network=dv_tutorial_db-server-network --mount src="$(pwd)",target=/app,type=bind dv_tutorial_dbt bash

Now a command line in the dev container will open. In this container we run:

dbt init

You will be asked to provide some information:

name: dv_tutorial
choose 1 -> postgres
as shown below (or save the content to /root/.dbt/profiles.yml in the container):

dv_tutorial:
  outputs:
    dev:
      dbname: dv_test
      host: sqlNode1
      pass: abcd
      port: 5432
      schema: dbo
      threads: 1
      type: postgres
      user: dev
  target: dev

Database connection

Let us now check the database connection and setup:

dbt debug

This should show a working setup and database connection.

Getting some test data from Kaggle

Now copy the data files:

Products.csv
SalesOrderItems.csv

from Kaggle (Link) and save them into the seed directory.

Load the raw data into the database

Next we load the raw data in the database run:

dbt seed

This takes a moment and will insert the data into the database.

Note: Use raw stage sqls to generate a combined table from multiple sources. This allows to pull the data in one table and then use the staging sql statements to generate a staging table.

In the image below we can see that the raw tables are loaded into the database.

]

dbt allows you to run separate stage scripts so you can trace step by step what you are doing. If you want to run everything at once: dbt run.

Dependencies: automate-dv

Now you need to install the dependencies (packages.yml):

packages:
  - package: Datavault-UK/automate_dv
    version: 0.10.1

Creating the staging tables

# dbt run -s name_of_component
dbt run -s stg_product  --full-refresh # if you put a + in front of stg_product it also runs raw sql if you have that
dbt run -s stg_salesorderitems  --full-refresh

Loading Raw Vault

Creating hubs

dbt run -s hub_product  --full-refresh 
dbt run -s hub_salesorderitems  --full-refresh

Creating links

dbt run -s lnk_product_salesorderitem  --full-refresh

Creating satellites

dbt run -s sat_product_details  --full-refresh 
dbt run -s sat_salesorder_details  --full-refresh

Aggregating

As a test I tried to aggregate some simple data using a python model (models/aggregate/mean_weight_product_cat.py). In order for this to work I needed a third-party library dbt-fal[postgres].

NOTE: Install first dbt-fal[postgres] and then dbt-postgres, otherwise you have some version issues.

dbt auto doc generation

dbt can autogenerate docs.

dbt docs generate
dbt docs serve

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
docs/images		docs/images
dv_tutorial		dv_tutorial
.env		.env
.gitignore		.gitignore
PythonDbtDev.Dockerfile		PythonDbtDev.Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
init.sql		init.sql
profiles.yml		profiles.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Welcome to your new dbt project

Steps to recreate

Database connection

Getting some test data from Kaggle

Load the raw data into the database

Dependencies: automate-dv

Creating the staging tables

Loading Raw Vault

Creating hubs

Creating links

Creating satellites

Aggregating

dbt auto doc generation

Data lineage view of the docs

Implementation details

About

Uh oh!

Releases

Packages

Languages

tomerten/dv_tutorial_dbt_postgress

Folders and files

Latest commit

History

Repository files navigation

Welcome to your new dbt project

Steps to recreate

Database connection

Getting some test data from Kaggle

Load the raw data into the database

Dependencies: automate-dv

Creating the staging tables

Loading Raw Vault

Creating hubs

Creating links

Creating satellites

Aggregating

dbt auto doc generation

Data lineage view of the docs

Implementation details

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages