see https://app.clickup.com/t/863fw5kbr
Because this GitHub repo is linked to my dbt project, all the folders used by the dbt project are in the root of the repo. That is why I copied the contents of the folders under week_4_analytics_engineering to the root, and why the root holds many folders when only the week_* folders should be there.
Check the "Prerequisites" below to see what needs to be in place in BigQuery (BQ) before starting the steps for this week.
Goal: Transform the data loaded in the DWH into analytical views by developing a dbt project. Slides
We will build a project using dbt and a running data warehouse. By this stage of the course you should have already:
- A running warehouse (BigQuery or postgres)
- A set of running pipelines ingesting the project dataset (week 3 completed): Taxi Rides NY dataset
  - Yellow taxi data - Years 2019 and 2020
  - Green taxi data - Years 2019 and 2020
  - fhv data - Year 2019
#NOTE: to get this data loaded, see week_4_analytics_engineering\prefect_flows\README.md
Note:
- A quick hack has been shared to load that data more quickly; check the instructions in week3/extras
- If you receive an error stating "Permission denied while globbing file pattern." when attempting to run fact_trips.sql, this video may help resolve the issue
🎥 Video
You will need to create a dbt cloud account using this link and connect to your warehouse following these instructions. More detailed instructions are in dbt_cloud_setup.md
Optional: If you feel more comfortable developing locally, you can use a local installation of dbt instead. Follow the official dbt documentation or the dbt with BigQuery on Docker guide to set up dbt locally on Docker. You will need to install the latest version (1.0) with the BigQuery adapter (dbt-bigquery).
As an alternative to dbt cloud, which requires a cloud database, you can run the project by installing dbt locally.
You can follow the official dbt documentation or use a docker image from the official dbt repo. You will need to install the latest version (1.0) with the postgres adapter (dbt-postgres).
After installing locally, you will have to set up the connection to Postgres (PG) in profiles.yml; you can find the templates here
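As an illustration of the connection setup mentioned above, a minimal profiles.yml for the postgres adapter could look like the sketch below. Every value in it (profile name, host, credentials, database, schema) is a placeholder assumption and is not copied from the course templates. The top-level profile name has to match the `profile:` entry in your dbt_project.yml.

```yaml
# ~/.dbt/profiles.yml (default location dbt reads profiles from)
# All names and values below are hypothetical placeholders.
taxi_rides_ny:          # hypothetical profile name; must match `profile:` in dbt_project.yml
  target: dev           # which of the outputs below is used by default
  outputs:
    dev:
      type: postgres    # provided by the dbt-postgres adapter
      host: localhost
      port: 5432
      user: my_user
      password: my_password
      dbname: my_database
      schema: dbt_dev   # schema where dbt builds your models
      threads: 4
```

Running `dbt debug` from the project folder is a quick way to check that dbt finds this profile and can connect to the database.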
- What is analytics engineering?
- ETL vs ELT
- Data modeling concepts (fact and dim tables)
🎥 Video
- Intro to dbt
🎥 Video
- Starting a new project with dbt init (dbt cloud and core)
- dbt cloud setup
- dbt_project.yml
🎥 Video
- Starting a new project with dbt init (dbt cloud and core)
- dbt core local setup
- profiles.yml
- dbt_project.yml
🎥 Video
- Anatomy of a dbt model: written code vs compiled Sources
- Materialisations: table, view, incremental, ephemeral
- Seeds, sources and ref
- Jinja and Macros
- Packages
- Variables
🎥 Video
Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice
- Tests
- Documentation
🎥 Video
Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice
- Deployment: development environment vs production
- dbt cloud: scheduler, sources and hosted documentation
🎥 Video
- Deployment: development environment vs production
- dbt cloud: scheduler, sources and hosted documentation
🎥 Video
- Google Data Studio
- Metabase (local installation)
More information here
Did you take notes? You can share them here.
- Notes by Alvaro Navas
- Sandy's DE learning blog
- Add your notes here (above this line)