Building a Mature dbt Project from Scratch

Hello! This is the companion repo to the 2021 Coalesce talk, Building a Mature dbt Project from Scratch.

Introduction

With the explosion in popularity of dbt, and the matching explosion in the tool's features and capabilities, it's natural for many of us to be unsure of where to start. Many people come across dbt through a recommendation of a particularly powerful feature, like complex macros or intricate incremental model logic, but it's both intimidating and unwise to dive directly into the deep end. As with any tool, it's best to walk before you run, and to learn how these features complement and build on each other, so you can be confident you've developed a strong, sustainable, and scalable dbt project.

Purpose of this Repo

The goal of this repository is to show a single dbt project at different lifecycle stages, offering an opinionated view of when to introduce certain dbt features into your project. Each stage has a particular theme/purpose, and the listed feature sets connect to that learning goal. This is intended to be both a jumping-off point for new dbt users starting a project from scratch, and a rubric that existing dbt users can compare their own use of dbt features against to find opportunities for growth.

In each stage listed below (and in the accompanying talk), you'll see:

A theme/purpose for the life stage
Features relevant to the stage (with links to the relevant dbt docs)
A picture of the DAG of the example project in that stage
Links to channels on the dbt Community Slack that may be of interest!

Some caveats and assumptions:

There are real life use cases where some features get introduced into projects out of the order described here, and that is perfectly reasonable. There are often very justifiable reasons to introduce more advanced dbt features earlier in the development cycle.
There is no sense of timescale in this presentation! Some teams may mature their project in weeks rather than months, depending on a wide range of factors. It's more important to think about how features build upon themselves (and each other) rather than how quickly they do so.
This presentation assumes familiarity and comfort with git and version control, and that all of the projects are already managed in a repository.

Projects

Each project is built on a mock data set of patients, doctors, claims, and other billing data, generated via the Mockaroo API. Huge hat-tip to @krevitt for building a sweet G-sheet x Mockaroo integration! In the 0-raw-data project, you can find the sample datasets these projects were built from, so you can load them into your warehouse and run each project to get a feel for how the functionality works!

Infancy

Congratulations! It's (sorta!) a DAG!!

This project is truly the bare minimum needed to have dbt do anything of use. It's technically a dbt project, but it's going to need a lot of hand-holding to do anything useful and to stay alive.

Theme: 🍼 Bare Necessities 🧷

Features

Relevant Commands

dbt seed
dbt run

DAG

Relevant Community Slack Channels

#advice-dbt-for-beginners

Toddlerhood

This project is just starting to play with its blocks, and see how the world fits together. It can now handle multiple models, and it's able to see the difference between raw and transformed data.

Theme: 🟩 Building Blocks 🟦

Features

Models
- adds {{ ref() }} functionality! Modularize your model!
Sources
- uses {{ source() }} functionality to build a layer of abstraction between source data and your transformations
dbt Macros
- Start to understand some of the key built-in macros that make dbt work.
Docs
- single model documentation for critical models
Tests
- last-mile testing for final reporting objects
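
To make the link between {{ ref() }} and the DAG a little more concrete, here is a minimal Python sketch. It is only an illustration, not how dbt itself is implemented: the model names and SQL are hypothetical, and a simple regular expression stands in for dbt's Jinja parsing. It collects the ref() calls in each model and derives a valid build order from them.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models, keyed by model name. In a real project each entry
# would be a .sql file in the project's models directory.
MODELS = {
    "stg_patients": "select * from {{ source('billing', 'patients') }}",
    "stg_claims": "select * from {{ source('billing', 'claims') }}",
    "patient_claims": """
        select p.patient_id, count(c.claim_id) as claim_count
        from {{ ref('stg_patients') }} as p
        left join {{ ref('stg_claims') }} as c
            on c.patient_id = p.patient_id
        group by 1
    """,
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of all models referenced via ref() in one model."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that every model comes after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_order(MODELS))
```

In this toy project the two staging models can build in either order, but patient_claims always comes last because it refs both of them. That dependency information is what lets a model be split into modular pieces without hand-maintaining a run order.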

Relevant Commands

dbt seed
dbt run
dbt test
dbt docs generate
dbt docs serve

DAG

Relevant Community Slack Channels

#advice-dbt-for-beginners
#advice-data-testing

Childhood

Now we're starting to set our project free into the world. Time to set some ground rules! You wouldn't send your project to school without a list of allergies, so it's time to let people know how they should be interacting with your project.

Theme: 🏗️ Structure and Rules 📏

Features

Project Standards and Documentation
- not technically a dbt feature per se, but critical to scaling!
- README
- Style Guide
- Contribution Guide
- PR Template
Testing
- Standard minimum testing requirements
Docs
- Model-level descriptions for all models
- Deployed and shared widely
Materializations
- table
Deployment (after all of the above!)

Relevant Commands

dbt compile
dbt seed
dbt run
dbt test
dbt build
dbt docs generate
dbt docs serve

Relevant Community Slack Channels

#advice-dbt-for-beginners
#advice-data-testing
#advice-data-modeling

DAG

Adolescence

Look at your beautiful project, all grown up, about to go to prom. At this stage, your project is learning things fast, and is looking for ways to work smarter, not harder (so it can spend more time at 7-Eleven with its friends).

Theme: 🏋️ Growth and Optimization 🚀

Features

Sources
- Freshness
Packages
Materializations
- Incremental
- Ephemeral
Documentation
- column-level docs for key metrics/critical columns
Macros
- In-model SQL simplification
Custom Deployments (specific jobs)
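
As a rough illustration of what the source freshness item above evaluates, here is a minimal Python sketch. It mirrors the idea behind the warn_after and error_after thresholds in a source's freshness configuration: compare the age of the most recently loaded row against each threshold. The function name, the sample thresholds, and the timestamps are all hypothetical; in a real project dbt runs this comparison for you against the warehouse.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_status(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: Optional[timedelta] = None,
    error_after: Optional[timedelta] = None,
) -> str:
    """Classify a source as 'pass', 'warn', or 'error' by its data age."""
    age = now - max_loaded_at
    # Check the stricter outcome first so 'error' takes priority over 'warn'.
    if error_after is not None and age > error_after:
        return "error"
    if warn_after is not None and age > warn_after:
        return "warn"
    return "pass"


if __name__ == "__main__":
    # Hypothetical thresholds: warn after 12 hours, error after 24 hours.
    now = datetime(2021, 12, 6, 12, 0, tzinfo=timezone.utc)
    thresholds = {
        "warn_after": timedelta(hours=12),
        "error_after": timedelta(hours=24),
    }
    for hours_old in (1, 18, 30):
        loaded = now - timedelta(hours=hours_old)
        print(hours_old, freshness_status(loaded, now, **thresholds))
```

Leaving either threshold as None simply skips that level, so a source can warn without ever erroring, or the other way around.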

Relevant Commands

dbt deps
dbt compile
dbt seed
dbt run
dbt test
dbt build
dbt docs generate
dbt docs serve

DAG

Relevant Community Slack Channels

#advice-dbt-for-beginners
#advice-data-testing
#advice-data-modeling
#advice-dbt-for-power-users
Relevant tool-specific channels (e.g. #tools-looker, #tools-meltano)

Adulthood

By the time your project reaches adulthood, the basics of dbt should be humming along just fine, and that should buy it time to think back on its life, look inward, and figure out how it fits into the world. How has your project grown and changed? How does it relate to the world around it?

Theme: 📓 Self Reflection 🔬

Features

Macros
- Operations for object management
Selectors/Tags
Custom Schema/Database Behavior
Custom Generic Tests
Hooks & Operations
Exposures
- For dbt Cloud users: unlocks status tiles

Relevant Commands

dbt deps
dbt compile
dbt source freshness
dbt seed
dbt run
dbt test
dbt build
dbt run-operation
dbt docs generate
dbt docs serve

DAG

Relevant Community Slack Channels

#advice-dbt-for-beginners
#advice-data-testing
#advice-data-modeling
#advice-dbt-for-power-users
Relevant tool-specific channels (e.g. #tools-looker, #tools-meltano, #db-snowflake)
#towards-analytics-engineering
#metadata

These things are advanced level (middle aged?)!

Introspective Analyses on dbt-produced artifacts
- if Cloud: Metadata API
- if Core: dbt-artifacts package
- Project Health Metrics
  - Test Coverage
  - Model Runtimes
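
As a starting point for this kind of analysis, the following Python sketch computes the two health metrics listed above from dictionaries shaped like dbt's manifest.json and run_results.json artifacts. The sample data and function names are hypothetical, and the field names used (nodes, resource_type, depends_on, results, unique_id, execution_time) are assumptions about the artifact layout that you should check against the artifact schema for your dbt version.

```python
# Hypothetical, heavily trimmed stand-ins for manifest.json and
# run_results.json. In practice you would json.load() the real files.
MANIFEST = {
    "nodes": {
        "model.demo.stg_patients": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "model.demo.stg_claims": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "model.demo.patient_claims": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.demo.stg_patients", "model.demo.stg_claims"]
            },
        },
        "test.demo.unique_stg_patients_patient_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.demo.stg_patients"]},
        },
        "test.demo.not_null_patient_claims_patient_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.demo.patient_claims"]},
        },
    }
}

RUN_RESULTS = {
    "results": [
        {"unique_id": "model.demo.stg_patients", "execution_time": 1.2},
        {"unique_id": "model.demo.stg_claims", "execution_time": 3.4},
        {"unique_id": "model.demo.patient_claims", "execution_time": 8.9},
    ]
}


def model_test_coverage(manifest: dict) -> float:
    """Fraction of models that have at least one test depending on them."""
    nodes = manifest["nodes"]
    models = {uid for uid, n in nodes.items() if n["resource_type"] == "model"}
    tested = {
        parent
        for n in nodes.values()
        if n["resource_type"] == "test"
        for parent in n["depends_on"]["nodes"]
        if parent in models
    }
    return len(tested) / len(models) if models else 0.0


def slowest_models(run_results: dict, top_n: int = 3) -> list[tuple[str, float]]:
    """Models sorted by execution time in seconds, slowest first."""
    timings = [
        (r["unique_id"], r["execution_time"])
        for r in run_results["results"]
        if r["unique_id"].startswith("model.")
    ]
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(f"test coverage: {model_test_coverage(MANIFEST):.0%}")
    for unique_id, seconds in slowest_models(RUN_RESULTS):
        print(f"{unique_id}: {seconds:.1f}s")
```

Tracking these two numbers over time, rather than as one-off snapshots, is what turns them into project health metrics: a falling coverage ratio or a steadily growing runtime is the signal to act on.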

Omitted Features

Some features are not included in this project, not because they are unimportant, but because they generally are only used as-needed when the specifics of your data/project call for it.

Snapshots
Seeds (although the raw data project has a good example!)
Variables/Environment Variables
Analyses
