willweld/dbt-project-maturity

Building a Mature dbt Project from Scratch

Hello! This is the companion repo to the 2021 Coalesce talk, Building a Mature dbt Project from Scratch.

Introduction

With the explosion in popularity of dbt, and the coinciding explosion in the tool's features and capabilities, it's natural for many of us to find ourselves unsure of where to start. Many people come across dbt through a recommendation of a particularly powerful feature that dbt can support, like complex macros or intricate incremental model logic, but it's both intimidating and unwise to dive directly into the deep end. As with any tool, it's best to walk before you run, and to learn how these features complement and build on each other, so you can be confident you've developed a strong, sustainable, and scalable dbt project.

Purpose of this Repo

The goal of this repository is to show a single dbt project at different lifecycle stages, offering an opinionated view of when to introduce certain dbt features into your project. Each stage has a particular theme/purpose, and the listed feature sets connect to that learning goal. This is intended to be both a jumping-off point for new dbt users starting a project from scratch, and a rubric that existing dbt users can measure their own use of dbt features against to find opportunities for growth.

In each stage listed below (and in the accompanying talk), you'll see:

  1. A theme/purpose for the life stage
  2. Features relevant to the stage (with links to the relevant dbt docs)
  3. A picture of the DAG of the example project in that stage
  4. Links to channels on the dbt Community Slack that would be of interest!

Some caveats and assumptions:

  • There are real life use cases where some features get introduced into projects out of the order described here, and that is perfectly reasonable. There are often very justifiable reasons to introduce more advanced dbt features earlier in the development cycle.
  • There is no sense of timescale in this presentation! Some teams may mature their project in weeks rather than months, depending on a wide range of factors. It's more important to think about how features build upon themselves (and each other) rather than how quickly they do so.
  • This presentation assumes familiarity and comfort with git and version control, and that all of the projects are already managed in a repository.

Projects

Each project is built on a mock data set of patients, doctors, claims, and other billing data. It was generated via the Mockaroo API. Huge hat-tip to @krevitt for building a sweet G-sheet x Mockaroo integration! In the 0-raw-data project, you can find the sample dataset the projects were built from, so you can load it into your warehouse and run each project to get a feel for how the functionality works!

Infancy

Congratulations! It's (sorta!) a DAG!!

This project represents the bare minimum needed to have dbt do anything of use. It's only technically a dbt project, and it's going to need a lot of hand-holding to do anything useful and to stay alive.

Theme: 🍼 Bare Necessities 🧷

Features

Relevant Commands

  • dbt seed
  • dbt run

DAG

[Image: DAG of the Infancy project]

Relevant Community Slack Channels

  • #advice-dbt-for-beginners

Toddlerhood

This project is just starting to play with its blocks, and see how the world fits together. It can now handle multiple models, and it's able to see the difference between raw and transformed data.

Theme: 🟩 Building Blocks 🟦

Features

  • Models
    • adds {{ ref() }} functionality! Modularize your model!
  • Sources
    • uses {{ source() }} functionality, which builds a layer of abstraction between source data and your transformations
  • dbt Macros
    • Start to understand some of the key built-in macros that make dbt work.
  • Docs
    • single model documentation for critical models
  • Tests
    • last-mile testing for final reporting objects
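
To make the Toddlerhood features a bit more concrete, here is a minimal sketch of a properties file that declares a source, documents one model, and adds last-mile tests to it. The source name billing, the schema raw, the model name fct_claims, and the column claim_id are hypothetical and are not taken from this repo; only the table names follow the mock dataset of patients and claims.

```yaml
# models/schema.yml (hypothetical file name and location)
version: 2

sources:
  - name: billing          # hypothetical source name
    schema: raw            # hypothetical schema holding the loaded raw tables
    tables:
      - name: patients
      - name: claims

models:
  - name: fct_claims       # hypothetical final reporting model
    description: One row per claim, enriched with patient details.
    columns:
      - name: claim_id     # hypothetical primary key column
        description: Unique identifier for a claim.
        tests:             # newer dbt versions also accept the key data_tests
          - unique
          - not_null
```

With a file like this in place, a model can select from {{ source('billing', 'claims') }} instead of a hard-coded table name, and models can reference each other with {{ ref('model_name') }}. dbt test would run the two tests, and dbt docs generate would pick up the descriptions.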

Relevant Commands

  • dbt seed
  • dbt run
  • dbt test
  • dbt docs generate
  • dbt docs serve

DAG

[Image: DAG of the Toddlerhood project]

Relevant Community Slack Channels

  • #advice-dbt-for-beginners
  • #advice-data-testing

Childhood

Now we're starting to let our project out into the world. Time to set some ground rules! You wouldn't send your project to school without a list of allergies, so it's time to let people know how they should be interacting with your project.

Theme: πŸ—οΈ Structure and Rules πŸ“

Features

Relevant Commands

  • dbt compile
  • dbt seed
  • dbt run
  • dbt test
  • dbt build
  • dbt docs generate
  • dbt docs serve

Relevant Community Slack Channels

  • #advice-dbt-for-beginners
  • #advice-data-testing
  • #advice-data-modeling

DAG

[Image: DAG of the Childhood project]

Adolescence

Look at your beautiful project, all grown up, about to go to prom. At this stage, your project is learning things fast, and is looking for ways to work smarter, not harder (so it can spend more time at 7-Eleven with its friends).

Theme: πŸ‹οΈ Growth and Optimization πŸš€

Features

Relevant Commands

  • dbt deps
  • dbt compile
  • dbt seed
  • dbt run
  • dbt test
  • dbt build
  • dbt docs generate
  • dbt docs serve
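
dbt deps shows up in the command list for the first time at this stage; it installs the packages declared in a packages.yml file. The sketch below assumes the project pulls in dbt_utils, which is a common choice but is not confirmed by this README, and the version range is purely illustrative.

```yaml
# packages.yml, placed next to dbt_project.yml
packages:
  - package: dbt-labs/dbt_utils
    # Illustrative range only; pick a release compatible with your dbt version.
    version: [">=0.7.0", "<0.8.0"]
```

Running dbt deps after adding or changing this file downloads the package into the project, so its macros and tests can be used from your own models.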

DAG

[Image: DAG of the Adolescence project]

Relevant Community Slack Channels

  • #advice-dbt-for-beginners
  • #advice-data-testing
  • #advice-data-modeling
  • #advice-dbt-for-power-users
  • Relevant tool-specific channels (e.g. #tools-looker, #tools-meltano)

Adulthood

By the time your project reaches adulthood, the basics of dbt should be humming along just fine, and that should buy it time to think back on its life, look inward, and figure out how it fits into the world. How has your project grown and changed? How does it relate to the world around it?

Theme: 📓 Self Reflection 🔬

Features

Relevant Commands

  • dbt deps
  • dbt compile
  • dbt source freshness
  • dbt seed
  • dbt run
  • dbt test
  • dbt build
  • dbt run-operation
  • dbt docs generate
  • dbt docs serve
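
dbt source freshness is new to the command list at this stage. It only does something useful once a source declares freshness rules, so here is a minimal sketch of what that declaration could look like. The source name billing, the schema raw, the column _loaded_at, and the thresholds are all hypothetical and not taken from this repo.

```yaml
# models/sources.yml (hypothetical file name and location)
version: 2

sources:
  - name: billing                  # hypothetical source name
    schema: raw                    # hypothetical schema holding the raw tables
    loaded_at_field: _loaded_at    # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}    # warn if the newest row is older than 12 hours
      error_after: {count: 24, period: hour}   # fail if the newest row is older than 24 hours
    tables:
      - name: claims
```

With rules like these, dbt source freshness compares the most recent value of the timestamp column against the thresholds and reports a warning or an error for stale tables.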

DAG

[Image: DAG of the Adulthood project]

Relevant Community Slack Channels

  • #advice-dbt-for-beginners
  • #advice-data-testing
  • #advice-data-modeling
  • #advice-dbt-for-power-users
  • Relevant tool-specific channels (e.g. #tools-looker, #tools-meltano, #db-snowflake)
  • #towards-analytics-engineering
  • #metadata

These things are advanced level (middle aged?)!

Omitted Features

Some features are not included in this project, not because they are unimportant, but because they generally are only used as-needed when the specifics of your data/project call for it.
