-
Notifications
You must be signed in to change notification settings - Fork 984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit testing Dev Blog post #5461
Merged
Merged
Changes from 2 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
8ca20cb
Unit testing Dev Blog post
dbeatty10 7fbbe14
Merge branch 'current' into dbeatty/unit-testing-dev-blog
dbeatty10 013d1ed
Update the date
dbeatty10 9217104
Merge branch 'dbeatty/unit-testing-dev-blog' of https://github.com/db…
dbeatty10 23cbfbb
Updated image of terminal output
dbeatty10 c8d452b
Fix formatting for code example
dbeatty10 311e2bb
Update image of terminal output
dbeatty10 34fc30d
Update model name in the "hello world" example
dbeatty10 f2e95dd
Update model name in the "hello world" example
dbeatty10 b216b98
Merge branch 'current' into dbeatty/unit-testing-dev-blog
dbeatty10 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,248 @@ | ||
--- | ||
title: "Unit testing in dbt for test-driven development" | ||
description: "In dbt v1.8, we introduce support for unit testing. In this blog post, Doug will show how to use them" | ||
slug: announcing-unit-testing | ||
|
||
authors: [doug_beatty] | ||
|
||
tags: [analytics craft] | ||
hide_table_of_contents: false | ||
|
||
date: 2024-05-06 | ||
is_featured: true | ||
--- | ||
|
||
Do you ever have "bad data" dreams? Or am I the only one that has recurring nightmares? 😱 | ||
|
||
Here's the one I had last night: | ||
|
||
It began with a midnight bug hunt. A menacing insect creature has locked my colleagues in a dungeon, and they are pleading for my help to escape . Finding the key is elusive and always seems just beyond my grasp. The stress is palpable, a physical weight on my chest, as I raced against time to unlock them. | ||
|
||
Of course I wake up without actually having saved them, but I am relieved nonetheless. And I've had similar nightmares involving a heroic code refactor or the launch of a new model or feature. | ||
|
||
Good news: beginning in dbt v1.8, we're introducing a first-class unit testing framework that can handle each of the scenarios from my data nightmares. | ||
|
||
Before we dive into the details, let's take a quick look at how we got here. | ||
|
||
<!--truncate--> | ||
|
||
## Story of data quality in dbt | ||
|
||
The underlying reason behind my bad dreams is worry about unfortunate data quality that affects shared outcomes. | ||
|
||
One of the things I loved right away when I first started using dbt was that it had a first-class mechanism for asserting data quality on our full production data in the form of [data tests](https://docs.getdbt.com/docs/build/data-tests). | ||
|
||
I no longer had to worry about whether or not my primary key was actually unique, I could just add a dbt data test to assert that expectation! | ||
|
||
`dbt test` quickly became a beloved command, allowing me to run our full suite of data quality tests in production each day. And these same tests would run in CI and development. | ||
|
||
But while this mechanism is tremendously useful at a holistic level, it doesn't lend itself as well at the granular level. It was not designed to handle minimal test cases for a model with with fixed inputs and the expected output from those inputs. Nor was it designed to handle isolated test cases that can run simultaneously for the same model. | ||
|
||
So it doesn't meet the standard software engineering use-case of setting up and running individual test cases and other [desireable properties](https://tidyfirst.substack.com/p/desirable-unit-tests). | ||
|
||
## Introducing unit testing in dbt | ||
|
||
dbt version 1.8 marks the introduction of a built-in unit testing framework to extend the capabilities of software engineering best practices for analytics engineers. It allows for crafting isolated and repeatable [unit tests](https://en.wikipedia.org/wiki/Unit_testing) that are well-suited to execute during development and CI. They are useful in a variety of scenarios like responding to **bug reports**, confident **code refactoring,** and using [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development) when adding **new features**. | ||
|
||
Let's dive into the details... | ||
|
||
## Hello, unit testing world | ||
|
||
<Lightbox src="/img/blog/2024-05-06-unit-testing/hello-world.png" width="50%" title="Hello unit testing world" /> | ||
|
||
A key way that I build self-confidence is starting out with the [simplest example possible](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program). Once I've gotten the initial thing to work, then I can tweak it to take on more complicated use-cases (scroll down to the ["real world example"](#real-world-example) section below for something more realistic!). So here's a super simple example that you can use to get your feet wet. Afterwards, I'll explain more about each of the main components and how you can apply them to your own test cases. | ||
|
||
First, create this trivial model: | ||
|
||
```sql | ||
-- models/hello.sql | ||
|
||
select 'world' as hello | ||
``` | ||
|
||
Then, add a simple unit test for that model: | ||
|
||
```yaml | ||
# models/_properties.yml | ||
|
||
unit_tests: | ||
- name: test_hello_world | ||
|
||
# Always only one transformation to test | ||
model: hello | ||
|
||
# No inputs needed this time! | ||
# Most unit tests will have inputs -- see the "real world example" section below | ||
given: [] | ||
|
||
# Expected output can have zero to many rows | ||
expect: | ||
rows: | ||
- {hello: world} | ||
``` | ||
|
||
Finally, run the model and all its tests in a single command like this: | ||
|
||
```shell | ||
dbt build --select hello | ||
``` | ||
|
||
<Lightbox src="/img/blog/2024-05-06-unit-testing/unit-test-terminal-output.png" title="Terminal output of hello world unit test" /> | ||
|
||
Voilà! We can see that a single unit test ran and it passed. | ||
|
||
## Crafting Unit Tests | ||
|
||
After you've run your first "hello, world" unit test, you'll want to get started writing your own. There's two things that will help you be successful: | ||
|
||
1. How to think about a unit test conceptually | ||
2. How to actually craft your unit tests in YAML | ||
|
||
Here's a step-by-step guide for you to follow: | ||
|
||
### Organizing your thoughts | ||
|
||
1. **Identify your scenarios:** Which scenarios do you want to be more confident about? For each scenario, what is the relevant model? Consider edge cases: which inputs might be tricky for that model to handle correctly? This will identify your *model* and *given inputs*. | ||
2. **Define the success criteria:** What is the expected output for each scenario? Be specific. This will identify your *expected output*. | ||
|
||
### Writing your unit tests | ||
|
||
1. **Start with a "model-inputs-output" structure:** When running this *model*, given these test *inputs*, then expect this *output*. | ||
2. **Use meaningful descriptions:** They should clearly explain what the test is doing so collaborators and future developers can understand the purpose. | ||
3. **Test one behavior per test case:** This keeps tests focused and easier to debug. | ||
|
||
**Additional tips:** | ||
|
||
- **Think about maintainability:** Write tests that are easy to understand and update. | ||
- **Refactor tests as needed:** Keep them up-to-date with code changes. | ||
- **Practice test-driven development (TDD):** Write tests before writing code to guide your development process. | ||
- **Remember, unit testing is just one part of quality assurance.** Combine it with other testing methods like data tests and model contracts for a comprehensive approach. | ||
|
||
Next, I'll show you a brief example from the "real" world. | ||
|
||
## Real world example | ||
|
||
When we were trying out the developer experience and ergonomics of unit testing in dbt, we went to our trusty [Jaffle Shop repo](https://github.com/dbt-labs/jaffle-shop). We began to follow the framework above to **identify scenarios** and then define the **success criteria**. | ||
|
||
The first scenario we considered was counting the number of food items and drink items within an order. One natural edge case is an order without any drinks. Our success criteria in this case is for `count_drink_items` to be 0 in the `order_items_summary` model. | ||
|
||
To implement the unit test, we started by starting with a "model-inputs-output" (MIO) structure above. The relevant **model** was `orders` with **given** inputs were from `order_items` and `stg_orders`. In this case, we **expect** our output for order_id 2 to be `count_drink_items: 0`. | ||
|
||
Here's what the unit test YAML looked like: | ||
|
||
### Unit test YAML | ||
|
||
```yaml | ||
unit_tests: | ||
|
||
- name: test_order_items_count_drink_items_with_zero_drinks | ||
description: > | ||
Scenario: Order without any drinks | ||
When the `order_items_summary` table is built | ||
Given an order with nothing but 1 food item | ||
Then the count of drink items is 0 | ||
|
||
# Model | ||
model: order_items_summary | ||
|
||
# Inputs | ||
given: | ||
- input: ref('order_items') | ||
rows: | ||
- { | ||
order_id: 76, | ||
order_item_id: 3, | ||
is_drink_item: false, | ||
} | ||
- input: ref('stg_orders') | ||
rows: | ||
- { order_id: 76 } | ||
|
||
# Output | ||
expect: | ||
rows: | ||
- { | ||
order_id: 76, | ||
count_drink_items: 0, | ||
} | ||
``` | ||
|
||
Suffice it to say that when we ran the unit test for the first time, it failed! 💥 | ||
|
||
But it wasn't because we defined the unit test incorrectly – it was because we found a bug that we didn't know about previously. To get things back on the right path, we [opened a PR](https://github.com/dbt-labs/jaffle-shop/pull/12) that added the relevant unit test to confirm the bug as well as the bug fix. The good news is that by implementing the unit test, we were able to find a bug before someone else did. 😎 | ||
|
||
If you're curious about what the model looked like before and the code changes for the fix, here you go: | ||
|
||
### Original SQL code | ||
|
||
```sql | ||
with | ||
|
||
order_items as ( | ||
|
||
select * from {{ ref('order_items') }} | ||
|
||
) | ||
|
||
select | ||
order_id, | ||
|
||
sum(supply_cost) as order_cost, | ||
sum(product_price) as order_items_subtotal, | ||
count(order_item_id) as count_order_items, | ||
count( | ||
case | ||
when is_food_item then 1 | ||
else 0 | ||
end | ||
) as count_food_items, | ||
count( | ||
case | ||
when is_drink_item then 1 | ||
else 0 | ||
end | ||
) as count_drink_items | ||
|
||
from order_items | ||
|
||
group by 1 | ||
``` | ||
|
||
### SQL Code fix | ||
|
||
```diff | ||
17c17 | ||
< count( | ||
--- | ||
> sum( | ||
23c23 | ||
< count( | ||
--- | ||
> sum( | ||
``` | ||
|
||
### Caveats and pro-tips | ||
|
||
See the docs for [helpful information before you begin](https://docs.getdbt.com/docs/build/unit-tests#before-you-begin), including unit testing [incremental models](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-incremental-models), [models that depend on ephemeral model(s)](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-a-model-that-depend-on-ephemeral-models), and platform-specific considerations like `STRUCT`s in BigQuery. In many cases, the [`sql` format](https://docs.getdbt.com/reference/resource-properties/data-formats#sql) can help solve tricky edge cases that come up. | ||
|
||
Another advanced topic is overcoming issues when non-deterministic factors are involved, such as a current timestamp. To ensure that the output remains consistent regardless of when the test is run, you can set a fixed, predetermined value by using the [`overrides`](https://docs.getdbt.com/reference/resource-properties/unit-test-overrides) configuration. | ||
|
||
Before we wrap up, let's do a brief comparison of the different data quality capabilties in dbt and identify the situations where each would be most effective. | ||
|
||
## Unit tests vs. model contracts vs. data tests | ||
|
||
dbt has multiple complementary features that support data quality including [unit tests](https://docs.getdbt.com/docs/build/unit-tests), [model contracts](https://docs.getdbt.com/docs/collaborate/govern/model-contracts), and [data tests](https://docs.getdbt.com/docs/build/data-tests). Here's a table of how they compare and when you might use each: | ||
|
||
| Unit tests | Model contracts | Data tests | | ||
| --- | --- | --- | | ||
| Enforced before a resource node is materialized | Enforced while the resource node is materialized | Enforced after a resource node is materialized | | ||
| Blocks the attempt to build the resource | Blocks the building the resource node and downstream nodes | Blocks building of downstream nodes | | ||
| Rigid tests of the exact expected output for a single transformation | Tests the "shape" of the container (column names and data types) for a single data set | Flexible and can test assertions across multiple data sets, ranges of values, etc. | | ||
| Good for testing the precise values expected in the output | Good for enforcing the column names and data types that describe the "shape" of the data and specifying constraints like primary and foreign keys | Good for testing assertions other than equality (like ranges of acceptable values) or source data whose transformation is a black box | | ||
|
||
|
||
## Summary | ||
|
||
You're now ready to build your first unit tests with this new feature coming to dbt in v1.8! We're eager for you to try this out – let us know how it works for you by commenting in [this discussion](https://github.com/dbt-labs/dbt-core/discussions/8275) or [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new/choose). | ||
|
||
There's more details about the syntax which you can access in our [documentation](https://docs.getdbt.com/docs/build/unit-tests). We hope this gives you the tools to boost your confidence in your data pipelines and sleep easier at night 😴 |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+288 KB
website/static/img/blog/2024-05-06-unit-testing/unit-test-terminal-output.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something about this command doesn't sit right - wondering if we should give a more realistic model name