diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 82b5104fcef..bcb65bd7810 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -18,215 +18,197 @@ Snapshots implement [type-2 Slowly Changing Dimensions](https://en.wikipedia.org | id | status | updated_at | | -- | ------ | ---------- | -| 1 | pending | 2019-01-01 | +| 1 | pending | 2024-01-01 | Now, imagine that the order goes from "pending" to "shipped". That same record will now look like: | id | status | updated_at | | -- | ------ | ---------- | -| 1 | shipped | 2019-01-02 | +| 1 | shipped | 2024-01-02 | This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example: | id | status | updated_at | dbt_valid_from | dbt_valid_to | | -- | ------ | ---------- | -------------- | ------------ | -| 1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02 | -| 1 | shipped | 2019-01-02 | 2019-01-02 | `null` | +| 1 | pending | 2024-01-01 | 2024-01-01 | 2024-01-02 | +| 1 | shipped | 2024-01-02 | 2024-01-02 | `null` | -In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. - +## Configuring snapshots - +:::info Previewing or compiling snapshots in IDE not supported -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', +It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, [run the `dbt snapshot` command](#how-snapshots-work) in the IDE. - strategy='timestamp', - updated_at='updated_at', - ) -}} +::: -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` +- To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. +- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - unique_key='id', - schema='snapshots', - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} +In dbt Cloud Versionless and dbt Core v1.9 and later, snapshots are configurations defined in YAML files (typically in your snapshots directory). You'll configure your snapshot to tell dbt how to detect record changes. + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at ``` - - -:::info Preview or Compile Snapshots in IDE - -It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the `dbt snapshot` command in the IDE by completing the following steps. - -::: - -When you run the [`dbt snapshot` command](/reference/commands/snapshot): -* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. -* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: - - The `dbt_valid_to` column will be updated for any existing records that have changed - - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` +The following table outlines the configurations available for snapshots: -Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | +| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | +| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | -## Example +- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. +- A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). +- You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). -To add a snapshot to your project: -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` -2. Use a `snapshot` block to define the start and end of a snapshot: +### Add a snapshot to your project - +To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). -```sql -{% snapshot orders_snapshot %} +1. Create a YAML file in your `snapshots` directory: `snapshots/orders_snapshot.yml` and add your configuration details. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). -{% endsnapshot %} -``` + - + ```yaml + snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at -3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + ``` + - +2. Since snapshots focus on configuration, the transformation logic is minimal. Typically, you'd select all data from the source. If you need to apply transformations (like filters, deduplication), it's best practice to define an ephemeral model and reference it in your snapshot configuration. -```sql -{% snapshot orders_snapshot %} + -select * from {{ source('jaffle_shop', 'orders') }} + ```yaml + {{ config(materialized='ephemeral') }} -{% endsnapshot %} -``` + select * from {{ source('jaffle_shop', 'orders') }} + ``` + - +3. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. -4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. +4. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. -5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + ``` + $ dbt snapshot + Running with dbt=1.9.0 - + 15:07:36 | Concurrency: 8 threads (target='dev') + 15:07:36 | + 15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] + 15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] + 15:07:36 | + 15:07:36 | Finished running 1 snapshots in 0.68s. - + Completed successfully -```sql -{% snapshot orders_snapshot %} + Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 + ``` -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', +5. Inspect the results by selecting from the table dbt created (`analytics.snapshots.orders_snapshot`). After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described later on. - strategy='timestamp', - updated_at='updated_at', - ) -}} +6. Run the `dbt snapshot` command again and inspect the results. If any records have been updated, the snapshot should reflect this. -select * from {{ source('jaffle_shop', 'orders') }} +7. Select from the `snapshot` in downstream models using the `ref` function. -{% endsnapshot %} -``` + - + ```sql + select * from {{ ref('orders_snapshot') }} + ``` + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. +8. Snapshots are only useful if you run them frequently — schedule the `dbt snapshot` command to run regularly. - +### Configuration best practices - + -```sql -{% snapshot orders_snapshot %} - -{{ - config( - schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} +This strategy handles column additions and deletions better than the `check` strategy. -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` + - +The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + - + -``` -$ dbt snapshot -Running with dbt=1.8.0 +Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. -15:07:36 | Concurrency: 8 threads (target='dev') -15:07:36 | -15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] -15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] -15:07:36 | -15:07:36 | Finished running 1 snapshots in 0.68s. + + -Completed successfully + -Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 -``` + -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described below. +Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. -8. Run the `snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + -9. Select from the `snapshot` in downstream models using the `ref` function. + - + If you need to clean or transform your data before snapshotting, create an ephemeral model (or a staging model) that applies the necessary transformations. Then, reference this model in your snapshot configuration. This approach keeps your snapshot definitions clean and allows you to test and run transformations separately. -```sql -select * from {{ ref('orders_snapshot') }} -``` + + - +### How snapshots work -10. Schedule the `snapshot` command to run regularly — snapshots are only useful if you run them frequently. +When you run the [`dbt snapshot` command](/reference/commands/snapshot): +* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. +* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: + - The `dbt_valid_to` column will be updated for any existing records that have changed + - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` +Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. ## Detecting row changes -Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt — `timestamp` and `check`. +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt: +- [Timestamp](#timestamp-strategy-recommended) — Uses an `updated_at` column to determine if a row has changed. +- [Check](#check-strategy) — Compares a list of columns between their current and historical values to determine if a row has changed. ### Timestamp strategy (recommended) The `timestamp` strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. @@ -266,27 +248,19 @@ The `timestamp` strategy requires the following configurations: - - -```sql -{% snapshot orders_snapshot_timestamp %} - - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_timestamp + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at ``` - - ### Check strategy @@ -298,15 +272,12 @@ The `check` strategy requires the following configurations: | ------ | ----------- | ------- | | check_cols | A list of columns to check for changes, or `all` to check all columns | `["name", "email"]` | - - :::caution check_cols = 'all' The `check` snapshot strategy can be configured to track changes to _all_ columns by supplying `check_cols = 'all'`. It is better to explicitly enumerate the columns that you want to check. Consider using a to condense many columns into a single column. ::: - **Example Usage** @@ -336,23 +307,19 @@ The `check` snapshot strategy can be configured to track changes to _all_ column - - -```sql -{% snapshot orders_snapshot_check %} - - {{ - config( - schema='snapshots', - strategy='check', - unique_key='id', - check_cols=['status', 'is_cancelled'], - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled ``` @@ -397,112 +364,42 @@ For this configuration to work with the `timestamp` strategy, the configured `up - - -```sql -{% snapshot orders_snapshot_hard_delete %} - - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - invalidate_hard_deletes=True, - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_hard_delete + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + invalidate_hard_deletes: true ``` -## Configuring snapshots -### Snapshot configurations -There are a number of snapshot-specific configurations: - - - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | -| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - -A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - -Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - -Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. - - - - - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | -| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | -| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | - -In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into across users and environments. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, support was added for environment-aware snapshots by making `target_schema` optional. Snapshots, by default with no `target_schema` or `target_database` config defined, now resolve the schema or database to build the snapshot into using the `generate_schema_name` or `generate_database_name` macros. Developers can optionally define a custom location for snapshots to build to with the [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, as is consistent with other resource types. - -A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). - -You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). - - - -### Configuration best practices -#### Use the `timestamp` strategy where possible -This strategy handles column additions and deletions better than the `check` strategy. - -#### Ensure your unique key is really unique -The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). - - - -#### Use a `target_schema` that is separate to your analytics schema -Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. - - - - - -#### Use a schema that is separate to your models' schema -Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. - - - ## Snapshot query best practices -#### Snapshot source data. -Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data +This section outlines some best practices for writing snapshot queries: -#### Use the `source` function in your query. -This helps when understanding data lineage in your project. +- #### Snapshot source data + Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data -#### Include as many columns as possible. -In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. +- #### Use the `source` function in your query + This helps when understanding data lineage in your project. -#### Avoid joins in your snapshot query. -Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. +- #### Include as many columns as possible + In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. -#### Limit the amount of transformation in your query. -If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. +- #### Avoid joins in your snapshot query + Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. + +- #### Limit the amount of transformation in your query + If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include: * Selecting specific columns if the table is wide. @@ -526,30 +423,30 @@ For the `timestamp` strategy, the configured `updated_at` column is used to popu
Details for the timestamp strategy -Snapshot query results at `2019-01-01 11:00` +Snapshot query results at `2024-01-01 11:00` | id | status | updated_at | | -- | ------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | +| 1 | pending | 2024-01-01 10:47 | Snapshot results (note that `11:00` is not used anywhere): | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | | 2019-01-01 10:47 | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 | -Query results at `2019-01-01 11:30`: +Query results at `2024-01-01 11:30`: | id | status | updated_at | | -- | ------- | ---------------- | -| 1 | shipped | 2019-01-01 11:05 | +| 1 | shipped | 2024-01-01 11:05 | Snapshot results (note that `11:30` is not used anywhere): | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | 2019-01-01 11:05 | 2019-01-01 10:47 | -| 1 | shipped | 2019-01-01 11:05 | 2019-01-01 11:05 | | 2019-01-01 11:05 | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | +| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 |
@@ -560,7 +457,7 @@ For the `check` strategy, the current timestamp is used to populate each column.
Details for the check strategy -Snapshot query results at `2019-01-01 11:00` +Snapshot query results at `2024-01-01 11:00` | id | status | | -- | ------- | @@ -570,9 +467,9 @@ Snapshot results: | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | | 2019-01-01 11:00 | +| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 | -Query results at `2019-01-01 11:30`: +Query results at `2024-01-01 11:30`: | id | status | | -- | ------- | @@ -582,11 +479,191 @@ Snapshot results: | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | | --- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | 2019-01-01 11:30 | 2019-01-01 11:00 | -| 1 | shipped | 2019-01-01 11:30 | | 2019-01-01 11:30 | +| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | +| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 |
+## Configure snapshots in versions 1.8 and earlier + + + +This section is for users on dbt versions 1.8 and earlier. To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. + + + + + +- In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. +- The earlier dbt versions use an older syntax that allows for defining multiple resources in a single file. This syntax can significantly slow down parsing and compilation. +- For faster and more efficient management, consider[ upgrading to Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. + +The following example shows how to configure a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + +### Configuration example + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. + +5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. + + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + + + +``` +$ dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. + +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +9. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. + + + ## FAQs diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 0f8867edc5a..d8ac119deb8 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -20,6 +20,9 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## October 2024 +- **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. + - Who does this affect? New user on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots can keep their existing configuration or can choose to migrate their snapshot definitions to YAML. + - Users on dbt 1.8 and earlier: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. - **Behavior change:** Set [`state_modified_compare_more_unrendered`](/reference/global-configs/behavior-changes#source-definitions-for-state) to true to reduce false positives for `state:modified` when configs differ between `dev` and `prod` environments. - **Behavior change:** Set the [`skip_nodes_if_on_run_start_fails`](/reference/global-configs/behavior-changes#failures-in-on-run-start-hooks) flag to `True` to skip all selected resources from running if there is a failure on an `on-run-start` hook. - **Enhancement**: In dbt Cloud Versionless, snapshots defined in SQL files can now use `config` defined in `schema.yml` YAML files. This update resolves the previous limitation that required snapshot properties to be defined exclusively in `dbt_project.yml` and/or a `config()` block within the SQL file. This enhancement will be included in the upcoming dbt Core v1.9 release. @@ -28,6 +31,7 @@ Read about the [order dbt infers columns can be used as primary key of a model]( - **New:** dbt Explorer now includes trust signal icons, which is currently available as a [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Trust signals offer a quick, at-a-glance view of data health when browsing your dbt models in Explorer. These icons indicate whether a model is **Healthy**, **Caution**, **Degraded**, or **Unknown**. For accurate health data, ensure the resource is up-to-date and has had a recent job run. Refer to [Trust signals](/docs/collaborate/explore-projects#trust-signals-for-resources) for more information. - **New:** Auto exposures are now available in Preview in dbt Cloud. Auto-exposures helps users understand how their models are used in downstream analytics tools to inform investments and reduce incidents. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. To learn more, refer to [Auto exposures](/docs/collaborate/auto-exposures). + ## September 2024 - **New**: Use the new recommended syntax for [defining `foreign_key` constraints](/reference/resource-properties/constraints) using `refs`, available in dbt Cloud Versionless. This will soon be released in dbt Core v1.9. This new syntax will capture dependencies and works across different environments. diff --git a/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md index 5ce8f380008..0175588bf6f 100644 --- a/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md +++ b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md @@ -27,3 +27,4 @@ A snapshot must have a materialized value of 'snapshot' This tells you to change your `materialized` config to `snapshot`. But when you make that change, you might encounter an error message saying that certain fields like `dbt_scd_id` are missing. This error happens because, previously, when dbt treated snapshots as tables, it didn't include the necessary [snapshot meta-fields](/docs/build/snapshots#snapshot-meta-fields) in your target table. Since those meta-fields don't exist, dbt correctly identifies that you're trying to create a snapshot in a table that isn't actually a snapshot. When this happens, you have to start from scratch — re-snapshotting your source data as if it was the first time by dropping your "snapshot" which isn't a real snapshot table. Then dbt snapshot will create a new snapshot and insert the snapshot meta-fields as expected. + diff --git a/website/docs/reference/configs-and-properties.md b/website/docs/reference/configs-and-properties.md index 20d762b7462..20892898a51 100644 --- a/website/docs/reference/configs-and-properties.md +++ b/website/docs/reference/configs-and-properties.md @@ -26,9 +26,18 @@ Whereas you can use **configurations** to: Depending on the resource type, configurations can be defined in the dbt project and also in an installed package by: + + +1. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, `seeds/`, `analyses`, or `tests/` directory +2. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) + + + + 1. Using a [`config()` Jinja macro](/reference/dbt-jinja-functions/config) within a `model`, `snapshot`, or `test` SQL file -2. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file +2. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, `seeds/`, `analyses/`, or `tests/` directory. 3. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) + ### Config inheritance diff --git a/website/docs/reference/project-configs/snapshot-paths.md b/website/docs/reference/project-configs/snapshot-paths.md index 81b2759609d..8319833f1e6 100644 --- a/website/docs/reference/project-configs/snapshot-paths.md +++ b/website/docs/reference/project-configs/snapshot-paths.md @@ -12,7 +12,16 @@ snapshot-paths: [directorypath]
## Definition -Optionally specify a custom list of directories where [snapshots](/docs/build/snapshots) are located. Note that you cannot co-locate models and snapshots. + +Optionally specify a custom list of directories where [snapshots](/docs/build/snapshots) are located. + + +In [Versionless](/docs/dbt-versions/versionless-cloud) and on dbt v1.9 and higher, you can co-locate your snapshots with models if they are [defined using the latest YAML syntax](/docs/build/snapshots). + + + +Note that you cannot co-locate models and snapshots. However, in [Versionless](/docs/dbt-versions/versionless-cloud) and on dbt v1.9 and higher, you can co-locate your snapshots with models if they are [defined using the latest YAML syntax](/docs/build/snapshots). + ## Default By default, dbt will search for snapshots in the `snapshots` directory, i.e. `snapshot-paths: ["snapshots"]` diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index bd187409379..f7c6b85d372 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -3,6 +3,31 @@ resource_types: [snapshots] description: "Read this guide to understand the check_cols configuration in dbt." datatype: "[column_name] | all" --- + + + + + ```yml + snapshots: + - name: snapshot_name + relation: source('my_source', 'my_table') + config: + schema: string + unique_key: column_name_or_expression + strategy: check + check_cols: + - column_name + ``` + + + + + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + ```jinja2 @@ -14,7 +39,7 @@ datatype: "[column_name] | all" ``` - + @@ -42,6 +67,29 @@ No default is provided. ### Check a list of columns for changes + + + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled +``` + + +To select from this snapshot in a downstream model: `select * from {{ ref('orders_snapshot_check') }}` + + + + ```sql {% snapshot orders_snapshot_check %} @@ -58,8 +106,32 @@ No default is provided. {% endsnapshot %} ``` + + ### Check all columns for changes + + + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - all + ``` + + +To select from this snapshot in a downstream model: `select * from {{{ ref('orders_snapshot_check') }}` + + + + ```sql {% snapshot orders_snapshot_check %} @@ -75,3 +147,4 @@ No default is provided. {% endsnapshot %} ``` + diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md index ba5b37c5d71..bdaec7e33a9 100644 --- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md +++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md @@ -4,6 +4,31 @@ description: "Invalidate_hard_deletes - Read this in-depth guide to learn about datatype: column_name --- + + + + + +```yaml +snapshots: + - name: snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + strategy: timestamp + invalidate_hard_deletes: true | false +``` + + + + + + + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + ```jinja2 @@ -17,6 +42,7 @@ datatype: column_name ``` + @@ -39,6 +65,26 @@ By default the feature is disabled. ## Example + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at + invalidate_hard_deletes: true + ``` + + + + + ```sql @@ -60,3 +106,4 @@ By default the feature is disabled. ``` + diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index e1e7d67f02e..ce818768134 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -109,6 +109,8 @@ snapshots: + + ```sql @@ -125,13 +127,14 @@ select ... ``` + - + ```yml snapshots: - name: [] - config: + [config](/reference/resource-properties/config): [pre_hook](/reference/resource-configs/pre-hook-post-hook): | [] [post_hook](/reference/resource-configs/pre-hook-post-hook): | [] ``` diff --git a/website/docs/reference/resource-configs/snapshot_name.md b/website/docs/reference/resource-configs/snapshot_name.md index bb4826a116b..62480ac3f84 100644 --- a/website/docs/reference/resource-configs/snapshot_name.md +++ b/website/docs/reference/resource-configs/snapshot_name.md @@ -2,6 +2,27 @@ description: "Snapshot-name - Read this in-depth guide to learn about configurations in dbt." --- + + + +```yaml +snapshots: + - name: snapshot_name + relation: source('my_source', 'my_table') + config: + schema: string + database: string + unique_key: column_name_or_expression + strategy: timestamp | check + updated_at: column_name # Required if strategy is 'timestamp' + +``` + + + + + + ```jinja2 @@ -13,9 +34,15 @@ description: "Snapshot-name - Read this in-depth guide to learn about configurat +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + + + ## Description -The name of a snapshot, as defined in the `{% snapshot %}` block header. This name is used when selecting from a snapshot using the [`ref` function](/reference/dbt-jinja-functions/ref) +The name of a snapshot, which is used when selecting from a snapshot using the [`ref` function](/reference/dbt-jinja-functions/ref) This name must not conflict with the name of any other "refable" resource (models, seeds, other snapshots) defined in this project or package. @@ -24,6 +51,26 @@ The name does not need to match the file name. As a result, snapshot filenames d ## Examples ### Name a snapshot `order_snapshot` + + + + +```yaml +snapshots: + - name: order_snapshot + relation: source('my_source', 'my_table') + config: + schema: string + database: string + unique_key: column_name_or_expression + strategy: timestamp | check + updated_at: column_name # Required if strategy is 'timestamp' +``` + + + + + ```jinja2 @@ -35,6 +82,7 @@ The name does not need to match the file name. As a result, snapshot filenames d + To select from this snapshot in a downstream model: diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index b67feb64fbd..e2b2cac1c59 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -4,6 +4,13 @@ description: "Strategy - Read this in-depth guide to learn about configurations datatype: timestamp | check --- + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + + + + + + + ```yaml + snapshots: + - [name: snapshot_name](/reference/resource-configs/snapshot_name): + relation: source('my_source', 'my_table') + config: + strategy: timestamp + updated_at: column_name + ``` + + + + + ```jinja2 @@ -30,6 +54,7 @@ select ... ``` + @@ -47,6 +72,22 @@ snapshots: + + + + + ```yaml + snapshots: + - [name: snapshot_name](/reference/resource-configs/snapshot_name): + relation: source('my_source', 'my_table') + config: + strategy: check + check_cols: [column_name] | "all" + ``` + + + + ```jinja2 @@ -62,6 +103,7 @@ snapshots: ``` + @@ -88,7 +130,25 @@ This is a **required configuration**. There is no default value. ## Examples ### Use the timestamp strategy + + + +```yaml +snapshots: + - name: orders_snapshot_timestamp + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + strategy: timestamp + unique_key: id + updated_at: updated_at + +``` + + + + ```sql @@ -109,10 +169,32 @@ This is a **required configuration**. There is no default value. ``` + ### Use the check strategy + + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled + +``` + + + + + ```sql {% snapshot orders_snapshot_check %} @@ -129,6 +211,7 @@ This is a **required configuration**. There is no default value. {% endsnapshot %} ``` + ### Advanced: define and use custom snapshot strategy Behind the scenes, snapshot strategies are implemented as macros, named `snapshot__strategy` @@ -140,6 +223,23 @@ It's possible to implement your own snapshot strategy by adding a macro with the 1. Create a macro named `snapshot_timestamp_with_deletes_strategy`. Use the existing code as a guide and adjust as needed. 2. Use this strategy via the `strategy` configuration: + + + +```yaml +snapshots: + - name: my_custom_snapshot + relation: source('my_source', 'my_table') + config: + strategy: timestamp_with_deletes + updated_at: updated_at_column + unique_key: id +``` + + + + + ```jinja2 @@ -155,3 +255,4 @@ It's possible to implement your own snapshot strategy by adding a macro with the ``` + diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md index 9ad3417fd5e..996e7148292 100644 --- a/website/docs/reference/resource-configs/unique_key.md +++ b/website/docs/reference/resource-configs/unique_key.md @@ -4,6 +4,29 @@ description: "Unique_key - Read this in-depth guide to learn about configuration datatype: column_name_or_expression --- + + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + unique_key: id + +``` + + + + + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + ```jinja2 @@ -12,8 +35,8 @@ datatype: column_name_or_expression ) }} ``` - + @@ -29,6 +52,8 @@ snapshots: ## Description A column name or expression that is unique for the inputs of a snapshot. dbt uses this to match records between a result set and an existing snapshot, so that changes can be captured correctly. +In Versionless and dbt v1.9 and later, [snapshots](/docs/build/snapshots) are defined and configured in YAML files within your `snapshots/` directory. The `unique_key` is specified within the `config` block of your snapshot YAML file. + :::caution Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](/blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique. @@ -41,6 +66,26 @@ This is a **required parameter**. No default is provided. ## Examples ### Use an `id` column as a unique key + + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + +``` + + + + ```jinja2 @@ -55,7 +100,9 @@ This is a **required parameter**. No default is provided. You can also write this in yaml. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). + +You can also specify configurations in your `dbt_project.yml` file if multiple snapshots share the same `unique_key`: ```yml @@ -70,6 +117,25 @@ snapshots: ### Use a combination of two columns as a unique key This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (e.g. `'-'`) to ensure uniqueness. + + + + +```yaml +snapshots: + - name: transaction_items_snapshot + relation: source('erp', 'transactions') + config: + schema: snapshots + unique_key: "transaction_id || '-' || line_item_id" + strategy: timestamp + updated_at: updated_at + +``` + + + + @@ -93,10 +159,45 @@ from {{ source('erp', 'transactions') }} ``` + Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: + + + + +```yaml +snapshots: + - name: transaction_items_snapshot + relation: {{ ref('transaction_items_ephemeral') }} + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at +``` + + + + +```sql +{{ config(materialized='ephemeral') }} + +select + transaction_id || '-' || line_item_id as id, + * +from {{ source('erp', 'transactions') }} + +``` + + + +In this example, we create an ephemeral model `transaction_items_ephemeral` that creates an `id` column that can be used as the `unique_key` our snapshot configuration. + + + ```jinja2 @@ -121,3 +222,4 @@ from {{ source('erp', 'transactions') }} ``` + diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md index c61b04264be..09122859e43 100644 --- a/website/docs/reference/resource-configs/updated_at.md +++ b/website/docs/reference/resource-configs/updated_at.md @@ -3,6 +3,29 @@ resource_types: [snapshots] description: "Updated_at - Read this in-depth guide to learn about configurations in dbt." datatype: column_name --- + + + + + + +```yaml +snapshots: + - name: snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + strategy: timestamp + updated_at: column_name +``` + + + + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + ```jinja2 @@ -14,6 +37,7 @@ datatype: column_name ``` + @@ -37,7 +61,6 @@ You will get a warning if the data type of the `updated_at` column does not matc
- ## Description A column within the results of your snapshot query that represents when the record row was last updated. @@ -50,6 +73,25 @@ No default is provided. ## Examples ### Use a column name `updated_at` + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + +``` + + + + ```sql @@ -72,12 +114,55 @@ select * from {{ source('jaffle_shop', 'orders') }} ``` + ### Coalesce two columns to create a reliable `updated_at` column Consider a data source that only has an `updated_at` column filled in when a record is updated (so a `null` value indicates that the record hasn't been updated after it was created). Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + + + +1. Create an staging model to perform the transformation. + In your `models/` directory, create a SQL file that configures an staging model to coalesce the `updated_at` and `created_at` columns into a new column `updated_at_for_snapshot`. + + + + ```sql + select * coalesce (updated_at, created_at) as updated_at_for_snapshot + from {{ source('jaffle_shop', 'orders') }} + + ``` + + +2. Define the snapshot configuration in a YAML file. + In your `snapshots/` directory, create a YAML file that defines your snapshot and references the `updated_at_for_snapshot` staging model you just created. + + + + ```yaml + snapshots: + - name: orders_snapshot + relation: ref('staging_orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at_for_snapshot + + ``` + + +3. Run `dbt snapshot` to execute the snapshot. + +Alternatively, you can also create an ephemeral model to performs the required transformations. Then, you reference this model in your snapshot's `relation` key. + + + + + + ```sql @@ -104,3 +189,4 @@ from {{ source('jaffle_shop', 'orders') }} ``` + diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index fd5969a4c8c..87063ce592e 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -24,15 +24,24 @@ Parts of a snapshot: + + +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + + + + + @@ -87,6 +96,8 @@ snapshots: + +Refer to [configuring snapshots](/docs/build/snapshots#configuring-snapshots) for the available configurations. @@ -108,33 +119,21 @@ snapshots: - - - + -```jinja - -{{ config( - [target_schema](/reference/resource-configs/target_schema)="", - [target_database](/reference/resource-configs/target_database)="", - [unique_key](/reference/resource-configs/unique_key)="", - [strategy](/reference/resource-configs/strategy)="timestamp" | "check", - [updated_at](/reference/resource-configs/updated_at)="", - [check_cols](/reference/resource-configs/check_cols)=[""] | "all" -) }} + -``` +Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. - + ```jinja {{ config( - [schema](/reference/resource-configs/schema)="", - [database](/reference/resource-configs/database)="", - [alias](/reference/resource-configs/alias)="", + [target_schema](/reference/resource-configs/target_schema)="", + [target_database](/reference/resource-configs/target_database)="", [unique_key](/reference/resource-configs/unique_key)="", [strategy](/reference/resource-configs/strategy)="timestamp" | "check", [updated_at](/reference/resource-configs/updated_at)="", @@ -142,7 +141,6 @@ snapshots: ) }} ``` - @@ -159,7 +157,7 @@ snapshots: defaultValue="project-yaml" values={[ { label: 'Project file', value: 'project-yaml', }, - { label: 'Property file', value: 'property-yaml', }, + { label: 'YAML file', value: 'property-yaml', }, { label: 'Config block', value: 'config', }, ] }> @@ -184,6 +182,30 @@ snapshots: + + + + +```yaml +version: 2 + +snapshots: + - name: [] + config: + [enabled](/reference/resource-configs/enabled): true | false + [tags](/reference/resource-configs/tags): | [] + [alias](/reference/resource-configs/alias): + [pre-hook](/reference/resource-configs/pre-hook-post-hook): | [] + [post-hook](/reference/resource-configs/pre-hook-post-hook): | [] + [persist_docs](/reference/resource-configs/persist_docs): {} + [grants](/reference/resource-configs/grants): {} +``` + + + + + + ```yaml @@ -191,6 +213,7 @@ version: 2 snapshots: - name: [] + relation: source('my_source', 'my_table') config: [enabled](/reference/resource-configs/enabled): true | false [tags](/reference/resource-configs/tags): | [] @@ -202,11 +225,19 @@ snapshots: ``` + + + +Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. + + + + ```jinja @@ -222,106 +253,140 @@ snapshots: ``` + + - ## Configuring snapshots -Snapshots can be configured in one of three ways: - -1. Using a `config` block within a snapshot -2. Using a `config` [resource property](/reference/model-properties) in a `.yml` file -3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. - -Snapshot configurations are applied hierarchically in the order above. - -### Examples -#### Apply configurations to all snapshots -To apply a configuration to all snapshots, including those in any installed [packages](/docs/build/packages), nest the configuration directly under the `snapshots` key: +Snapshots can be configured in multiple ways: - - -```yml - -snapshots: - +unique_key: id -``` - - + +1. Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). +2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. + -#### Apply configurations to all snapshots in your project -To apply a configuration to all snapshots in your project only (for example, _excluding_ any snapshots in installed packages), provide your project name as part of the resource path. + -For a project named `jaffle_shop`: +1. Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). +2. Using a `config` block within a snapshot defined in Jinja SQL +3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. - +Note that in Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -```yml + -snapshots: - jaffle_shop: - +unique_key: id -``` +Snapshot configurations are applied hierarchically in the order above with higher taking precedence. - +### Examples +The following examples demonstrate how to configure snapshots using the `dbt_project.yml` file, a `config` block within a snapshot, and a `.yml` file. -Similarly, you can use the name of an installed package to configure snapshots in that package. +- #### Apply configurations to all snapshots + To apply a configuration to all snapshots, including those in any installed [packages](/docs/build/packages), nest the configuration directly under the `snapshots` key: -#### Apply configurations to one snapshot only + -We recommend using `config` blocks if you need to apply a configuration to one snapshot only. + ```yml - + snapshots: + +unique_key: id + ``` -```sql -{% snapshot orders_snapshot %} - {{ - config( - unique_key='id', - strategy='timestamp', - updated_at='updated_at' - ) - }} - -- Pro-Tip: Use sources in snapshots! - select * from {{ source('jaffle_shop', 'orders') }} -{% endsnapshot %} -``` + - +- #### Apply configurations to all snapshots in your project + To apply a configuration to all snapshots in your project only (for example, _excluding_ any snapshots in installed packages), provide your project name as part of the resource path. -You can also use the full resource path (including the project name, and subdirectories) to configure an individual snapshot from your `dbt_project.yml` file. + For a project named `jaffle_shop`: -For a project named `jaffle_shop`, with a snapshot file within the `snapshots/postgres_app/` directory, where the snapshot is named `orders_snapshot` (as above), this would look like: + - + ```yml -```yml -snapshots: - jaffle_shop: - postgres_app: - orders_snapshot: + snapshots: + jaffle_shop: +unique_key: id - +strategy: timestamp - +updated_at: updated_at -``` - - - -You can also define some common configs in a snapshot's `config` block. We don't recommend this for a snapshot's required configuration, however. - - - -```yml -version: 2 - -snapshots: - - name: orders_snapshot - config: - persist_docs: - relation: true - columns: true -``` - - + ``` + + + + Similarly, you can use the name of an installed package to configure snapshots in that package. + +- #### Apply configurations to one snapshot only + + + Use `config` blocks if you need to apply a configuration to one snapshot only. + + + + ```sql + {% snapshot orders_snapshot %} + {{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at' + ) + }} + -- Pro-Tip: Use sources in snapshots! + select * from {{ source('jaffle_shop', 'orders') }} + {% endsnapshot %} + ``` + + + + + + + + ```yaml + snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + unique_key: id + strategy: timestamp + updated_at: updated_at + persist_docs: + relation: true + columns: true + ``` + + Pro-tip: Use sources in snapshots: `select * from {{ source('jaffle_shop', 'orders') }}` + + + You can also use the full resource path (including the project name, and subdirectories) to configure an individual snapshot from your `dbt_project.yml` file. + + For a project named `jaffle_shop`, with a snapshot file within the `snapshots/postgres_app/` directory, where the snapshot is named `orders_snapshot` (as above), this would look like: + + + + ```yml + snapshots: + jaffle_shop: + postgres_app: + orders_snapshot: + +unique_key: id + +strategy: timestamp + +updated_at: updated_at + ``` + + + + You can also define some common configs in a snapshot's `config` block. We don't recommend this for a snapshot's required configuration, however. + + + + ```yml + version: 2 + + snapshots: + - name: orders_snapshot + +persist_docs: + relation: true + columns: true + ``` + + diff --git a/website/docs/reference/snapshot-properties.md b/website/docs/reference/snapshot-properties.md index 49769af8f6d..d940a9f344c 100644 --- a/website/docs/reference/snapshot-properties.md +++ b/website/docs/reference/snapshot-properties.md @@ -3,12 +3,62 @@ title: Snapshot properties description: "Read this guide to learn about using source properties in dbt." --- + + +In Versionless and dbt v1.9 and later, snapshots are defined and configured in YAML files within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). Snapshot properties are declared within these YAML files, allowing you to define both the snapshot configurations and properties in one place. + + + + + Snapshots properties can be declared in `.yml` files in: -- your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)) +- your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). - your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)) +Note, in Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). + + + We recommend that you put them in the `snapshots/` directory. You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `snapshots/` or `models/` directory. + + + + +```yml +version: 2 + +snapshots: + - name: + [description](/reference/resource-properties/description): + [meta](/reference/resource-configs/meta): {} + [docs](/reference/resource-configs/docs): + show: true | false + node_color: # Use name (such as node_color: purple) or hex code with quotes (such as node_color: "#cd7f32") + [config](/reference/resource-properties/config): + [](/reference/snapshot-configs): + [tests](/reference/resource-properties/data-tests): + - + - ... + columns: + - name: + [description](/reference/resource-properties/description): + [meta](/reference/resource-configs/meta): {} + [quote](/reference/resource-properties/quote): true | false + [tags](/reference/resource-configs/tags): [] + [tests](/reference/resource-properties/data-tests): + - + - ... # declare additional tests + - ... # declare properties of additional columns + + - name: ... # declare properties of additional snapshots + +``` + + + + + ```yml @@ -41,3 +91,4 @@ snapshots: ``` + diff --git a/website/snippets/_snapshot-yaml-spec.md b/website/snippets/_snapshot-yaml-spec.md new file mode 100644 index 00000000000..8bbdc6be72e --- /dev/null +++ b/website/snippets/_snapshot-yaml-spec.md @@ -0,0 +1,4 @@ +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +:::