add time spine/granularity to model props, clarify time spine docs #6208

Merged 5 commits on Oct 2, 2024 (the diff below shows changes from 3 commits).
`website/docs/docs/build/dimensions.md`: 2 changes (1 addition, 1 deletion)
@@ -12,7 +12,7 @@ Dimensions represent the non-aggregatable columns in your data set, which are th

Groups are defined within semantic models, alongside entities and measures, and correspond to non-aggregatable columns in your dbt model that provide categorical or time-based context. In SQL, dimensions are typically included in the GROUP BY clause.-->

All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique wihtin the same semantic model.
All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique within the same semantic model.
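A minimal sketch of the rule above, assuming a hypothetical `orders` model: each dimension has a `name` and a `type`, `expr` is optional, and no two dimensions in the same semantic model share a `name`. The model, semantic model, and column names are placeholders, and entities and measures are left out to keep the fragment short.

```yaml
semantic_models:
  - name: orders                  # hypothetical semantic model
    model: ref('orders')          # hypothetical dbt model
    # entities and measures omitted for brevity
    dimensions:
      - name: order_status        # required; must be unique within this semantic model
        type: categorical         # required
      - name: ordered_at          # a different name, so no collision with order_status
        type: time
        expr: created_at          # optional; hypothetical underlying column
        type_params:
          time_granularity: day   # time dimensions declare their granularity here
```

Adding a third dimension also named `order_status` to this semantic model would break the uniqueness requirement.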

| Parameter | Description | Type |
| --------- | ----------- | ---- |
`website/docs/docs/build/metricflow-time-spine.md`: 41 changes (25 additions, 16 deletions)
@@ -8,7 +8,8 @@ tags: [Metrics, Semantic Layer]

It's common in analytics engineering to have a date dimension or "time-spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarters, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain.

MetricFlow requires you to define a time-spine table as a model-level configuration in the Semantic Layer for time-based joins and aggregations, such as cumulative metrics. This configuration informs dbt which model should be used for time range joins. It is especially useful for cumulative metrics or calculating time-based offsets. The time-spine model is joined to other tables when calculating certain types of metrics or dimensions. MetricFlow will join the time-spine model in the compiled SQL for the following types of metrics and dimensions:
MetricFlow requires you to define at least one dbt model which provides a time-spine, and then specify (in YAML) the columns to be used for time-based joins. MetricFlow will join against the time-spine model for the following types of metrics and dimensions:

- [Cumulative metrics](/docs/build/cumulative)
- [Metric offsets](/docs/build/derived#derived-metric-offset)
- [Conversion metrics](/docs/build/conversion)
@@ -19,20 +20,18 @@ To see the generated SQL for the metric and dimension types that use time-spine

## Configuring time-spine in YAML

- The time spine is a special model that tells dbt and MetricFlow how to use specific columns by defining their properties.
- The [`models` key](/reference/model-properties) for the time spine must be in your `models/` directory.
- Each time spine is a normal dbt model with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties.
- You likely already have a calendar table in your project which you can use. If you don't, review the [example time-spine tables](#example-time-spine-tables) below for sample code.
- You add the configurations under the `time_spine` key for that [model's properties](/reference/model-properties), just as you would add a description or tests.
- You only need to configure time-spine models that the Semantic Layer should recognize.
- At a minimum, define a time-spine table for a daily grain.
- You can optionally define a time-spine table for a different granularity, like hourly.
- Note that if you don’t have a date or calendar model in your project, you'll need to create one.
- You can optionally define additional time-spine tables for different granularities, like hourly. Review the [granularity considerations](#granularity-considerations) when deciding which tables to create.

- If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran).

If you already have a date dimension or time-spine table in your dbt project, you can point MetricFlow to this table by updating the `model` configuration to use this table in the Semantic Layer. This is a model-level configuration that tells dbt to use the model for time range joins in the Semantic Layer.
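As a sketch of reusing an existing calendar table, the properties entry for that model only needs the `time_spine` key plus a `granularity` on the referenced column. `dim_date` and `calendar_date` are hypothetical names standing in for whatever calendar model and daily date column your project already has.

```yaml
models:
  - name: dim_date                                 # hypothetical existing calendar model
    description: Existing calendar table, one row per day.
    time_spine:
      standard_granularity_column: calendar_date   # must match a column listed below
    columns:
      - name: calendar_date                        # hypothetical daily date column
        granularity: day                           # required alongside the time_spine key
```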

For example, given the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`. MetricFlow supports granularities ranging from milliseconds to years. Refer to the [Dimensions page](/docs/build/dimensions?dimension=time_gran#time) (time_granularity tab) to find the full list of supported granularities.

:::tip
:::tip
Previously, you had to create a model called `metricflow_time_spine` in your dbt project. Now, if your project already includes a date dimension or time spine table, you can simply configure MetricFlow to use that table by updating the `model` setting in the Semantic Layer.

If you don’t have a date dimension table, you can still create one by using the code snippet below to build your time spine model.
@@ -46,34 +45,38 @@ If you don’t have a date dimension table, you can still create one by using th
```yaml
[models:](/reference/model-properties)
  - name: time_spine_hourly
    description: A date spine with one row per hour, ranging from 2020-01-01 to 2039-12-31.
    time_spine:
      standard_granularity_column: date_hour # column for the standard grain of your table
    columns:
      - name: date_hour
        granularity: hour # set granularity at column-level for standard_granularity_column

  - name: time_spine_daily
    description: A date spine with one row per day, ranging from 2020-01-01 to 2039-12-31.
    time_spine:
      standard_granularity_column: date_day # column for the standard grain of your table
    columns:
      - name: date_day
        granularity: day # set granularity at column-level for standard_granularity_column
```

</File>

For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. Note that the [`models` key](/reference/model-properties) in the time spine configuration must be placed in your `models/` directory.
For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example.

Now, break down the configuration above. It's pointing to a model called `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. The `standard_granularity_column` is the lowest grain of the table, in this case, it's hourly. It needs to reference a column defined under the columns key, in this case, `date_hour`. Use the `standard_granularity_column` as the join key for the time spine table when joining tables in MetricFlow. Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`.
Now, let's break down the `time_spine_hourly` configuration above. All of its configuration is colocated with the rest of the [model's properties](/reference/model-properties), and the time spine settings sit under the `time_spine` key. The `standard_granularity_column` is the finest grain of the table, in this case hourly, and it must reference a column defined under the `columns` key, in this case `date_hour`. MetricFlow uses the `standard_granularity_column` as the join key when joining other tables to the time spine. The granularity of the `standard_granularity_column` is set at the column level, in this case `hour`.

### Considerations when choosing which granularities to create{#granularity-considerations}

If you need to create a time spine table from scratch, you can do so by adding the following code to your dbt project.
The example creates a time spine at a daily grain and an hourly grain. A few things to note when creating time spine models:
* MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and date_trunc to month.
* You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries.
* We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. i.e., if you have dimensions at an hourly grain, you should have a time spine at an hourly grain.
- MetricFlow uses the time spine with the largest compatible granularity for a given query to keep the query as efficient as possible. For example, if you have a time spine at a monthly grain and query a dimension at a monthly grain, MetricFlow uses the monthly time spine. If you only have a daily time spine, MetricFlow uses the daily time spine and applies `date_trunc` to month.
- You can add a time spine for each granularity you intend to use if query efficiency matters more to you than configuration time or storage constraints. For most engines, the performance difference should be minimal, and truncating your time spine to a coarser grain at query time shouldn't add significant overhead to your queries.
- We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. For example, if you have dimensions at an hourly grain, you should have a time spine at an hourly grain.
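To illustrate the first consideration, here is a sketch of a coarser, hypothetical monthly time spine declared with the same keys as the hourly and daily examples on this page. If such a model existed, a query at monthly grain could use it directly rather than truncating the daily spine. `time_spine_monthly` and `date_month` are placeholder names, and a matching SQL model producing one row per month is assumed.

```yaml
models:
  - name: time_spine_monthly                    # hypothetical model, one row per month
    description: A time spine with one row per month.
    time_spine:
      standard_granularity_column: date_month   # must reference a column listed below
    columns:
      - name: date_month
        granularity: month                      # column-level granularity for the standard column
```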

## Example time-spine tables

### Daily

<File name="metricflow_time_spine.sql">

<VersionBlock lastVersion="1.6">
@@ -140,9 +143,11 @@ select * from final
where date_day > dateadd(year, -4, current_timestamp())
and date_day < dateadd(day, 30, current_timestamp())
```

</VersionBlock>

### Daily (BigQuery)

Use this model if you're using BigQuery. BigQuery supports `DATE()` instead of `TO_DATE()`:
<VersionBlock lastVersion="1.6">

@@ -170,6 +175,7 @@ from final
where date_day > date_sub(current_date(), interval 4 year)
and date_day < date_add(current_date(), interval 30 day)
```

</File>
</VersionBlock>

@@ -200,12 +206,14 @@ from final
where date_day > date_sub(current_date(), interval 4 year)
and date_day < date_add(current_date(), interval 30 day)
```

</File>
</VersionBlock>

</File>

### Hourly
### Hourly

<File name='time_spine_hourly.sql'>

```sql
@@ -237,4 +245,5 @@ select * from final
where date_hour > dateadd(year, -4, current_timestamp())
and date_hour < dateadd(day, 30, current_timestamp())
```

</File>
`website/docs/reference/model-properties.md`: 10 changes (8 additions, 2 deletions)
@@ -2,9 +2,9 @@
title: Model properties
---

Models properties can be declared in `.yml` files in your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)).
Model properties can be declared in `.yml` files in your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)).

You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `models/` directory. The [MetricFlow time spine](/docs/build/metricflow-time-spine) is a model property that tells dbt and MetricFlow how to use specific columns by defining their properties.
You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `models/` directory.

<File name='models/<filename>.yml'>

@@ -38,9 +38,15 @@ models:
          - <test>
          - ... # declare additional data tests
        [tags](/reference/resource-configs/tags): [<string>]

        # only required in conjunction with time_spine key
        granularity: <[any supported time granularity](/docs/build/dimensions?dimension=time_gran)>

      - name: ... # declare properties of additional columns

    [time_spine](/docs/build/metricflow-time-spine):
      standard_granularity_column: <column_name>

    [versions](/reference/resource-properties/versions):
      - [v](/reference/resource-properties/versions#v): <version_identifier> # required
        [defined_in](/reference/resource-properties/versions#defined-in): <definition_file_name>