Metrics package #1

Merged on Feb 9, 2022 (72 commits)
Changes from 62 commits
Commits
a7e26b9
dbt-client compatibility changes
joellabes Nov 23, 2021
8ea2f71
Create helper macros for primary and secondary aggregation
joellabes Nov 23, 2021
25dde87
average rename
joellabes Nov 23, 2021
21e5f59
Add comment discussing difference between primary and secondary calc …
joellabes Nov 23, 2021
c0dc68f
Hide original file from dbt
joellabes Nov 24, 2021
e1e9a9d
Swap out dynamic macro dispatch for hardcoded list that actually works
joellabes Nov 24, 2021
c6c0773
Move `how` to PoP macro as that’s the only place it happens
joellabes Nov 24, 2021
287fa1a
Move debug macros, change helpers to use consistent signature
joellabes Dec 16, 2021
1adec6f
Pull get_metric out to its own macro
joellabes Dec 16, 2021
9e8bd89
Swap in a proper table
joellabes Dec 16, 2021
f868024
Update dbt_project.yml, add slack metric def
joellabes Dec 16, 2021
a045ff8
Change metric calculation to support arbitrary calendar properties
joellabes Dec 16, 2021
f93eae8
Protect against not execute
joellabes Jan 7, 2022
015cab5
Metrics namespacing
joellabes Jan 7, 2022
10191b4
Access relations without using ref
joellabes Jan 7, 2022
279c02a
Import utils
joellabes Jan 7, 2022
49d0ada
Count * if no expression provided
joellabes Jan 7, 2022
c3fff85
Remove redundant files and add to gitignore
joellabes Jan 14, 2022
e027aaa
Remove redundant aggregate references in secondary calcs
joellabes Jan 14, 2022
f851c69
whitespace management and remove obsolete TODOs
joellabes Jan 14, 2022
7a60975
Remove redundant metric_name arg (available on the main metric object)
joellabes Jan 14, 2022
6984d00
Moving stuff around
joellabes Jan 14, 2022
24b4116
Pull through date columns needed for secondary calcs
joellabes Jan 16, 2022
23573bb
Give aliases to calculations, protect against pulling a whole table f…
joellabes Jan 17, 2022
4ab2f4f
Everythig uses refs now, but it is very bad
joellabes Jan 18, 2022
0733b14
Rename metrics file
joellabes Jan 18, 2022
3f19e8d
Splitting into smaller files
joellabes Jan 18, 2022
395ebb4
add todo
joellabes Jan 18, 2022
6f17670
Forgot to bring through calculation alias
joellabes Jan 18, 2022
717bb5e
validate metrics queries make sense (legal grains, aggregates)
joellabes Jan 18, 2022
4fbfa6e
Moving stuff around, run legality tests
joellabes Jan 18, 2022
904a64a
Protect against missing key
joellabes Jan 18, 2022
de5ac01
Use joiner for prettier error message
joellabes Jan 18, 2022
2c21342
Goodbye debug file
joellabes Jan 18, 2022
b2b4d48
Actually rename debug file
joellabes Jan 18, 2022
b969f8b
swap one todo for another
joellabes Jan 18, 2022
5f0723a
Swap out loop for fancy one-liner
joellabes Jan 18, 2022
5f2884f
Add builtin calendar
joellabes Jan 19, 2022
2f910be
swap out fancy one-liner for a good-old-fashioned loop
joellabes Jan 19, 2022
022f2ff
Add integration tests project
joellabes Jan 19, 2022
95ac758
Protect against missing meta configs
joellabes Jan 19, 2022
06c4f87
Remove duplicate average key
joellabes Jan 19, 2022
35c6c9b
Add defaults to metric call
joellabes Jan 19, 2022
a94e8f2
Write README
joellabes Jan 19, 2022
b31e045
warnings about experimental behaviour
joellabes Jan 19, 2022
706181c
add secondary calcs shoutout
joellabes Jan 19, 2022
d2e15fa
Broader utils support
joellabes Jan 19, 2022
55f8763
Cleanup todos, protect against whitespace sql agg
joellabes Jan 19, 2022
5cf6bad
readme tweaks
joellabes Jan 19, 2022
71fb06f
Update get_metric_sql.sql
joellabes Jan 19, 2022
46a2cc9
Cross-db support
jtcohen6 Jan 19, 2022
a7e8940
Add ci, circle for now
jtcohen6 Jan 19, 2022
6617d7d
Update get_metric_sql.sql
joellabes Jan 20, 2022
cd7ba4b
Revert "Add ci, circle for now"
jtcohen6 Jan 20, 2022
b7337cf
Merge pull request #2 from dbt-labs/jerco/add-ci
joellabes Jan 20, 2022
f70b2a9
Set up end to end testing with GHA (#3)
Jan 24, 2022
e808975
Remove star macro
joellabes Jan 24, 2022
013ff11
Merge branch 'main' of https://github.com/dbt-labs/dbt_metrics
joellabes Jan 24, 2022
4c4c835
Remove debug command
joellabes Jan 24, 2022
cc6ffa3
Properly remove star macro
joellabes Jan 24, 2022
52f9885
Add comment expanding on problem
joellabes Feb 7, 2022
8705369
Add support for min aggregate (#4)
joellabes Feb 8, 2022
92ae4d6
fix a bunch of badly named stuff
joellabes Feb 9, 2022
fb5ce6e
meta accessing isn't dependent on index anymore
joellabes Feb 9, 2022
1b17247
Update readme to contain info on secondary calcs
joellabes Feb 9, 2022
5e22595
tweak TOC builder file
joellabes Feb 9, 2022
6d5338c
Update create-table-of-contents.yml
joellabes Feb 9, 2022
adf47f9
Update README.md
joellabes Feb 9, 2022
9af8c19
Auto update table of contents
joellabes Feb 9, 2022
153f132
Update README.md
joellabes Feb 9, 2022
3b62d07
Merge branch 'main' of https://github.com/dbt-labs/dbt_metrics
joellabes Feb 9, 2022
29e63c6
Auto update table of contents
joellabes Feb 9, 2022
35 changes: 35 additions & 0 deletions .github/actions/end-to-end-test/action.yml
@@ -0,0 +1,35 @@
name: "End to end testing"
description: "Set up profile and run dbt with test project"
inputs:
  dbt-project:
    description: "Location of test project"
    required: false
    default: "integration_tests"
  dbt-target:
    description: "Name of target to use when running dbt"
    required: true
  database-adapter-package:
    description: "Name of database adapter to install"
    required: true
runs:
  using: "composite"
  steps:
    - name: Install python dependencies
      shell: bash
      run: |
        pip install --user --upgrade pip
        pip --version
        pip install --pre ${{ inputs.database-adapter-package }}

    - name: Setup dbt profile
      shell: bash
      run: |
        mkdir -p $HOME/.dbt
        cp ${{ github.action_path }}/sample.profiles.yml $HOME/.dbt/profiles.yml

    - name: Run dbt
      shell: bash
      run: |
        cd ${{ inputs.dbt-project }}
        dbt deps --target ${{ inputs.dbt-target }}
        dbt build --target ${{ inputs.dbt-target }} --full-refresh
48 changes: 48 additions & 0 deletions .github/actions/end-to-end-test/sample.profiles.yml
@@ -0,0 +1,48 @@
# HEY! This file is used in the dbt-utils integrations tests with GHA.
# You should __NEVER__ check credentials into version control. Thanks for reading :)

config:
  send_anonymous_usage_stats: False
  use_colors: True

dbt_metrics_integration_tests:
  target: postgres
  outputs:
    postgres:
      type: postgres
      host: "{{ env_var('POSTGRES_TEST_HOST') }}"
      user: "{{ env_var('POSTGRES_TEST_USER') }}"
      pass: "{{ env_var('POSTGRES_TEST_PASSWORD') }}"
      port: "{{ env_var('POSTGRES_TEST_PORT') | as_number }}"
      dbname: "{{ env_var('POSTGRES_TEST_DB') }}"
      schema: dbt_metrics_integration_tests
      threads: 5

    redshift:
      type: redshift
      host: "{{ env_var('REDSHIFT_TEST_HOST') }}"
      user: "{{ env_var('REDSHIFT_TEST_USER') }}"
      pass: "{{ env_var('REDSHIFT_TEST_PASS') }}"
      dbname: "{{ env_var('REDSHIFT_TEST_DBNAME') }}"
      port: "{{ env_var('REDSHIFT_TEST_PORT') | as_number }}"
      schema: dbt_metrics_integration_tests
      threads: 5

    bigquery:
      type: bigquery
      method: service-account
      keyfile: "{{ env_var('BIGQUERY_SERVICE_KEY_PATH') }}"
      project: "{{ env_var('BIGQUERY_TEST_DATABASE') }}"
      schema: dbt_metrics_integration_tests
      threads: 10

    snowflake:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_TEST_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_TEST_USER') }}"
      password: "{{ env_var('SNOWFLAKE_TEST_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_TEST_ROLE') }}"
      database: "{{ env_var('SNOWFLAKE_TEST_DATABASE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_TEST_WAREHOUSE') }}"
      schema: dbt_metrics_integration_tests
      threads: 10
125 changes: 125 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,125 @@
name: Continuous Integration

on:
  push:
    branches:
      - "main"
  pull_request:

jobs:
  postgres:
    runs-on: ubuntu-latest

    # set up env vars so that we can use them to start an instance of postgres
    env:
      POSTGRES_TEST_USER: postgres
      POSTGRES_TEST_PASSWORD: postgres
      POSTGRES_TEST_DB: gha_test
      POSTGRES_TEST_PORT: 5432
      POSTGRES_TEST_HOST: localhost

    services:
      postgres:
        image: postgres
        env:
          POSTGRES_USER: ${{ env.POSTGRES_TEST_USER }}
          POSTGRES_PASSWORD: ${{ env.POSTGRES_TEST_PASSWORD }}
          POSTGRES_DB: ${{ env.POSTGRES_TEST_DB }}
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Check out the repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - uses: ./.github/actions/end-to-end-test
        with:
          dbt-target: postgres
          database-adapter-package: dbt-postgres

  snowflake:
    needs: postgres
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - uses: ./.github/actions/end-to-end-test
        env:
          SNOWFLAKE_TEST_ACCOUNT: ${{ secrets.SNOWFLAKE_TEST_ACCOUNT }}
          SNOWFLAKE_TEST_USER: ${{ secrets.SNOWFLAKE_TEST_USER }}
          SNOWFLAKE_TEST_PASSWORD: ${{ secrets.SNOWFLAKE_TEST_PASSWORD }}
          SNOWFLAKE_TEST_ROLE: ${{ secrets.SNOWFLAKE_TEST_ROLE }}
          SNOWFLAKE_TEST_DATABASE: ${{ secrets.SNOWFLAKE_TEST_DATABASE }}
          SNOWFLAKE_TEST_WAREHOUSE: ${{ secrets.SNOWFLAKE_TEST_WAREHOUSE }}
        with:
          dbt-target: snowflake
          database-adapter-package: dbt-snowflake

  redshift:
    needs: postgres
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - uses: ./.github/actions/end-to-end-test
        env:
          REDSHIFT_TEST_HOST: ${{ secrets.REDSHIFT_TEST_HOST }}
          REDSHIFT_TEST_USER: ${{ secrets.REDSHIFT_TEST_USER }}
          REDSHIFT_TEST_PASS: ${{ secrets.REDSHIFT_TEST_PASS }}
          REDSHIFT_TEST_DBNAME: ${{ secrets.REDSHIFT_TEST_DBNAME }}
          REDSHIFT_TEST_PORT: ${{ secrets.REDSHIFT_TEST_PORT }}
        with:
          dbt-target: redshift
          database-adapter-package: dbt-redshift

  bigquery:
    needs: postgres
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Set up service key file
        id: keyfile
        env:
          BIGQUERY_TEST_SERVICE_ACCOUNT_JSON: ${{ secrets.BIGQUERY_TEST_SERVICE_ACCOUNT_JSON }}
        run: |
          mkdir -p $HOME/.dbt
          KEYFILE_PATH=$HOME/.dbt/bigquery-service-key.json
          echo $BIGQUERY_TEST_SERVICE_ACCOUNT_JSON > $KEYFILE_PATH
          echo ::set-output name=path::$KEYFILE_PATH

      - uses: ./.github/actions/end-to-end-test
        env:
          BIGQUERY_SERVICE_KEY_PATH: ${{ steps.keyfile.outputs.path }}
          BIGQUERY_TEST_DATABASE: ${{ secrets.BIGQUERY_TEST_DATABASE }}
        with:
          dbt-target: bigquery
          database-adapter-package: dbt-bigquery
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@

target/
dbt_modules/
dbt_packages/
logs/
.DS_Store
103 changes: 102 additions & 1 deletion README.md
@@ -1,2 +1,103 @@
# dbt_metrics

This repo calculates metrics.
## About
This dbt package generates queries based on [metrics](https://docs.getdbt.com/docs/building-a-dbt-project/metrics), introduced to dbt Core in v1.0.

## :warning: A note on `ref`s
To enable the dynamic referencing of models necessary for macro queries through the dbt Server, queries generated by this package do not participate in the DAG and `ref`'d nodes will not necessarily be built before they are accessed. Refer to the docs on [forcing dependencies](https://docs.getdbt.com/reference/dbt-jinja-functions/ref#forcing-dependencies) for more details.
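
One way to apply the forcing-dependencies pattern from the linked docs is to add a `-- depends_on` comment to the model that calls the package, so that dbt builds the underlying model first. This is a minimal sketch, assuming the metric is defined on a model named `dim_customers` (a hypothetical name; substitute the model your metric actually uses):

```sql
-- `dim_customers` is a hypothetical model name used for illustration.
-- depends_on: {{ ref('dim_customers') }}

select *
from {{ metrics.metric(
    metric_name='new_customers',
    grain='week',
    dimensions=['plan', 'country'],
    secondary_calcs=[]
) }}
```

The `ref` inside the comment does not change the compiled query; it only registers the dependency so that the model is placed upstream in the DAG.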

## Usage
Access metrics like any other macro:
```
select *
from {{ metrics.metric(
    metric_name='new_customers',
    grain='week',
    dimensions=['plan', 'country'],
    secondary_calcs=[
        {
            "type": "period_to_date",
            "aggregate": "sum",
            "period": "year",
            "alias": "ytd_sum"
        },
        {
            "type": "period_over_period",
            "lag": 1,
            "how": "ratio",
        },
        {
            "type": "rolling",
            "window": 3,
            "aggregate": "average"
        }
    ]
) }}
```

## Secondary calculations
_Documentation to come once the terminology has stabilised._

## Customisation
Most behaviour in the package can be overridden or customised.

### Calendar
The package comes with a basic calendar table, running between 2010-01-01 and 2029-12-31 inclusive. You can replace it with any custom calendar table which meets the following requirements:
- It is non-ephemeral (i.e. materialized as a table or view).
- It contains a `date_day` column.
- It should additionally contain the columns `date_week`, `date_month`, `date_quarter` and `date_year`, or equivalents.
- Any additional date columns are prefixed with `date_`, e.g. `date_4_5_4_month` for a 4-5-4 retail calendar date set. Dimensions can have any name (see [dimensions on calendar tables](#dimensions-on-calendar-tables)).

To do this, set the value of the `dbt_metrics_calendar_model` variable in your `dbt_project.yml` file:
```
config-version: 2
[...]
vars:
  dbt_metrics_calendar_model: ref('my_custom_table')
```
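
A custom calendar model meeting the requirements above might look like the following sketch. It assumes Postgres syntax, an arbitrary date range, and date-typed columns; the file path and the `is_weekend` dimension are illustrative rather than anything the package prescribes.

```sql
-- models/my_custom_table.sql (hypothetical path; Postgres syntax assumed)
{{ config(materialized='table') }}

with days as (

    -- one row per day; the date range here is arbitrary
    select cast(day_ts as date) as date_day
    from generate_series(
        cast('2020-01-01' as timestamp),
        cast('2025-12-31' as timestamp),
        interval '1 day'
    ) as series(day_ts)

)

select
    date_day,
    cast(date_trunc('week', date_day) as date) as date_week,
    cast(date_trunc('month', date_day) as date) as date_month,
    cast(date_trunc('quarter', date_day) as date) as date_quarter,
    cast(date_trunc('year', date_day) as date) as date_year,
    -- illustrative calendar dimension; dimensions need no `date_` prefix
    extract(isodow from date_day) in (6, 7) as is_weekend
from days
```

Materializing the model as a table satisfies the non-ephemeral requirement. On other warehouses, the date-generation and truncation functions would need their local equivalents.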

### Time Grains
The package protects against nonsensical secondary calculations, such as a month-to-date aggregate of data which has been rolled up to the quarter. If you customise your calendar (for example by adding a [4-5-4 retail calendar](https://nrf.com/resources/4-5-4-calendar) month), you will need to override the `get_grain_order()` macro. In that case, you might remove `month` and replace it with `month_4_5_4`. All date columns must be prefixed with `date_` in the table, but this is not necessary in the model config.
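
As a rough sketch of such an override, assuming `get_grain_order()` returns a list of grain names ordered from finest to coarsest (check the package's own definition for the exact shape before copying this), the 4-5-4 case could look like:

```sql
{# macros/get_grain_order.sql (hypothetical path in your own project) #}
{% macro get_grain_order() %}
    {# Assumed shape: grain names ordered from finest to coarsest,
       without the `date_` prefix. `month` is swapped for `month_4_5_4`. #}
    {% do return(['day', 'week', 'month_4_5_4', 'quarter', 'year']) %}
{% endmacro %}
```

With this ordering, the matching calendar column would be named `date_month_4_5_4`, following the prefix rule described above.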
Review comment (Contributor): just putting a placeholder to come back to this once i've scanned further down this PR :)

Reply (Contributor Author): @drewbanin if you had anything else to say on the topic, now's the time 🕐

### Custom aggregations
To create a custom primary aggregation (as exposed through the `type` config of a metric), create a macro of the form `metric_my_aggregate(expression)`, then override the `aggregate_primary_metric(aggregate, expression)` macro to add it to the dispatch list. The package also protects against nonsensical secondary calculations such as an average of an average; you will need to override the `get_metric_allowlist()` macro both to add your new aggregate to the existing aggregations' allowlists and to make an allowlist for your new aggregation:
```
{% do return ({
    "average": ['max', 'min'],
    "count": ['max', 'min', 'average', 'my_new_aggregate'],
    [...]
    "my_new_aggregate": ['max', 'min', 'sum', 'average', 'my_new_aggregate']
}) %}
```
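
For the first step, a custom primary aggregation macro might look like the sketch below. The `median` aggregate and the `metric_median` name are hypothetical, chosen to follow the `metric_my_aggregate(expression)` form. The SQL assumes a warehouse that supports `percentile_cont ... within group` as an aggregate (for example Postgres); other warehouses may need a different expression.

```sql
{# Hypothetical custom aggregate named `median`, following the
   `metric_my_aggregate(expression)` naming form. #}
{% macro metric_median(expression) %}
    percentile_cont(0.5) within group (order by {{ expression }})
{% endmacro %}
```

The macro only renders the aggregate expression. It would still need to be added to the `aggregate_primary_metric` dispatch list and to the allowlists returned by `get_metric_allowlist()` before a metric could use it.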

To create a custom secondary aggregation (as exposed through the `secondary_calcs` parameter in the `metric` macro), create a macro of the form `metric_secondary_calculations_my_calculation(metric_name, dims, config)`, then override the `metric_secondary_calculations(metric_name, dims, config)` macro to add it to the dispatch list.

### Secondary calculation column aliases
Aliases can be passed in each entry of `secondary_calcs` using the `alias` key (see `ytd_sum` in the usage example). If no alias is provided, one will be automatically generated. To modify the existing alias logic, or add support for a custom secondary calculation, override `secondary_calculation_alias(calc_config, grain)`.

## 🧪 Experimental behaviour
:warning: This behaviour is subject to change in future versions of dbt Core and this package.

### Dimensions on calendar tables
You may want to aggregate metrics by a dimension in your custom calendar table, for example `is_weekend`. _In addition to_ the primary `dimensions` list, add the following `meta` properties to your metric, with the model's dimensions first and the calendar's dimensions second:
```
meta:
  dimensions:
    - type: model
      columns:
        - plan
        - country
    - type: calendar
      columns:
        - is_weekend
```
You can then access the additional dimensions as normal:
```
select *
from {{ metrics.metric(
    metric_name='new_customers',
    grain='week',
    dimensions=['plan', 'country', 'is_weekend'],
    secondary_calcs=[]
) }}
```
8 changes: 4 additions & 4 deletions dbt_project.yml
@@ -7,19 +7,19 @@ version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
-profile: 'default'
+profile: 'user'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
-analysis-paths: ["analysis"]
+analysis-paths: ["analyses"]
test-paths: ["tests"]
-seed-paths: ["data"]
+seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
  - "target"
-  - "dbt_modules"
+  - "dbt_packages"
4 changes: 4 additions & 0 deletions integration_tests/.gitignore
@@ -0,0 +1,4 @@

target/
dbt_packages/
logs/
16 changes: 16 additions & 0 deletions integration_tests/README.md
@@ -0,0 +1,16 @@
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:

- dbt run
- dbt test

### Resources:

- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
File renamed without changes.
22 changes: 22 additions & 0 deletions integration_tests/dbt_project.yml
@@ -0,0 +1,22 @@
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "dbt_metrics_integration_tests"
version: "1.0.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: "dbt_metrics_integration_tests"

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
- "target"
- "dbt_packages"
- "logs"
Empty file.