Allow manifest refresh only if full refresh flag is also set (#34)
* Rename unified_cluser_by macro

* Only allow manifest refresh if full-refresh flag is set
georgewoodhead authored Feb 15, 2024
1 parent 32636a3 commit db05acf
Showing 14 changed files with 29 additions and 21 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG
@@ -3,11 +3,15 @@ snowplow-unified 0.2.1 (2024-02-XX)
## Summary
XXX

+## 🚨 Breaking Changes 🚨
+We have changed the behavior of the `allow_refresh` macro so now if `snowplow__allow_refresh` is set to `true` it will only refresh the manifest models if the `--full-refresh` flag is also set. If you require the old behavior where it would refresh the manifest models on an incremental run when `snowplow__allow_refresh` was set to `true`, please overwrite this macro. See the [Overriding Macros](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-operation/macros-and-keys/#overriding-macros) guide for more details.
+
## Features
- Add new passthrough aggregations to the views, sessions, and users table, enabled using `snowplow__view/session/user_aggregations`
- Reorder and add some additional context fields to derived tables (non-breaking change)
- Add `snowplow__custom_sql` to allow adding custom sql to the `snowplow_unified_base_events_this_run` and `snowplow_unified_events_this_run` models
- Add macro to define cluster-by for tables to allow users to overwrite this if required
+- Add check for `--full-refresh` flag before allowing refresh of manifest models when `snowplow__allow_refresh` is set to `true`.

## Fixes
- Fix a bug where if you ran the package in a period with no data, and had list all events enabled, the package would error rather than complete
4 changes: 2 additions & 2 deletions docs/markdown/snowplow_unified_macros_docs.md
@@ -103,7 +103,7 @@ The sql to extract the columns from the yauaa context, or these columns as nulls
This macro is used to determine if a full-refresh is allowed (depending on the environment), using the `snowplow__allow_refresh` variable.

#### Returns
-`snowplow__allow_refresh` if environment is not `dev`, `none` otherwise.
+`snowplow__allow_refresh` if environment is not `dev`, `none` otherwise. Returns `none` if the `--full-refresh` flag is not present.

{% endraw %}
{% enddocs %}
@@ -243,7 +243,7 @@ The sql needed to make the warehosue specific transformations to retrieve the co
{% endraw %}
{% enddocs %}

-{% docs macro_cluster_by_values %}
+{% docs macro_get_cluster_by_values %}
{% raw %}

A macro to manage the cluster by fields for various models in the package.
14 changes: 9 additions & 5 deletions macros/allow_refresh.sql
@@ -13,11 +13,15 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0

{% macro default__allow_refresh() %}

-{% set allow_refresh = snowplow_utils.get_value_by_target(
-dev_value=none,
-default_value=var('snowplow__allow_refresh'),
-dev_target_name=var('snowplow__dev_target_name')
-) %}
+{% if flags.FULL_REFRESH == True %}
+{% set allow_refresh = snowplow_utils.get_value_by_target(
+dev_value=none,
+default_value=var('snowplow__allow_refresh'),
+dev_target_name=var('snowplow__dev_target_name')
+) %}
+{% else %}
+{% set allow_refresh = none %}
+{% endif %}

{{ return(allow_refresh) }}
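
The CHANGELOG entry in this commit points users who need the previous behavior toward overriding this macro. A minimal sketch of such an override, placed in the consuming project's own `macros/` directory, could simply reuse the pre-change body shown in this diff. It assumes that `allow_refresh` is resolved through `adapter.dispatch` in the `snowplow_unified` namespace (as the `default__` prefix suggests), that the project's `dispatch` config in `dbt_project.yml` lists the project ahead of `snowplow_unified` in `search_order`, and that the package variables resolve as they do inside the package. The file name in the comment is hypothetical.

```jinja
{# macros/allow_refresh_override.sql (hypothetical file in your own dbt project) #}
{# Sketch: restores the pre-change behavior by not checking the --full-refresh flag. #}
{% macro default__allow_refresh() %}

    {# none on the dev target, otherwise the value of snowplow__allow_refresh #}
    {% set allow_refresh = snowplow_utils.get_value_by_target(
        dev_value=none,
        default_value=var('snowplow__allow_refresh'),
        dev_target_name=var('snowplow__dev_target_name')
    ) %}

    {{ return(allow_refresh) }}

{% endmacro %}
```

With an override like this in place, the returned value no longer depends on `flags.FULL_REFRESH`, which matches the old behavior described in the CHANGELOG.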

@@ -1,9 +1,9 @@
-{% macro unified_cluser_by(model) %}
-{{ return(adapter.dispatch('unified_cluser_by', 'snowplow_unified')(model)) }}
+{% macro get_cluster_by_values(model) %}
+{{ return(adapter.dispatch('get_cluster_by_values', 'snowplow_unified')(model)) }}
{% endmacro %}


-{% macro default__unified_cluser_by(model) %}
+{% macro default__get_cluster_by_values(model) %}
{% if model == 'lifecycle_manifest' %}
{{ return(snowplow_utils.get_value_by_target_type(bigquery_val=["session_identifier"], snowflake_val=["to_date(start_tstamp)"])) }}
{% elif model == 'app_errors' %}
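
Because the renamed `get_cluster_by_values` macro is dispatched through the `snowplow_unified` namespace, the same override mechanism should apply to it. The sketch below changes the clustering for a single model and delegates every other model to the package default. The clustering columns chosen for `sessions` are purely illustrative, and calling `snowplow_unified.default__get_cluster_by_values` directly as a fallback is an assumption about how one might structure the override, not something this commit prescribes.

```jinja
{# Hypothetical override in your own project's macros/ directory #}
{% macro default__get_cluster_by_values(model) %}
    {% if model == 'sessions' %}
        {# Illustrative values only; pick columns that suit your warehouse #}
        {{ return(snowplow_utils.get_value_by_target_type(
            bigquery_val=["user_identifier"],
            snowflake_val=["to_date(start_tstamp)"]
        )) }}
    {% else %}
        {# Fall back to the package implementation for every other model #}
        {{ return(snowplow_unified.default__get_cluster_by_values(model)) }}
    {% endif %}
{% endmacro %}
```

Delegating to the package macro keeps the override small and avoids copying branches for models whose clustering you do not intend to change.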
4 changes: 2 additions & 2 deletions macros/macros.yml
@@ -99,8 +99,8 @@ macros:
description: '{{ doc("macro_event_counts_string_query") }}'
- name: conversion_query
description: '{{ doc("macro_conversion_query") }}'
-- name: cluster_by_values
-description: '{{ doc("macro_cluster_by_values") }}'
+- name: get_cluster_by_values
+description: '{{ doc("macro_get_cluster_by_values") }}'
arguments:
- name: model
type: string
@@ -16,7 +16,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('lifecycle_manifest'),
+cluster_by=snowplow_unified.get_cluster_by_values('lifecycle_manifest'),
full_refresh=snowplow_unified.allow_refresh(),
tags=["manifest"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
@@ -16,7 +16,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "derived_tstamp",
"data_type": "timestamp"
}, databricks_val='derived_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('app_errors'),
+cluster_by=snowplow_unified.get_cluster_by_values('app_errors'),
tags=["derived"],
enabled=var("snowplow__enable_app_errors", false),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
@@ -18,7 +18,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "derived_tstamp",
"data_type": "timestamp"
}, databricks_val = 'derived_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('consent_log'),
+cluster_by=snowplow_unified.get_cluster_by_values('consent_log'),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
@@ -18,7 +18,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "cv_tstamp",
"data_type": "timestamp"
}, databricks_val='cv_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('conversions'),
+cluster_by=snowplow_unified.get_cluster_by_values('conversions'),
tags=["derived"],
post_hook="{{ snowplow_unified.stitch_user_identifiers(
enabled=var('snowplow__conversion_stitching')
@@ -18,7 +18,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "derived_tstamp",
"data_type": "timestamp"
}, databricks_val = 'derived_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('web_vitals'),
+cluster_by=snowplow_unified.get_cluster_by_values('web_vitals'),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
2 changes: 1 addition & 1 deletion models/sessions/snowplow_unified_sessions.sql
@@ -17,7 +17,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('sessions'),
+cluster_by=snowplow_unified.get_cluster_by_values('sessions'),
tags=["derived"],
post_hook="{{ snowplow_unified.stitch_user_identifiers(
enabled=var('snowplow__session_stitching')
2 changes: 1 addition & 1 deletion models/users/scratch/snowplow_unified_users_aggs.sql
@@ -11,7 +11,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}),
-cluster_by=snowplow_unified.unified_cluser_by('users_aggs'),
+cluster_by=snowplow_unified.get_cluster_by_values('users_aggs'),
sort='user_identifier',
dist='user_identifier',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
2 changes: 1 addition & 1 deletion models/users/snowplow_unified_users.sql
@@ -21,7 +21,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
post_hook="{{ snowplow_unified.stitch_user_identifiers(
enabled=var('snowplow__session_stitching')
) }}",
-cluster_by=snowplow_unified.unified_cluser_by('users'),
+cluster_by=snowplow_unified.get_cluster_by_values('users'),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
2 changes: 1 addition & 1 deletion models/views/snowplow_unified_views.sql
@@ -17,7 +17,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
-cluster_by=snowplow_unified.unified_cluser_by('views'),
+cluster_by=snowplow_unified.get_cluster_by_values('views'),
tags=["derived"],
post_hook="{{ snowplow_unified.stitch_user_identifiers(
enabled=var('snowplow__view_stitching')