Add table expiry #230

Open
wants to merge 8 commits into
base: main
Changes from 6 commits
10 changes: 10 additions & 0 deletions integration_tests/models/plugins/bigquery/bigquery_external.yml
@@ -77,6 +77,16 @@ sources:
        columns: *cols-of-the-people
        tests: *equal-to-the-people

      - name: people_csv_schema_auto_detect_expiration
        external:
          hours_to_expiration: 24
          location: 'gs://dbt-external-tables-testing/csv/*'
          options:
            format: csv
            skip_leading_rows: 1
            hive_partition_uri_prefix: 'gs://dbt-external-tables-testing/csv'
        tests: *equal-to-the-people

# - name: people_json_unpartitioned
# external: &json-people
# location: 'gs://dbt-external-tables-testing/json/*'
8 changes: 6 additions & 2 deletions macros/plugins/bigquery/create_external_table.sql
@@ -4,12 +4,13 @@
{%- set external = source_node.external -%}
{%- set partitions = external.partitions -%}
{%- set options = external.options -%}
{%- set non_string_options = ['max_staleness'] %}
{%- set hours_to_expiration = external.get('hours_to_expiration') -%}
Collaborator

I believe this is all that is needed:

Suggested change
- {%- set hours_to_expiration = external.get('hours_to_expiration') -%}
+ {%- set hours_to_expiration = external.hours_to_expiration -%}

{%- set non_string_options = ['max_staleness', 'hours_to_expiration'] %}

{% if options is mapping and options.get('connection_name', none) %}
{% set connection_name = options.pop('connection_name') %}
{% endif %}

{%- set uris = [] -%}
{%- if options is mapping and options.get('uris', none) -%}
{%- set uris = external.options.get('uris') -%}
@@ -46,5 +47,8 @@
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- if hours_to_expiration -%}
, expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{hours_to_expiration}} hour)
Collaborator

Can we integrate this into the existing loop over options.items() (L39-L47)? Perhaps something like the snippet below?

This solution isn't very clean either, though, imho. I thought about using Jinja's loop.first to drop the leading comma, but expiration_timestamp will never be first in the list of options because uris is a required option. 😵 Perhaps we could refactor so that the currently unused external field is where uris are given?

            {%- if options is mapping -%}
            {%- for key, value in options.items() if key != 'uris' %}
                {%- if value is string -%}
                    , {{key}} = '{{value}}'
                {%- else -%}
                    {#- hours_to_expiration is read from options here, so use the loop value -#}
                    {%- if key == "hours_to_expiration" -%}
                        , expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{value}} hour)
                    {%- else -%}
                        , {{key}} = {{value}}
                    {%- endif -%}
                {%- endif -%}
            {%- endfor -%}
            {%- endif -%}

{%- endif -%}
)
{% endmacro %}
9 changes: 9 additions & 0 deletions sample_sources/bigquery.yml
@@ -58,3 +58,12 @@ sources:
              - 'gs://bucket_a/path/*'
              - 'gs://bucket_b/path/*'
              - 'gs://bucket_c/more/specific/path/file.csv'

      # you can use BigQuery table expiry if you want to automatically delete tables after a period of time
      - name: table_expiry
        external:
          location: 'gs://bucket/path/*'
          hours_to_expiration: 24
          options:
            format: csv
            skip_leading_rows: 1
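
For reviewers, here is a rough sketch of the DDL the macro in this PR would likely render for the `table_expiry` sample source above. The project and dataset names (`my_project`, `my_dataset`) are placeholders, not taken from the PR, and the whitespace the real macro emits will differ.

```sql
-- Hypothetical rendering for the table_expiry sample source.
-- `my_project.my_dataset` is an assumed placeholder; the macro resolves the
-- real relation via source(). No column list is emitted because the sample
-- defines no columns (schema auto-detect).
CREATE OR REPLACE EXTERNAL TABLE `my_project.my_dataset.table_expiry`
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'csv',
  skip_leading_rows = 1,
  -- hours_to_expiration: 24 becomes an absolute timestamp at creation time
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
);
```

Because `CURRENT_TIMESTAMP()` is evaluated when the statement runs, each `CREATE OR REPLACE` would presumably push the expiry another 24 hours out, so the table only expires if the external sources stop being re-staged.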