Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist view column comments #866

Closed
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20230817-130731.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Features
body: Persist Column level comments when creating views
time: 2023-08-17T13:07:31.6812862Z
custom:
Author: jurasan
Issue: CT-764
20 changes: 20 additions & 0 deletions dbt/include/spark/macros/adapters.sql
Original file line number Diff line number Diff line change
Expand Up @@ -229,9 +229,29 @@
{% endfor %}
{% endmacro %}

{% macro get_column_comment_sql(column_name, column_dict) -%}
{% if column_name in column_dict and column_dict[column_name]["description"] -%}
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter with

| replace("'", "\\'")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what you mean? could TLDR's the filter's purpose for me?

Suggested change
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %}
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" | replace("'", "\\'")%}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its to escape single-quotes to allow them to be used in the comments.

Copy link

@benc-db benc-db Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need it to operate only on the description, I think here you have it operating either on the whole string or on the final ' (I'm not the best at Jinja). In my local copy I moved the extraction and filtering to separate 'set' statement.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{% set escaped_description = column_dict[column_name]["description"] | replace("'", "\\'") %}
{% set column_comment_clause = "comment '" ~ escaped_description ~ "'" %}

{%- endif -%}
{{ adapter.quote(column_name) }} {{ column_comment_clause }}
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
{% endmacro %}

{% macro get_persist_docs_column_list(model_columns, query_columns) %}
{% for column_name in query_columns %}
{{ get_column_comment_sql(column_name, model_columns) }}
{{- ", " if not loop.last else "" }}
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
{% endfor %}
{% endmacro %}

{% macro spark__create_view_as(relation, sql) -%}
create or replace view {{ relation }}
{% if config.persist_column_docs() -%}
{% set model_columns = model.columns %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the model columns coming from?

{% set query_columns = get_columns_in_query(sql) %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the SQL of the view executed when creating the view with persist_docs.columns = true?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issuing the get_columns_in_query query against one of the largest table internally at Databricks returned in 148 ms. It is sufficiently performant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the time, does it execute the SQL to get the columns?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It executes a limit 0.

(
{{ get_persist_docs_column_list(model_columns, query_columns) }}
)
{% endif %}
{{ comment_clause() }}
{%- set contract_config = config.get('contract') -%}
{%- if contract_config.enforced -%}
Expand Down