The future of vars #4938

jtcohen6 · 2022-03-23T15:31:17Z

Thanks to @gshank @joellabes @ChenyuLInx for a great discussion yesterday about all things vars! I'll do my best to summarize below.

Prompted by requests like:

vars should be definable outside dbt_project.yml (Defining vars, folder-level configs outside dbt_project.yml #2955)
vars should be available for resource configs defined in dbt_project.yml ([CT-369] [Feature] Use variables declared in dbt_project.yml file in the file #4873)
vars (defined in dbt_project.yml) can be set to Jinja-rendered expressions (target-based Jinja rendering of vars in dbt_project.yml #3105, Variables in project.yml that use jinja templating are not properly rendering when referenced in packages #3658)
sources should be able to use macros to determine env-specific (and source-specific) namespacing ([CT-260] [Bug] generate_database_name does not apply to sources #4753)

When to recommend...?

vars
env vars
target-based logic
"pure" macros that return static values (available to set configs + template runtime code in .sql files, but not in .yml files)
generate_x_name macros (database/schema/alias of "executable" nodes, but not for sources)

Do we have any strong recommendations around when to use vars, and when to use env vars? I don't think so, at least not well-documented. Here are my own strong-ish opinions:

Env vars

Secrets — including credentials (if you need to check profiles.yml into version control), or git-related info in packages.yml
Any time of environment-based configuration that reliably differs between dev/prod/CI, e.g. a different source database per environment
"Run modes" that are going to differ in a predictable way

Vars

Configuring packages, using levers that the package maintainer has documented and made available, through well-understood means that you never
"Run modes," with sensible defaults that hold >95% of the time, that it may make occasional sense to "override" with arbitrary / unpredictable values (via CLI --vars)

One small note: dbt is smarter about partial parsing today with env vars (an actual partial re-parse) than with vars (will trigger a full re-parse, whether the change is in --vars or in dbt_project.yml). This is a temporary technical limitation only, and we plan to reach eventual parity.

Other types of config

target-based Jinja conditional expressions are nice when you want to surface the logic very close to where it's being used: {{ 3 if target.name == 'dev' else 100 }} days of data for a backfill, for instance.
generate_x_name macros have a leg up on the others, in that they have access to everything: all vars, env vars, target info, and the full node specification of the model/seed/snapshot/test. So if you have highly specific rules about what goes where (if X config, Y tag, Z materialization type, ...), the macro is the right place to encode that "rule." This logic just can't be (re)used anywhere else

Forward-looking vibes

While some of these changes are not mutually exclusive, I sensed two

Vibe 1: Vars are exactly like env vars.

You can (already) define both on the CLI:

$ dbt run --vars 'MY_VAR: True'
$ MY_VAR=True dbt run

They should available in all the same places and rendering contexts. That includes profiles.yml + packages.yml (places where only --vars are available today).

Vars can be defined in a .yml or .env file, in addition to dbt_project.yml. They're parsed first, regardless of where they're defined—even before profile information.

Vibe 2: Vars are a way to DRY up your project

Vars save and render static Jinja expressions, which can take env vars + target values as their inputs.

Following the example in #4753, imagine you've got:

sources:
  - name: source_one
    database: |
      {%- if 'other_country' in target.database -%} prod__other_country__source_one
      {%- elif target.name == 'prod' -%} prod__main_country__source_one
      {%- else -%} {{ target.database }}__source_one
      {%- endif -%}
    ...
  - name: source_two
    database: |
      {%- if 'other_country' in target.database -%} prod__other_country__source_two
      {%- elif target.name == 'prod' -%} prod__main_country__source_two
      {%- else -%} {{ target.database }}__source_two
      {%- endif -%}

We've proposed solving that with better source configurations (#3662), crucially the ability to configure multiple sources at once. But look again: Those configs are just slightly different.

The immediate instinct is to replace this with a macro, until you realize that macros aren't supported in .yml files today—at this point, primarily for reasons of rendering speed / parsing performance. And we can't tell a "pure" macro (static inputs + outputs) from a "dirty" one.

Instead, what if you could "save" a reusable Jinja expression / static snippet, in the form of a rendered var?

# dbt_project.yml
vars:
  my_project:
    source_database_prefix: |
      {%- if 'other_country' in target.database -%} prod__other_country__source_two
      {%- elif target.name == 'prod' -%} prod__main_country__source_two
      {%- else -%} {{ target.database }}__source_two
      {%- endif -%}

sources:
  - name: source_one
    database: {{ var('source_database_prefix') }}__source_one
    ...
  - name: source_two
    database: {{ var('source_database_prefix') }}__source_two

This feels much more intuitive for vars that are set in dbt_project.yml, versus the ones passed in for --vars. In fact, it feels like a bit of a disjuncture between them. Do we need a third kind of var...?

The text was updated successfully, but these errors were encountered:

joellabes · 2022-03-24T04:12:31Z

Vibes-wise, I'm most attached to the problems solved by Vibe 2, but I disagree with the approach to get there:

Do we need a third kind of var...?

What about a second type of macro? We could take inspiration from Redshift UDFs (and likely other things) which can attach hints (VOLATILE | STABLE | IMMUTABLE) to indicate whether the planner can recycle the same output for a given input.

It would be cool to be able to do something like this:

{% macro immutable generate_database_prefix() %}
      {%- if 'other_country' in target.database -%} prod__other_country
      {%- elif target.name == 'prod' -%} prod__main_country
      {%- else -%} {{ target.database }}
      {%- endif -%}
{% endmacro %}

sources:
  - name: source_one
    database: {{ generate_database_prefix() | trim }}__source_one
    ...
  - name: source_two
    database: {{ generate_database_prefix() | trim }}__source_two

and only allow this subset of macros in yaml files, where they would have access to target and env_vars/variables defined by --vars. Any macro claiming to be immutable that tried to do anything to the database would be rejected.

Functionally, this would be identical to the variables envisaged above. I think it would aid in a cleaner mental model though - conceptually I think of vars as constant, exported values, not as snippets of code in their own right.

I could swing either way on whether we allow things like {{ run_started_at }} - at that point you're looking at STABLE as opposed to IMMUTABLE in redshiftese - I'm not averse to it but I can't think of a use case.

rumbin · 2022-07-21T07:05:59Z

Regarding Vibe 2 I suggest the following extension on vars:

Folder-level variables

Imagine that we could define variables in the dbt_project.yml in models → <my_project> → <model_folder_name>.
This would be a very useful way to configure, e.g., mart schemas of which many technically similar ones exist by just defining mart/schema-specific variables which are then picked up by the models of this folder/schema.

Example:

vars:
  # global vars:
  test_lake_load_ts_predicate: "(_lake_load_ts > (current_timestamp - interval '3 days'))"
  test_mod_ts_predicate: "(_mod_ts > (current_timestamp - interval '3 days'))"

  # package-specific vars:
  ###### ↓ current implementation ↓ ######
  dbt_artifacts:
    dbt_artifacts_database: "lp_lake"
    dbt_artifacts_schema: "raw_dbt_artifacts"
    dbt_artifacts_table: "lp_prodb"
  ###### ↑ current implementation ↑ ######

models:

  # general model settings:
  +persist_docs:
    relation: true
    columns: true
  +post-hook:
    - "{{ dbt_snow_mask.apply_masking_policy() }}"

  # packages:
  dbt_artifacts:
    # NOTE: To me it feels like a good idea to put package-specific variables
    # rather here than in the global vars block above!
    ###### ↓ proposed change ↓ ######
    vars:
        dbt_artifacts_database: "lp_lake"
        dbt_artifacts_schema: "raw_dbt_artifacts"
        dbt_artifacts_table: "lp_prodb"
    ###### ↑ proposed change ↑ ######
    +schema: core_dbt_artifacts
    staging:
      +schema: orig_dbt_artifacts

  # this project:
  this_dbt_project:

    mart_whatever:
      +schema: mart_whatever
      description:  A generic data mart

    ###### ↓ proposed change ↓ ######
    vars:
        source_prefix: "whatever"
        database_name: "whatever_2"
    ###### ↑ proposed change ↑ ######

Side note: As you can see, I also propose to put package-specific vars rather (or alternatively/additionally) into the models section of the specific package.

What do you all think about this?

ValdarT · 2022-10-13T11:57:22Z

Would be nice if one could use secret env vars also in models (i.e., outside of profiles.yml or packages.yml).

My use case is creating a temp view over a JDBC connection in a pre-hook to access some data and I would need a way to securely pass the credentials.

jtcohen6 · 2022-10-28T13:38:58Z

I'm going to turn this into a GitHub discussion, since we're using those more frequently — for Bigger Improvements, in addition to Very Big Ideas.

We don't have immediate next steps, or concrete plans of when to schedule this work, but it's feedback (and confusion) that we get pretty consistently. By the time of dbt v2, I'd like to see us have better, more consistent, more intuitive answers for conditional configuration.

jtcohen6 added vars Team:Language labels Mar 23, 2022

jtcohen6 mentioned this issue Apr 21, 2022

[CT-328] [Feature] Package Enabled Flag #4837

Closed

1 task

jtcohen6 mentioned this issue Jun 1, 2022

[CT-699] [Bug] Source freshness enabled config doesn't read variables #5311

Closed

1 task

jtcohen6 mentioned this issue Jun 14, 2022

[CT-746] [Bug] Variables not getting into packages #5368

Open

1 task

rumbin mentioned this issue Aug 2, 2022

[CT-278] [Feature] Allow meta property to be accessible at compile time #4776

Closed

1 task

nolan-jardine mentioned this issue Oct 20, 2022

[CT-1377] [Feature] Add functionality to use separate files for project variables #6110

Closed

3 tasks

dbt-labs locked and limited conversation to collaborators Oct 28, 2022

jtcohen6 converted this issue into discussion #6170 Oct 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

The future of vars #4938

The future of vars #4938

jtcohen6 commented Mar 23, 2022

joellabes commented Mar 24, 2022 •

edited

Loading

rumbin commented Jul 21, 2022 •

edited

Loading

ValdarT commented Oct 13, 2022

jtcohen6 commented Oct 28, 2022

This issue was moved to a discussion.

This issue was moved to a discussion.

The future of vars #4938

The future of vars #4938

Comments

jtcohen6 commented Mar 23, 2022

When to recommend...?

Env vars

Vars

Other types of config

Forward-looking vibes

Vibe 1: Vars are exactly like env vars.

Vibe 2: Vars are a way to DRY up your project

joellabes commented Mar 24, 2022 • edited Loading

rumbin commented Jul 21, 2022 • edited Loading

ValdarT commented Oct 13, 2022

jtcohen6 commented Oct 28, 2022

This issue was moved to a discussion.

joellabes commented Mar 24, 2022 •

edited

Loading

rumbin commented Jul 21, 2022 •

edited

Loading