Rephrase the two criteria for resolving `ref` calls with the `--defer` flag #6332

dbeatty10 · 2024-10-22T00:31:52Z

Contributions

I have read the contribution docs, and understand what's expected of me.

Link to the page on docs.getdbt.com requiring updates

https://docs.getdbt.com/reference/node-selection/defer#usage

What part(s) of the page would you like to see updated?

Our current documentation reads like this:

When the --defer flag is provided, dbt will resolve ref calls differently depending on two criteria:

Is the referenced node included in the model selection criteria of the current run?

Does the referenced node exist as a database object in the current environment?

If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.

You can optionally skip the second criterion by passing the --favor-state flag. If passed, dbt will favor using the node defined in your --state namespace, even if the node exists in the current target.
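To make the quoted rule concrete, here is a minimal sketch of it as a plain Python function. It is not dbt's implementation: the function name `resolve_namespace` and its boolean parameters are hypothetical, and the returned strings `"target"` and `"state"` are just labels for the two namespaces described above.

```python
def resolve_namespace(defer: bool, selected: bool,
                      exists_in_target: bool, favor_state: bool = False) -> str:
    """Illustrative only: which namespace a ref would resolve to.

    All names here are hypothetical; they model the rule as the docs
    describe it, not dbt's internal code.
    """
    if not defer:
        # Without --defer, refs resolve in the target namespace.
        return "target"
    if selected:
        # First criterion: the node is part of the current selection.
        return "target"
    if favor_state or not exists_in_target:
        # Second criterion, which --favor-state skips.
        return "state"
    return "target"


if __name__ == "__main__":
    # Print the four cases for an unselected node under --defer.
    for exists in (True, False):
        for favor in (False, True):
            result = resolve_namespace(True, False, exists, favor)
            print(f"exists_in_target={exists}, favor_state={favor} -> {result}")
```

Under these assumptions, the only unselected case that stays in the target namespace is the one where the object already exists and `--favor-state` is not passed.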

The logic isn't easy to follow, especially when --favor-state is in the mix.

So I propose using one of the following variants instead (which all say the same thing in slightly different ways):

Option 1

With --defer, dbt decides between the target namespace and the state manifest’s namespace. By default, it uses the target namespace. But if the node is not included in the selected nodes, and either it doesn’t exist in the database or --favor-state is set, it uses the state manifest.

Option 2

When --defer is used, dbt resolves ref calls like this:

If the node isn’t included in the selected nodes and doesn’t exist in the database (or --favor-state is set), the state manifest is used.
Otherwise, it defaults to the target namespace.

Option 3

With --defer, dbt chooses the state manifest if:

The node isn’t in the selected nodes and
It doesn’t exist in the database (or --favor-state is used).

Otherwise, it defaults to the target namespace.

Option 4

With --defer, if the node isn’t included and doesn’t exist in the database (or --favor-state is set), dbt uses the state manifest. Otherwise, it defaults to target.

Option 5

By default, dbt uses the target namespace to resolve ref calls.

But with --defer, dbt will use the state manifest instead if:

The node isn’t in the selected nodes and
It doesn’t exist in the database (or --favor-state is used).

Option 6

With --defer, dbt will use the state manifest to resolve ref calls if:

The node isn’t selected and
It doesn’t exist in the database (or --favor-state is used).

Otherwise it uses the default target namespace as normal.

Option 7

For the selected nodes, dbt always uses the default target namespace to resolve ref calls, no matter what.

But for any nodes not included in the selection, dbt will use the state manifest instead if:

--defer is used and
--favor-state is used (or the node doesn’t exist in the database)
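The options above are meant to say the same thing. One way to sanity-check that claim is to write a few of the phrasings as boolean functions and compare them over every input combination. The sketch below does this for Options 1, 3, and 7, assuming `--defer` is passed; the function names are hypothetical and each body is a reading of the option's wording, not dbt code.

```python
from itertools import product


def option_1(selected: bool, exists: bool, favor_state: bool) -> str:
    # "By default, it uses the target namespace. But if the node doesn't
    # exist in the database or --favor-state is set, and the node is not
    # included in the selected nodes, it uses the state manifest."
    namespace = "target"
    if (not exists or favor_state) and not selected:
        namespace = "state"
    return namespace


def option_3(selected: bool, exists: bool, favor_state: bool) -> str:
    # "dbt chooses the state manifest if the node isn't in the selected
    # nodes and it doesn't exist in the database (or --favor-state is used)."
    if not selected and (not exists or favor_state):
        return "state"
    return "target"


def option_7(selected: bool, exists: bool, favor_state: bool,
             defer: bool = True) -> str:
    # "For the selected nodes, dbt always uses the default target namespace."
    if selected:
        return "target"
    # "For any nodes not included in the selection, dbt will use the state
    # manifest if --defer is used and --favor-state is used (or the node
    # doesn't exist in the database)."
    if defer and (favor_state or not exists):
        return "state"
    return "target"


def options_agree() -> bool:
    """Compare the three readings over all 8 input combinations."""
    return all(
        option_1(s, e, f) == option_3(s, e, f) == option_7(s, e, f)
        for s, e, f in product((True, False), repeat=3)
    )


if __name__ == "__main__":
    print(options_agree())
```

This only checks that the three readings agree with each other; it does not check them against dbt's behavior.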

Additional information

Key intuition

The value prop of deferral is that it lets you safely run a subset of the DAG without the time and cost of first building all the upstream parents.

When running a subset of your DAG (like CI), there might be references to database objects that don't exist and won't be built by the command. To avoid database errors, --defer is handy for creating a fall-back for these refs -- you don't need to bother creating a missing database object if you have state that you can defer to. (But if you always want to use the state manifest rather than checking if the database object exists or not, then add --favor-state.)

Why check for existence in the database?

If the object already exists in the database, then there is no need to defer to a separate state manifest -- just use the database object that already exists.

(For use cases where the state manifest is preferred even when the database object exists, use --favor-state.)

Why check if the node is included in the selection or not?

If the node is included in the selection, then it will be built and able to be referenced safely because it will exist in the database.

It's only the nodes not included in the selection that need to be worried about and handled.

References that might not exist in the database because they aren't in the selection

There are two different options for choosing how to handle these references:

favor the target namespace, but only if the database object exists; otherwise, use the state manifest (default)
favor the state manifest (--favor-state)
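A small worked example may help show how the two options differ. Everything in it is made up for illustration: the model names (`stg_orders`, `stg_customers`, `orders`), the schema names (`dbt_ci` for the target, `analytics` for the state manifest), and the helper `resolve_relation`. It assumes `--defer` is passed and that only `orders` is selected.

```python
def resolve_relation(node: str, selected: set, existing_in_target: set,
                     favor_state: bool = False,
                     target_schema: str = "dbt_ci",
                     state_schema: str = "analytics") -> str:
    """Illustrative only: the schema-qualified name a ref would point at.

    Assumes --defer is passed. Schema and node names are hypothetical.
    """
    use_state = node not in selected and (
        favor_state or node not in existing_in_target
    )
    schema = state_schema if use_state else target_schema
    return f"{schema}.{node}"


# Only `orders` is selected; `stg_customers` happens to exist in the
# target already, while `stg_orders` does not.
selected = {"orders"}
existing_in_target = {"stg_customers"}
nodes = ["stg_orders", "stg_customers", "orders"]

default_mode = {n: resolve_relation(n, selected, existing_in_target) for n in nodes}
favor_mode = {n: resolve_relation(n, selected, existing_in_target, favor_state=True)
              for n in nodes}

if __name__ == "__main__":
    for n in nodes:
        print(f"{n}: default={default_mode[n]}, favor-state={favor_mode[n]}")
```

In this sketch the two modes differ only for `stg_customers`, the unselected node that already exists in the target: the default keeps `dbt_ci.stg_customers`, while `--favor-state` switches it to `analytics.stg_customers`.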

This most closely aligns with the phrasing in Option 7.

Questions

There are two questions a user might ask:

Why did dbt use a particular fully qualified database name? (e.g., dbt-core #10836)
How do I get dbt to use a particular fully qualified database name?

Ideally, our documentation would allow someone to answer either of those questions. We might need to phrase the key ideas in two different ways to accomplish this. But we might be able to answer both with a single brief explanation as well. 🤷
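As a rough illustration of how one explanation could serve both questions, the sketch below pairs a "why" helper with a "how" helper built on the same rule. Both `explain` and `flags_for` are hypothetical names, and the reason strings are invented wording; only the `--defer` and `--favor-state` flags come from the text above.

```python
from typing import List, Optional, Tuple


def explain(defer: bool, favor_state: bool,
            selected: bool, exists_in_target: bool) -> Tuple[str, str]:
    """'Why did dbt use this name?' Returns (namespace, reason).

    Illustrative only; models the rule described in this issue.
    """
    if not defer:
        return "target", "--defer was not passed"
    if selected:
        return "target", "the node is in the current selection"
    if favor_state:
        return "state", "the node is not selected and --favor-state was passed"
    if not exists_in_target:
        return "state", "the node is not selected and does not exist in the target"
    return "target", "the node is not selected but already exists in the target"


def flags_for(desired: str, selected: bool,
              exists_in_target: bool) -> Optional[List[str]]:
    """'How do I get dbt to use this name?' Returns the smallest flag set
    that yields the desired namespace, or None if no flag set does."""
    candidates: List[List[str]] = [[], ["--defer"], ["--defer", "--favor-state"]]
    for flags in candidates:
        namespace, _ = explain("--defer" in flags, "--favor-state" in flags,
                               selected, exists_in_target)
        if namespace == desired:
            return flags
    return None


if __name__ == "__main__":
    print(explain(True, False, False, True))
    print(flags_for("state", selected=False, exists_in_target=True))
    print(flags_for("state", selected=True, exists_in_target=False))
```

Under this reading, a selected node can never be pointed at the state manifest by flags alone, which is why the last call returns `None`.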


dbeatty10 added the content and improvement labels on Oct 22, 2024

dbeatty10 mentioned this issue Oct 22, 2024

ref and source functions both return a Relation object #6331
