Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-201] Reconcile configs + properties for sources #3662

Closed
jtcohen6 opened this issue Jul 30, 2021 · 16 comments · Fixed by #5008
Closed

[CT-201] Reconcile configs + properties for sources #3662

jtcohen6 opened this issue Jul 30, 2021 · 16 comments · Fixed by #5008
Assignees
Labels
enhancement New feature or request jira stale Issues that have gone stale

Comments

@jtcohen6
Copy link
Contributor

Follow-up to #2401, #3616

Let's convert some source properties into configs that can be set inside config: blocks and within dbt_project.yml:

# models/src_whatever.yml
version: 2
sources:
  - name: my_source
    description: ... # not a config
    config:
      enabled: ...
      quoting: {dict}
      freshness: {dict}
      loader: ...
      loaded_at_field: ...
      database: ... # or 'project' in dbt-bigquery
      schema: ... # or 'dataset' in dbt-bigquery
      identifier: ... # this is like an alias for alias
      meta: {dict}
      tags: ...
    tables:
      - name: my_src_table
        config: # all the same stuff as above. these take precedence for specific tables
        description: ... # not a config
        tests: ... # not a config
        columns: ... # not a config
# dbt_project.yml
sources:
  project_name:
    subdirectory:
      +database: raw
      +loader: fivetran
      another_subdirectory:
        +enabled: false

For backwards compatibility, we should still support setting these as top-level properties:

sources:
  - name: my_source
    loaded_at_field: updated_at

But raise an error if the same config is set in both places (even if it's with the same value):

sources:
  - name: my_source
    loaded_at_field: updated_at
    config:
      loaded_at_field: updated_at

Notes

  • We're thinking that descriptions shouldn't be configs. They're rendered with a different context (docs), and they don't make sense to set hierarchically. The same goes for tests and columns—this would be quite tricky to figure out, since tests actually generate new nodes, rather than adding properties to the existing node.

  • As far as the manifest / graph.sources context, I'm open to suggestions. For backwards compatibility, we'd want to store things like loaded_at_field in both node.config and node-level keys. But I think there's a valid argument for removing this as a top-level key and only storing it in node.config, so long as we communicate clearly that such a move is taking place.

  • For better state:modified comparisons, we'd want to store the un-rendered version of these configs in node.unrendered_config, regardless of whether they're set in dbt_project.yml or models/src_whatever.yml. Original issue for this is Store the unresolved form of source database representations #2744.

@jtcohen6 jtcohen6 added the enhancement New feature or request label Jul 30, 2021
@jtcohen6 jtcohen6 mentioned this issue Jul 30, 2021
4 tasks
@gshank gshank self-assigned this Oct 4, 2021
@gshank gshank added the 1.0.0 Issues related to the 1.0.0 release of dbt label Oct 4, 2021
@jtcohen6 jtcohen6 added Team: Language and removed 1.0.0 Issues related to the 1.0.0 release of dbt labels Nov 11, 2021
@jtcohen6 jtcohen6 added this to the v1.1.0 milestone Jan 27, 2022
@jtcohen6 jtcohen6 added the jira label Feb 9, 2022
@github-actions github-actions bot changed the title Reconcile configs + properties for sources [CT-201] Reconcile configs + properties for sources Feb 9, 2022
@alexrosenfeld10
Copy link
Contributor

This is great! Just commenting here to show my support of this feature.

@leahwicz
Copy link
Contributor

Minimal config for sources so needs to be expanded. No SQL for them. Only defined in YAML

Complications:
Can override sources in other projects.
Where are the configs are? Source level?
Lots of testing will be involved (matrix of scenarios)

@nathaniel-may nathaniel-may assigned nathaniel-may and unassigned gshank Mar 1, 2022
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Mar 2, 2022

Context:

  • We already support one configuration for sources, the enabled config in dbt_project.yml
  • Sources support override, which allows users to redefine a source, with new properties, that take precedence over the same properties for the source with the same name defined in a package.

In summary, the goal of this issue:

  • Make it possible to define source properties in dbt_project.yml.
  • Make it possible to enable/disable a source table within its .yml file definition

Main exit criteria:

  • Sources and source tables, defined in .yml files, accept a new property, config:
  • That config property accepts enabled: true|false, which has the effect of enabling or disabling the source
  • It's also possible to define existing properties within that config property, i.e. as configs as instead. These configs can be defined at the project, source, and source-table level—and they are inherited/overridden in that order of specificity.
  • All source configs that currently exist as node-level attributes, are copied over to those node-level attributes in manifest.json. This offers backwards compatibility for metadata use cases (such as dbt-docs) that depend on accessing those attributes at the top-level. If users want fine-grained control over config resolution, they should define these as configs.

Totally fair question around how to handle a source overrides. This might be out of scope for the initial effort, but it's worth thinking through, since we want to eventually support property overrides for other resource types, too: #4157.

I think configurations should be resolved first, and then override second, but that's not a strongly held view—as long as we can document consistent behavior. To make this concrete:

# dbt_project.yml
sources:
  my_package:
    my_src_name:
      +database: db_one
sources:
  - name: my_src_name
    overrides: my_package
    database: db_two

I would expect both configurations to be resolved, then the override applied, such that my_src_name ends up pointing to db_two.

@nathaniel-may
Copy link
Contributor

First meeting with @emmyoop:

Next Steps

  • example project with new desired set up (no override, so we can add it later)
  • set up an integration test with new behavior and no override. (check manifest for desired result)
  • set up an integration test with new behavior and an override (check manifest for desired result)
  • follow parser.py::read_files until we identify key points to change

Questions

  • None yet, will likely come up with more after these steps.

@nathaniel-may
Copy link
Contributor

Current state of work:

  • Relevant tests migrated to new test framework Convert source tests #4935
  • Tests for desired behavior written and reviewed initial pass at source config tests w/o overrides #4960
  • Add the configs to source config class (easy)
  • Catch errors if there are duplicate properties and configs (prior art for doing this with metadata)
  • Duplicate the configs so they are present as both configs and properties for backwards compatibility.
  • [unknown] Configs have metadata on them, and we need to figure out what the right metadata fields are for these. (model_config.py)
  • There a lot of logic treating these as properties and not configs. We need to log a tech debt ticket that outlines this logic and that it should be moved to read theses as configs. (some examples of this logic lives in parser/schemas.py)
  • Write a detailed ticket for how to add the correct source override logic as outlined above.

There are still potentially more unknown unknowns here and because of that we're not confident this list is exhaustive yet.

Updated estimation: 13
CC: @leahwicz @jtcohen6

@nathaniel-may
Copy link
Contributor

Current state of work:

  • Relevant tests migrated to new test framework Convert source tests #4935
  • Tests for desired behavior written and reviewed initial pass at source config tests w/o overrides #4960
  • Add the enabled config to source config class
  • Add the rest of the configs to source config class (just uncomment them)
  • Merge table-level configs onto source-level configs in a more robust way to handle more than just enabled. (see this PR comment)
  • Catch errors if there are duplicate properties and configs (prior art for doing this with metadata) <- this isn't done because enabled can't be set as a property.
  • Duplicate the configs so they are present as both configs and properties for backwards compatibility.
  • [unknown] Configs have metadata on them, and we need to figure out what the right metadata fields are for these. (model_config.py)
  • There a lot of logic treating these as properties and not configs. We need to log a tech debt ticket that outlines this logic and that it should be moved to read theses as configs. (some examples of this logic lives in parser/schemas.py)
  • Write a detailed ticket for how to add the correct source override logic as outlined above.

@nathaniel-may
Copy link
Contributor

re-opening since #5008 only completed the first part of this ticket.

@nathaniel-may nathaniel-may reopened this Apr 18, 2022
@jtcohen6 jtcohen6 removed this from the v1.1 milestone Apr 25, 2022
@alexrosenfeld10
Copy link
Contributor

alexrosenfeld10 commented Apr 28, 2022

Seems like this is no longer targeted for 1.1? Just wondering if there is an expected or estimated release for this

@jtcohen6
Copy link
Contributor Author

@alexrosenfeld10 Totally fair question. v1.1 will include the work completed by #5008: supporting config.enabled defined in each source's .yml definition. That's what we were able to get done with the time and appetite we had to work on this. Unfortunately, I don't have an estimate for when we'll be able to prioritize remaining steps, since we need to move onto other initiatives.

@alexrosenfeld10
Copy link
Contributor

Fair enough, I get it - always lots to do only so much time. I am bummed though because that doesn't actually resolve the original issue that I opened related to this

@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 1, 2022
@emmyoop emmyoop removed the stale Issues that have gone stale label Nov 1, 2022
@Gunnnn
Copy link

Gunnnn commented Dec 1, 2022

Please make possible to add "config:" section to my sources.yml on source table level to store information!
Configs are reachable programmatically on runtime with {{ config.get('my_config_key1') }} and i need to get watermark_field for my source:
image

@github-actions
Copy link
Contributor

github-actions bot commented Jun 1, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jun 1, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 9, 2023
@VDFaller
Copy link

I'd love even if we only got database/schema support at the dbt_project.yml file level for sources.

@dbrtly
Copy link
Contributor

dbrtly commented Jan 28, 2024

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jira stale Issues that have gone stale
Projects
None yet
9 participants