[Feature] Package Model Overrides #4157

Closed
1 task done
fivetran-joemarkiewicz opened this issue Oct 28, 2021 · 5 comments
Labels
enhancement New feature or request packages Functionality for interacting with installed packages stale Issues that have gone stale

Comments

@fivetran-joemarkiewicz
Contributor

fivetran-joemarkiewicz commented Oct 28, 2021

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

I have had my eyes on the thread for dbt supporting metrics and it has really got me thinking about how dbt packages can play a role in the support of metrics within dbt.

I have been chatting with @owlas and following the work done with Lightdash to support metrics defined under the meta tag in the schema.yml. You can see how this has been done on a forked version of the Fivetran dbt_hubspot package.

While this is awesome, and I see only untapped potential as more metric support is integrated into the schema.yml, I can't stop thinking about how dbt packages could give analysts more support through metrics. It would be great to take an approach similar to the one linked above, where general starter metrics are defined within the package so analysts can build on what is already there. However, if a user then wants to change or extend those metrics, it would be great if they could "override" the package's schema.yml to fit their specific metric use case.

My thought is that functionality similar to the overrides config for sources could be added to the schema.yml for models. This way users can take full advantage of the package contents while also building out their own metric definitions as they see fit.
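For reference, here is a minimal sketch of the existing source-level override that this proposal mirrors. The `overrides` property on a source takes the name of the package whose source definition is being replaced. The package name `hubspot_source`, the file name, and the database and schema values are illustrative assumptions and are not taken from this issue.

```yaml
# root project: models/src_hubspot.yml (hypothetical file name)
version: 2

sources:
  - name: hubspot                # must match the source name defined in the package
    overrides: hubspot_source    # assumed name of the package that defines this source
    database: raw                # illustrative value
    schema: hubspot_custom       # illustrative value
```

The request in this issue is for the same kind of property to be accepted under `models:`, which dbt does not support at the time of writing.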

For example a user could add the below to their root project to override the base dbt_hubspot marketing.yml instead of needing to fork the repo:

# root project new_schema.yml
...
version: 2

models:

  - name: hubspot__email_sends
    overrides: hubspot ## This could override the yml in the hubspot package
    description: Each record represents a SENT email event.
    columns:
      - name: _fivetran_synced
        description: '{{ doc("_fivetran_synced") }}'

      - name: bcc_emails
        description: The 'bcc' field of the email message.

      - name: cc_emails
        description: The 'cc' field of the email message.

      - name: email_subject
        description: The subject line of the email message.

      - name: event_id
        description: The ID of the event.
        tests:
          - unique
          - not_null
        meta:
          metrics:
            total_unique_emails_sent:
              type: count_distinct
              description: Count the number of sent email events
            total_unique_emails_bounced:
              type: count_distinct
              sql: "IF(${was_bounced}, ${event_id}, NULL)"
              description: Counts the number of emails bounced (of the emails sent)
            total_unique_emails_clicked:
              type: count_distinct
              sql: "IF(${was_clicked}, ${event_id}, NULL)"
              description: Counts the number of emails clicked (of the emails sent)
           

The above obviously is just an example, but this is what I am thinking of to allow users to take advantage of metric definitions within dbt packages.

Describe alternatives you've considered

An alternative I have thought about is using variables instead of a schema.yml override. However, I have found in the past that variables being called within the schema.yml can get messy with the yml parsing and could pose more headaches than advantages.
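A rough sketch of what the variable-based alternative could look like, to show why it gets unwieldy. It assumes dbt renders `var()` inside this `meta` field, which is not confirmed here; the variable name `hubspot__email_sends_metric_type` is hypothetical.

```yaml
# package: models/marketing.yml
# Every customizable value needs its own var() call with a default.
version: 2

models:
  - name: hubspot__email_sends
    columns:
      - name: event_id
        meta:
          metrics:
            total_unique_emails_sent:
              # hypothetical variable; falls back to the package default
              type: "{{ var('hubspot__email_sends_metric_type', 'count_distinct') }}"
              description: Count the number of sent email events
```

A user would then set `hubspot__email_sends_metric_type` under `vars:` in the root project's dbt_project.yml. Each overridable field needs its own variable, and there is no way to add a brand-new metric this way, which is the main limitation compared with a schema.yml override.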

I am open to hearing your thoughts on this matter though!

Who will this benefit?

All dbt package users who want to leverage and customize metrics defined within dbt packages.

Are you interested in contributing this feature?

Yes I would love to contribute if possible

Anything else?

I am super excited about metrics being supported in dbt and hope I can help contribute by integrating functionality into future dbt packages 😸

@fivetran-joemarkiewicz fivetran-joemarkiewicz added enhancement New feature or request triage labels Oct 28, 2021
@jtcohen6 jtcohen6 added packages Functionality for interacting with installed packages and removed triage labels Nov 17, 2021
@jtcohen6
Contributor

@fivetran-joemarkiewicz Agree big-time! I'd really like to see dbt support this.

As it turns out, this is also the critical blocker that we need to overcome if we want to support namespacing for dbt resources (#1269), a.k.a. models with the same name in different packages. It's easy enough to use two-argument ref() to disambiguate references. The hard part is, dbt doesn't let you define properties for the same-named resource twice. I think it's highly likely that we tackle this after releasing v1. It's an important consideration as we work to support larger organizations that split their deployments across multiple projects, with upstream projects installed as "internal packages" in downstream projects.
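As an illustration of the two-argument ref() mentioned above, here is a minimal sketch of a root-project properties file that points a relationships test at a package model by package name and model name. The root model `stg_campaign_events` and its column `email_send_id` are hypothetical; `hubspot` and `hubspot__email_sends` are the names used in the example earlier in this thread.

```yaml
# root project: models/staging.yml (hypothetical file and model names)
version: 2

models:
  - name: stg_campaign_events
    columns:
      - name: email_send_id
        tests:
          - relationships:
              # two-argument ref: package name first, then model name
              to: ref('hubspot', 'hubspot__email_sends')
              field: event_id
```

This disambiguates the reference, but it does not let the root project define properties for `hubspot__email_sends` itself, which is the gap described above.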

The way we've implemented overrides is quite complex under the hood, especially for partial parsing. I could also see us taking the opportunity to rationalize the way we patch node properties, including existing source overrides, without changing the user-facing syntax.

As a semantic point only: Instead of overrides, I think we might simply call this property package_name, and normalize the idea that your root project can define properties for whichever resource in whichever package it pleases. Of course, it resolves to the current project name by default.
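A sketch of how the `package_name` naming suggested here might read in a root project. This is purely hypothetical syntax based on the comment above; dbt does not accept it at the time of this thread.

```yaml
# root project: models/hubspot_overrides.yml (hypothetical file name)
version: 2

models:
  - name: hubspot__email_sends
    package_name: hubspot   # proposed property; would default to the current project name
    description: Each record represents a SENT email event.
    columns:
      - name: event_id
        meta:
          metrics:
            total_unique_emails_sent:
              type: count_distinct
```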

Last thought for now: The inheritance order could get very tricky very quickly if a package tries to override properties for another package's resources. We might just want to disallow that for the first cut of this. There's some prior art in the work we've done for macro dispatch (configuring search_order for each namespace), but I'd resist the urge to mix together unrelated complexities until we find an overwhelming need to solve for this edge case.
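For context, the macro dispatch prior art mentioned above is configured in dbt_project.yml roughly as follows. This is a minimal sketch; the project name `my_project` is hypothetical and `dbt_utils` is used only as a familiar example namespace.

```yaml
# root project: dbt_project.yml
name: my_project

dispatch:
  - macro_namespace: dbt_utils
    # look for macro implementations in the root project first, then in the package
    search_order: ['my_project', 'dbt_utils']
```

An equivalent ordering for model properties would need to decide which package's definition wins for every resource, which is the complexity the comment suggests deferring.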

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 18, 2022
@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@ataft

ataft commented Jan 31, 2024

@jtcohen6 Has there been any movement on Model YAML overrides? I like @fivetran-joemarkiewicz's idea for implementing this. Alternatively, if a downstream project has a model with the same name, dbt could just use that model definition instead of the parent project's definition. This feature would really open up the ability to build templates that could be overridden based on individual needs.

@sdebruyn
Contributor

sdebruyn commented Nov 7, 2024

These automatic stale issue things on feature requests are just annoying. This has not been solved properly yet AFAIK.
