-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-260] [Bug] generate_database_name
does not apply to sources
#4753
Comments
generate_database_name
does not apply to sourcesgenerate_database_name
does not apply to sources
After working on this for a while I finally have come to a workaround:
|
btw, I'm using
We also give each engineer their own, like this: Is there a better practice for doing this? I know you can use environment variables, but I didn't find it to be much cleaner because I also have another db, let's call it My other option with env vars leaves me having to set the db in each yml file / in each model config, which is what led me to the override macro approach - it's DRYer and less error prone (someone could forget to set the db in the yml file / each model, but not with a macro override). |
For my 2c I think this is working as expected. The |
@alexrosenfeld10 probably useful to include the config from your source and model files in the issue since this could just be a config issue. It's not clear from the information given if you are calling the macro in the source config. |
That's quite a generalization to make! It's super common to EL something into one schema and then do transformations on it in another schema. I don't see any reason why the expectation should be that those are in separate databases. Moreover, if they are in separate databases, then that's even more reason for such an override to exist. @nathan-protempo Here's my source config: #4753 (comment). The idea is that I can DRY the database qualifier up by putting it at the source level, not the model levels. You might suggest to set that in the profile, and I thought about that as well. That's not useful to me because my project selects from multiple databases (controlled by
Thanks for helping to think this through. I realize it's possible I might have missed a good option, if so, I'm all ears. If not, I think this is a change that the dbt Labs team should consider. It doesn't have to be the same override even, a separate one just for sources would work well enough. |
@alexrosenfeld10 - Yes in hindsight that is just me generalising my experience and of course there is no reason you couldn't have source tables in a different schema of the same database. I tried to do some local testing with your custom database macro and it generated errors as the initial I note that your macro code uses the "target" function - there's an example on the doc page for that function for using target.name to override the source database, I wonder if that might help? |
@nathan-protempo I think the environment-specific problem could potentially be the direction of the Anyway, I've changed my approach a little bit. It's still the same core ideas though. Here's what I do now (it works pretty well): {% macro generate_schema_name(custom_schema_name=none, node=none) -%}
{%- if custom_schema_name is not none and node.resource_type in ['seed', 'model'] -%}
{%- set error_message -%}
{{ node.resource_type | capitalize }} '{{ node.unique_id }}' has a schema configured. This is not allowed because the it will be derived from the directory structure.
{%- endset -%}
{{ exceptions.raise_compiler_error(error_message) }}
{%- endif -%}
{#
Because we have many models with the same name in different schemas, we have to follow the
schema_name__model_name.sql naming convention.
dbt enforces that model filenames are unique, and does not support schema names in the ref() function.
See https://discourse.getdbt.com/t/extracting-schema-and-model-names-from-the-filename/575 and
https://discourse.getdbt.com/t/preventing-name-conflicts/221 for more information.
#}
{{ node.name.split('__')[0] | trim }}
{%- endmacro %}
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
{%- if custom_alias_name is not none and node.resource_type in ['seed', 'model'] -%}
{%- set error_message -%}
{{ node.resource_type | capitalize }} '{{ node.unique_id }}' has an alias configured. This is not allowed because the it will be derived from the file name.
{%- endset -%}
{{ exceptions.raise_compiler_error(error_message) }}
{%- endif -%}
{{ node.name.split('__')[1] }}
{%- endmacro %}
{% macro generate_database_name(custom_database_name=none, node=none) -%}
{%- if custom_database_name is not none and node.resource_type in ['seed', 'model'] -%}
{%- set error_message -%}
{{ node.resource_type | capitalize }} '{{ node.unique_id }}' has a database configured. This is not allowed because the it will be derived from the directory structure and target name.
{%- endset -%}
{{ exceptions.raise_compiler_error(error_message) }}
{%- endif -%}
{% set path = node.path %}
{% set database = path.split('/')[0] %}
{{ database }}{{ '_' + target.name if not target.name.startswith('prod') }}
{%- endmacro %} Note that this requires your databases to follow the convention of With this approach, you can very clearly see why I want the overrides to apply to sources as well. Instead, I have to do this in my sources: sources:
- name: blah
database: "reporting{{ '_' + target.name if not target.name.startswith('prod') }}" # for sources in other_org, I have to change it to "other_org{{ ... same exact conditional here ... }}"
schema: schema_name
tables: [ ... ] |
That example is cool, I didn't know you could multi-line and use full |
Jumping in with three quick thoughts, since there's already some great conversation happening here:
# not possible today
sources:
- name: blah
database: "{{ generate_database_name('reporting') }}"
schema: schema_name
tables: [ ... ]
# dbt_project.yml
sources:
my_project:
all_my_sources:
# logic is still duplicated, but compact - not scattered across different files
reporting_sources:
+database: "reporting{{ '_' + target.name if not target.name.startswith('prod') }}"
other_org_sources:
+database: "other_org{{ '_' + target.name if not target.name.startswith('prod') }}" |
Good thoughts, thanks @jtcohen6. Of those three I am most a fan of the third option. It's not uncommon for people to use the same physical instance of a cloud database with separate logical ( My main question for you is, why isn't there a 4th option, a Similarly, you did say that dbt "doesn't support |
Sorry, yes, no way to do this today. I meant option 2 to be an illustration of how you could someday reenable this, for a particular source, in the same way users have done with snapshots. So option 4 might be, an optional configuration that "enrolls" sources into For the short term, I still like the third option most. We're planning to start work on that soon, with the aim of including it in v1.1. After that capability is unlocked, we can see if there remains a significant unmet need for more dynamic, macro-based configuration. |
Cool. The thing I like about having a way to enroll sources into I'm a fan of 3, but per the above I still think a more global, configurable option would be great |
FWIW, I think anyone using something like this: https://discourse.getdbt.com/t/extracting-schema-and-model-names-from-the-filename/575 would be impacted by the same / a similar problem(s), at some level somewhere in their project. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Commenting to keep this open |
I've just come across this behaviour too and was struggling to understand why sources weren't changing their database value even though I configured the macro to do so. In my case I want any source that uses a
For now the below Jinja logic works, it's just not very user friendly for developers when it comes to adding a new source. Would love for this to be a flag or setting in a future version.
|
I have the same workaround in place. Mine's a tiny bit different, posting in case you prefer: version: 2
sources:
- name: foo
description: The foo data sources
database: "reporting{{ '_' + target.database if not target.database.startswith('prod') }}"
schema: foo
tables:
- name: a_table |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
👋 We do need to programmatically define sources for various security and compliance reasons. Is this a feature we can request or is there an update on this? |
Is there an existing issue for this?
Current Behavior
When using sources, the
generate_database_name
override doesn't apply to sources.Expected Behavior
Sources would take the overridden database name, just like models.
This is either a very big oversight, or for some reason it's intentional. It's quite a blocker for me though. I've tried to get around it by adding some jinja in my
sources
yaml block, but no luck so far. I'm even trying to call macros from there, which I don't think is possible.I suppose leaving
database
explicitly unset might be an OK workaround, as it would take the database of the session which most of the time would be correct.Steps To Reproduce
generate_database_name
override macro. Here's an example:dbt compile
Relevant log output
No response
Environment
The text was updated successfully, but these errors were encountered: