Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Call verify_database only to selected schemas #67

Closed
2 tasks done
pevidex opened this issue Apr 22, 2024 · 3 comments
Closed
2 tasks done

[Bug] Call verify_database only to selected schemas #67

pevidex opened this issue Apr 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@pevidex
Copy link

pevidex commented Apr 22, 2024

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

dbt fails to run if two models have the same schema and table names even though they target different databases.

Expected Behavior

A successful dbt run as the models are not referencing each other and they target different databases.

Steps To Reproduce

  • Create a dbt project with two independent models having the same alias and schema names. Use a different database for each. First model would be materialized into "db1".some_schema.some_table and the second into "db2".some_schema.some_table.
  • Run dbt for one of the models, i.e.:
    dbt run --select first_model
  • Check it fails with:
Runtime Error
  ERROR: Cross-db references not allowed in postgres (something vs postgres)

Relevant log output

Runtime Error
  ERROR: Cross-db references not allowed in postgres (something vs postgres)


### Environment

```markdown
- OS: macOS 14.4
- Python: 3.8.13
- dbt-postgres: 1.7.11

Additional Context

By looking at this dbt-core pr I can infer multi-db is somehow supported in one single project. However, the caching behavior in the postgres adapter is not letting it as it assumes one db per project. The critical part seems to live here where all node entries in the manifest are getting matched against the expected database, even if the dbt run command doesn't target those entries (models). Using --cache-selected-only doesn't bypass this issue since it will fetch all relations from the manifest here anyway and a few steps further verify_database will be called against these. The only way I made this work was to call --no-populate-cache because it will completely stop this verify_database step from happening. However, that doesn't look ideal.

I'm opening this issue because I've noticed we can change _link_cached_relations to accept an optional cache_schemas parameter and that would be utilized here since those schemas already exist in that context. If verify_database would be called only against the selected models the dbt run would run just fine.

@pevidex pevidex added bug Something isn't working triage labels Apr 22, 2024
@dataders
Copy link
Contributor

hi @pevidex! yeah cross-database queries are disabled in postgres because it isn't support (yet).

So this is really an enhancement rather than a bug. Can you share the postgres service and/or extensions you're using where you expect it to be working?

If there's significant interest/demand for supporting multi-db scenarios, this is something that we could do.

At the very least, we could enable a parameter similar to dbt-redshifts ra3_node (docs), which is an override for the .verify_database method you reference.

@dataders dataders added enhancement New feature or request awaiting_response and removed bug Something isn't working triage labels Apr 25, 2024
@pevidex
Copy link
Author

pevidex commented Apr 26, 2024

So this is really an enhancement rather than a bug.

Hey, @dataders, thanks for the answer.

I've signaled this as a bug because, even with --cache-selected-only enabled, we're calling verify_database against all nodes loaded in the manifest. I assumed this is not the intended behavior but I may be missing some context and I noticed dbt-redshift has the same behaviour. So maybe enhancement fits it better.

Can you share the postgres service and/or extensions you're using where you expect it to be working?

I'm not doing any cross-db reference in my specific use case, just using a simple postgres instance. Due to project constraints, I've to use a single dbt project for two sets of models that target different dbs but don't reference each other. I select the target dbs using the dbt_project.yml +database keyword (docs). I managed to workaround it by using --no-populate-cache. I was not expecting this to work just by disabling this feature so that's why I reported it.

At the very least, we could enable a parameter similar to dbt-redshifts ra3_node (docs), which is an override for the .verify_database method you reference.

That could work but I think that feature would suit better in situations where real cross-db references are happening. In this case, I was thinking more on how to call .verify_database against the selected models only.

@amychen1776
Copy link

I'm so glad you found a workaround! Unfortunately at this time, we are unable to prioritize this for our roadmap so I'm going to close this issue for now.

@amychen1776 amychen1776 closed this as not planned Won't fix, can't repro, duplicate, stale Aug 2, 2024
@amychen1776 amychen1776 removed the triage label Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants