Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to use the relations cache in run-operations #2529

Closed
mdesmet opened this issue Jun 10, 2020 · 6 comments
Closed

Ability to use the relations cache in run-operations #2529

mdesmet opened this issue Jun 10, 2020 · 6 comments
Labels
adapter_caching Issues related to the adapter's relation cache enhancement New feature or request stale Issues that have gone stale

Comments

@mdesmet
Copy link
Contributor

mdesmet commented Jun 10, 2020

Describe the feature

The relations cache enables to quickly check if objects exist in the configured, however it's not available when doing dbt run-operation. This means that for every relations usage a schema query is launched towards the database, which is not very efficient.

image

Describe alternatives you've considered

We can use db run instead, however we would like to use run-operation to enable all things related to CI/CD and this is not necessary during development.

Additional context

Any/All dbs

Who will this benefit?

Users want simpler automation, build and CI scripts

@mdesmet mdesmet added enhancement New feature or request triage labels Jun 10, 2020
@drewbanin drewbanin removed the triage label Jun 12, 2020
@drewbanin
Copy link
Contributor

@beckjake do you think there's a good/sensible way to expose methods for populating the cache for a given schema in user-space?

I could have sworn we had an issue for this, but I can't find it anywhere. The other approach I've been thinking about is lazily-loading the adapter cache. dbt would not build the cache up-front. Instead, dbt would fetch results from the db when get_relation() is called, caching the results as it goes. I think the tricky part there would be blocking concurrent calls to get_relation() so that we don't run the same queries many times over if multiple models are running in parallel.

I can open a new issue for this idea if you think it's tractable. I think I'd prefer dbt managing the cache via calls to get_relation and it's ilk rather than exposing cache-manipulation methods directly in the context.

@beckjake
Copy link
Contributor

do you think there's a good/sensible way to expose methods for populating the cache for a given schema in user-space?

I think we should just expose a context function that calls adapter.set_relations_cache(manifest). Optionally we could no-op it outside run-operation.

It would be nice to lazy-load the full schema in get_relation where the schema isn't cached, but it's a bit of a project. In addition to the cache-locking issue, I think there's some nuance about marking schemas as being "partially loaded" in the case where a relation references outside its schema.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 10, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@krishbox
Copy link

👋 I would like to revitalise this issue, very keen to help as well. The use case we have is to perform operations within a materialisation and then have adapter.get_relation or load_relation be aware of this new relation in the cache even before the materialisation returns the relations at the end.

@jtcohen6
Copy link
Contributor

I ended up re-proposing the idea of "lazily loading" the adapter cache in #6240, and we ended up implementing it as part of #7307. Now if there's a cache miss during get_relation, we will populate the cache for that schema.

We haven't done the other piece of Drew's proposal from a few years ago, which is totally avoiding the upfront cache population by default, unless users opt into it with --no-populate-cache.

There may be some dragons / race conditions in this yet, but we haven't seen problems come up in practice since the v1.5 release.

In particular, this is really only an issue on Postgres-like databases, where views are bound to tables that might be cascade-dropped:

in the case where a relation references outside its schema

@jtcohen6 jtcohen6 added adapter_caching Issues related to the adapter's relation cache and removed triage labels Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapter_caching Issues related to the adapter's relation cache enhancement New feature or request stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

5 participants