[CT-458] Catalog queries should filter on specific relations in busy schemas #4997

Closed
Tracked by #8316
jtcohen6 opened this issue Apr 5, 2022 · 6 comments
Labels: enhancement (New feature or request), Impact: Adapters, performance, Team:Adapters (Issues designated for the adapter area of the code)



jtcohen6 commented Apr 5, 2022

Related:

To enable this, _get_catalog_schemas would need to return a specific set of tables, rather than a set of database-schema combos (SchemaSearchMap):

def _get_catalog_schemas(self, manifest: Manifest) -> SchemaSearchMap:
    """Get a mapping of each node's "information_schema" relations to a
    set of all schemas expected in that information_schema.
    There may be keys that are technically duplicates on the database side,
    for example all of '"foo"', 'foo', '"FOO"' and 'FOO' could coexist as
    databases, and values could overlap as appropriate. All values are
    lowercase strings.
    """
    info_schema_name_map = SchemaSearchMap()
    nodes: Iterator[CompileResultNode] = chain(
        [
            node
            for node in manifest.nodes.values()
            if (node.is_relational and not node.is_ephemeral_model)
        ],
        manifest.sources.values(),
    )
    for node in nodes:
        relation = self.Relation.create_from(self.config, node)
        info_schema_name_map.add(relation)
    # result is a map whose keys are information_schema Relations without
    # identifiers that have appropriate database prefixes, and whose values
    # are sets of lowercase schema names that are valid members of those
    # databases
    return info_schema_name_map
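
To make the contrast with the schema-level map concrete, here is a minimal sketch of a relation-level result: identifiers grouped by database and schema. It is not dbt's implementation. `RelationRef` and `collect_catalog_relations` are hypothetical names invented for illustration, and the lowercasing simply mirrors the "all values are lowercase strings" note in the docstring above.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class RelationRef(NamedTuple):
    """Hypothetical stand-in for an adapter Relation (not dbt's own class)."""

    database: Optional[str]
    schema: str
    identifier: str


def collect_catalog_relations(
    relations: Iterable[RelationRef],
) -> Dict[Tuple[Optional[str], str], Set[str]]:
    """Group relation identifiers by (database, schema), all lowercased."""
    by_schema: Dict[Tuple[Optional[str], str], Set[str]] = defaultdict(set)
    for rel in relations:
        # A missing database is kept as None rather than coerced to a string.
        database = rel.database.lower() if rel.database is not None else None
        by_schema[(database, rel.schema.lower())].add(rel.identifier.lower())
    return dict(by_schema)
```

Each key still identifies one database-schema combo, so a caller could fall back to schema-level cataloging by ignoring the values.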

Given that we'll need to interpolate table names into actual queries, maybe we could have a rule like: if a database-schema has more than 100 objects in it, just catalog the entire schema; any fewer, pass the specific object names as filters.
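
A rough sketch of that rule, under stated assumptions: the "object count" is taken to be the number of relations dbt wants cataloged in each schema (one possible reading of the proposal), and `plan_catalog_filters` and `MAX_FILTERED_RELATIONS` are hypothetical names, not part of dbt.

```python
from typing import Dict, List, Optional, Set, Tuple

SchemaKey = Tuple[Optional[str], str]

# Threshold taken from the proposal above; the constant name is hypothetical.
MAX_FILTERED_RELATIONS = 100


def plan_catalog_filters(
    wanted: Dict[SchemaKey, Set[str]],
    threshold: int = MAX_FILTERED_RELATIONS,
) -> Dict[SchemaKey, Optional[List[str]]]:
    """Decide, per schema, between a whole-schema query and a filtered one.

    None means "catalog the entire schema"; a list means "filter on exactly
    these relation names".
    """
    plan: Dict[SchemaKey, Optional[List[str]]] = {}
    for key, names in wanted.items():
        # More names than the threshold: skip the filter entirely.
        plan[key] = None if len(names) > threshold else sorted(names)
    return plan
```

An adapter would then turn each non-None list into a filter on the table name in its catalog query, ideally by binding the names as query parameters where the driver supports it.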

We see this issue crop up on several adapters, including most recently Redshift. Catalog generation is different from cache generation in two ways:

  • The cache needs an exhaustive list of the objects in a schema in order to reliably say whether the object exists or not. The stakes are lower for cataloging.
  • We catalog source tables, which commonly live in shared schemas, in ways that dbt developers cannot easily control or change.
github-actions bot changed the title from "Catalog queries should filter on specific relations in busy schemas" to "[CT-458] Catalog queries should filter on specific relations in busy schemas" Apr 5, 2022

github-actions bot commented Oct 3, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label Oct 3, 2022
jtcohen6 added the Team:Adapters and enhancement labels and removed the stale and Team:Execution labels Oct 6, 2022

github-actions bot commented Apr 5, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions bot added the stale label Apr 5, 2023
@github-actions

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions bot closed this as not planned Apr 12, 2023
@jeremyyeo

Reopening on behalf of enterprise customer.

@graciegoheen

Closing this out, since we completed the dbt-core implementation!

Adapter implementations are ongoing :)

@aerielsoriano

Reopening on behalf of enterprise customer.


7 participants