[CT-458] Catalog queries should filter on specific relations in busy schemas #4997

jtcohen6 · 2022-04-05T17:19:26Z

Lines 313 to 338 in 7f953a6

    
    def _get_catalog_schemas(self, manifest: Manifest) -> SchemaSearchMap:
        """Get a mapping of each node's "information_schema" relations to a
        set of all schemas expected in that information_schema.

        There may be keys that are technically duplicates on the database side,
        for example all of '"foo", 'foo', '"FOO"' and 'FOO' could coexist as
        databases, and values could overlap as appropriate. All values are
        lowercase strings.
        """
        info_schema_name_map = SchemaSearchMap()
        nodes: Iterator[CompileResultNode] = chain(
            [
                node
                for node in manifest.nodes.values()
                if (node.is_relational and not node.is_ephemeral_model)
            ],
            manifest.sources.values(),
        )
        for node in nodes:
            relation = self.Relation.create_from(self.config, node)
            info_schema_name_map.add(relation)
        # result is a map whose keys are information_schema Relations without
        # identifiers that have appropriate database prefixes, and whose values
        # are sets of lowercase schema names that are valid members of those
        # databases
        return info_schema_name_map
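
To make the shape of that return value concrete, here is a simplified stand-in built from plain dictionaries. `FakeNode` and `build_schema_map` are hypothetical names invented for this sketch, not dbt-core APIs, and the real `SchemaSearchMap` is keyed by information_schema relations rather than bare database names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a manifest node or source."""
    database: str
    schema: str
    identifier: str


def build_schema_map(nodes: Iterable[FakeNode]) -> Dict[str, Set[str]]:
    """Group lowercase schema names under each database.

    Identifiers are dropped on purpose, mirroring the snippet above:
    the map records which schemas to catalog, not which objects.
    """
    schema_map: Dict[str, Set[str]] = defaultdict(set)
    for node in nodes:
        schema_map[node.database].add(node.schema.lower())
    return dict(schema_map)


nodes = [
    FakeNode("analytics", "Marts", "orders"),
    FakeNode("analytics", "marts", "customers"),
    FakeNode("raw", "shared", "events"),
]
schema_map = build_schema_map(nodes)
print(schema_map)
```

Because only schema names survive, a catalog query driven by this map has to scan every object in `raw.shared`, even though only `events` is of interest. That is the cost this issue is about.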

Given that we'll need to interpolate table names into actual queries, maybe we could have a rule like: if a database-schema has more than 100 objects in it, just catalog the entire schema; if it has fewer, pass the specific object names as filters.
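
A rough sketch of how such a rule could look, under stated assumptions: "objects" is read here as the number of relation names dbt would have to interpolate for a schema, and `plan_catalog_filters`, `render_where`, and the `table_schema` / `table_name` column names are all illustrative, not taken from any adapter's actual catalog macro.

```python
from typing import Dict, List, Optional, Set, Tuple

SchemaKey = Tuple[str, str]  # (database, schema); hypothetical key type
THRESHOLD = 100  # the cutoff floated in the proposal above


def plan_catalog_filters(
    wanted: Dict[SchemaKey, Set[str]],
    threshold: int = THRESHOLD,
) -> Dict[SchemaKey, Optional[List[str]]]:
    """Decide per schema whether to filter by name or catalog everything.

    None means "catalog the whole schema"; a list means "filter on these names".
    Schemas with no wanted relations are skipped entirely.
    """
    plan: Dict[SchemaKey, Optional[List[str]]] = {}
    for key, names in wanted.items():
        if not names:
            continue
        plan[key] = None if len(names) > threshold else sorted(names)
    return plan


def render_where(schema: str, names: Optional[List[str]]) -> str:
    """Render an illustrative WHERE clause (naive single-quote escaping only)."""
    clause = f"table_schema = '{schema}'"
    if names is not None:
        quoted = ", ".join("'" + n.replace("'", "''") + "'" for n in names)
        clause += f" and table_name in ({quoted})"
    return clause


wanted = {
    ("raw", "shared"): {"events", "users"},
    ("analytics", "marts"): {f"model_{i}" for i in range(150)},
}
plan = plan_catalog_filters(wanted)
for (database, schema), names in plan.items():
    print(database, "->", render_where(schema, names))
```

In this example `raw.shared` gets a name filter because only two relations are wanted, while `analytics.marts` exceeds the threshold and falls back to a schema-wide query. A real implementation would need adapter-specific quoting and case handling instead of the string concatenation used here.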

We see this issue crop up on several adapters, including most recently Redshift. Catalog generation is different from cache generation in two ways:

1. The cache needs an exhaustive list of the objects in a schema in order to reliably say whether an object exists or not. The stakes are lower for cataloging.
2. We catalog source tables, which commonly live in shared schemas that dbt developers cannot easily control or change.


github-actions · 2022-10-03T02:13:45Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions · 2023-04-05T01:46:13Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2023-04-12T01:52:37Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

jeremyyeo · 2023-06-15T02:05:19Z

Reopening on behalf of enterprise customer.

graciegoheen · 2023-10-02T14:59:59Z

Closing this out, since we completed the dbt-core implementation!

Adapter implementations are ongoing :)

aerielsoriano · 2024-07-22T23:46:56Z

Reopening on behalf of enterprise customer.

jtcohen6 added performance Team:Execution labels Apr 5, 2022

github-actions bot changed the title ~~Catalog queries should filter on specific relations in busy schemas~~ [CT-458] Catalog queries should filter on specific relations in busy schemas Apr 5, 2022

jtcohen6 mentioned this issue Apr 29, 2022

[CT-202] Workaround for some limitations due to list_relations_without_caching method dbt-labs/dbt-spark#228

Open

jtcohen6 mentioned this issue Aug 23, 2022

[CT-779] Listing tables on big datasets on every compile can be abnormally long (and incomplete?) dbt-labs/dbt-bigquery#205

Closed

github-actions bot added the stale label Oct 3, 2022

jtcohen6 added Team:Adapters enhancement and removed stale Team:Execution labels Oct 6, 2022

jtcohen6 mentioned this issue Oct 6, 2022

[CT-1303] Respect node selection in catalog queries run by docs generate #6014

Closed

github-actions bot added the stale label Apr 5, 2023

jtcohen6 mentioned this issue Apr 11, 2023

Add option to skip relation cache population #7307

Merged

6 tasks

github-actions bot closed this as not planned Apr 12, 2023

jeremyyeo reopened this Jun 15, 2023

github-actions bot removed the stale label Jun 16, 2023

peterallenwebb mentioned this issue Aug 8, 2023

[Epic] Applied State (part 1) #8316

Closed

jtcohen6 assigned peterallenwebb Aug 16, 2023

martynydbt added the Impact: Adapters label Aug 22, 2023

mikealfare self-assigned this Aug 22, 2023

mikealfare mentioned this issue Aug 29, 2023

ADAP-865: Parameterize where clause, add option to supply list of relations dbt-labs/dbt-snowflake#758

Merged

4 tasks

dbeatty10 mentioned this issue Aug 29, 2023

[CT-3012] [Bug] dbt docs generate takes a lot of time #8452

Closed

2 tasks

peterallenwebb mentioned this issue Aug 30, 2023

[CT-3050] [Applied State] Implement Core Changes for Filtered Catalog Queries #8521

Closed

graciegoheen closed this as completed Oct 2, 2023

aerielsoriano reopened this Jul 22, 2024

aerielsoriano closed this as completed Jul 22, 2024

jtcohen6 commented Apr 5, 2022 • edited by peterallenwebb