[CT-385] Limit catalog generation to specific relations #300

jtcohen6 · 2022-03-18T11:27:12Z

Context: #228, #296

Currently, catalog generation uses the same SQL as cache population: show table extended in <databasename> like '*'. There are good reasons to want to change this query:

to be faster during cache population
to be more targeted during catalog generation, and avoid grabbing metadata for irrelevant tables

Proposal

I think the change here will look like:

Decoupling the methods/macros used for cache population + catalog generation (currently both call list_relations_without_caching)
Including information about specific relation names (rather than just schema names) from _get_catalog_schemas
Passing those specific relation names into the new catalog macro, for use in the like rel_1|rel_2|rel_3|... predicate (rather than just like '*')

dbt-spark/dbt/adapters/spark/impl.py

Line 292 in d7f1d38

schema_map = self._get_catalog_schemas(manifest)

dbt-spark/dbt/adapters/spark/impl.py

Line 323 in d7f1d38

for relation in self.list_relations(database, schema):
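
A rough sketch of how the last two steps of the proposal might fit together: relation names are grouped per schema, then joined into the `like rel_1|rel_2|...` predicate. The function names `group_relations_by_schema` and `build_show_table_extended` are hypothetical and not part of dbt-spark. The input is assumed to be plain `(schema, identifier)` pairs rather than real manifest nodes, and no quoting or escaping of identifiers is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_relations_by_schema(
    relations: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (schema, identifier) pairs into {schema: sorted unique identifiers}.

    Hypothetical stand-in for a relation-aware version of _get_catalog_schemas.
    """
    grouped = defaultdict(set)
    for schema, identifier in relations:
        grouped[schema].add(identifier)
    return {schema: sorted(names) for schema, names in grouped.items()}


def build_show_table_extended(schema: str, identifiers: List[str]) -> str:
    """Build the catalog query for one schema.

    With no identifiers, fall back to the current behaviour (like '*').
    Assumes identifiers contain no characters that need escaping.
    """
    pattern = "|".join(identifiers) if identifiers else "*"
    return f"show table extended in {schema} like '{pattern}'"


if __name__ == "__main__":
    grouped = group_relations_by_schema(
        [("analytics", "orders"), ("analytics", "customers"), ("raw", "events")]
    )
    for schema, names in grouped.items():
        print(build_show_table_extended(schema, names))
```

Keeping the `'*'` fallback means cache population could continue to use the broad query while catalog generation passes explicit names, which is the decoupling described in the first step.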

Alternative

Revisit the change in #160. Before that change, we had to run describe extended for every single table. While generally much slower, that approach had some advantages:

The output was consistent across file types ([CT-279] Delta tables missing column schema in show table extended #295)
It includes column-level comments
If only documenting a single table from a massive schema, it's faster to describe that one table than show many tables in the schema (hopefully solved by the refinement proposed in this issue)


github-actions · 2022-09-15T02:13:59Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions · 2023-06-07T02:05:00Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

VShkaberda · 2023-08-01T13:16:57Z

After reading #228, I propose reopening this issue and implementing the like rel_1|rel_2|rel_3|... predicate. In our case, show table extended in {{ relation }} like '*' causes dbt docs generate to fail, because dbt does not have access rights to all objects in the source schema.
As far as I can see, this requires changes not only in _get_cache_schemas() but also to schema_relation in list_relations().
I'll test the changes and propose a merge afterwards.

github-actions bot changed the title ~~Limit catalog generation to specific relations~~ [CT-385] Limit catalog generation to specific relations Mar 18, 2022

jtcohen6 mentioned this issue Mar 18, 2022

[CT-202] Workaround for some limitations due to list_relations_without_caching method #228

Open

jtcohen6 mentioned this issue Apr 5, 2022

[CT-458] Catalog queries should filter on specific relations in busy schemas dbt-labs/dbt-core#4997

Closed

jtcohen6 mentioned this issue May 3, 2022

[CT-594] Support for Iceberg sort order #343

Closed

github-actions bot added the Stale label Sep 15, 2022

jtcohen6 removed the Stale label Sep 15, 2022

Fleid added the enhancement New feature or request label Dec 9, 2022

github-actions bot added the Stale label Jun 7, 2023

github-actions bot closed this as completed Jun 15, 2023
