Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use model alias for the CTE identifier generated during ephemeral materialization #10290

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Jun 10, 2024

Resolves #5273. Reimplementation of #5488 to match the new split between adapter and core repos.

See dbt-labs/dbt-adapters#236 for the corresponding adapter PR.

Problem

Unit tests and ephemeral materialization do not currently support referencing models that use special characters like dots in file names, even if these special characters are sanitized using model aliases.

Solution

Prioritize using the model alias (if specified) via model.identifier when creating CTE identifiers for unit tests and ephemeral materialization.

This change requires a corresponding change to dbt-adapters to fully fix the bug, which is implemented in this PR: dbt-labs/dbt-adapters#236 This second PR is necessary because the adapter is used to construct the CTE identifier at the time of CTE reference, while the core code is used to construct the CTE identifier at the time of CTE definition.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@jeancochrane jeancochrane requested a review from a team as a code owner June 10, 2024 20:03
Copy link

cla-bot bot commented Jun 10, 2024

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @jeancochrane

Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@github-actions github-actions bot added the community This PR is from a community member label Jun 10, 2024
Copy link

cla-bot bot commented Jun 10, 2024

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @jeancochrane

@@ -371,7 +371,7 @@ def _recursively_prepend_ctes(

_extend_prepended_ctes(prepended_ctes, new_prepended_ctes)

new_cte_name = self.add_ephemeral_prefix(cte_model.name)
new_cte_name = self.add_ephemeral_prefix(cte_model.alias)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It strikes me as a bit fragile that this line of code and the BaseRelation.add_ephemeral_prefix() implementation in dbt-adapters are both responsible for generating the CTE identifier for ephemeral models; ideally this bugfix could have been made in one PR against one of either dbt-core or dbt-adapters without requiring PRs against both repos. I wonder if perhaps there's a way to transfer responsibility over this call to the adapter? Happy to make that change if you can think of a reasonable approach.

Comment on lines 204 to 205
# Model 2 is ephemeral, so it should not show up in the results list
assert len(results) == 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this is the right place to test support for ephemeral models with dots in their names, but let me know if there's a better option!

@jeancochrane
Copy link
Contributor Author

@MichelleArk @QMalcolm @gshank Sorry for the wide tag radius, I'm not sure who is best equipped to help here! This should be ready for review, but cla-bot seems convinced that I haven't signed the CLA even though I have. (I may have done so incorrectly, but I'm not sure how to confirm 😅 ) Any idea how I can move forward here?

@QMalcolm
Copy link
Contributor

QMalcolm commented Jun 18, 2024

Howdy @jeancochrane!

Sorry for the noise from the CLA bot 🙃 I've looked at our CLA database and the issue is that we filter out any submitted usernames that have the prefix @. Unfortunately, I don't have edit access to the underlying database, just view access. Could you submit your CLA again but without the @ prefix on your username?

@jeancochrane
Copy link
Contributor Author

Done, thanks for the tip @QMalcolm!

@QMalcolm
Copy link
Contributor

Now to summon the bot to get this correctly marked
@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Jun 18, 2024
Copy link

cla-bot bot commented Jun 18, 2024

The cla-bot has been summoned, and re-checked this pull request!

@jeancochrane
Copy link
Contributor Author

Thank you for summoning the bot @QMalcolm! If you could also summon it on dbt-labs/dbt-adapters#236 I would be very grateful 🙏🏻

@jeancochrane
Copy link
Contributor Author

Any idea of the timeline for review on this and dbt-labs/dbt-adapters#236 @QMalcolm? My team would love to get started using unit tests but these two PRs are blocking us. If you think it might take a while to get merged then we can always vendor the dependency, but I'd prefer to keep using the official package if you think there's a chance this might get reviewed soon.

@dbeatty10
Copy link
Contributor

@jeancochrane with the latest version of dbt-labs/dbt-adapters#236, do we need this PR in dbt-core anymore?

@jeancochrane
Copy link
Contributor Author

jeancochrane commented Jul 16, 2024

@jeancochrane with the latest version of dbt-labs/dbt-adapters#236, do we need this PR in dbt-core anymore?

Unfortunately I believe it's still needed in order to fix unit testing [Edited to add: and ephemeral models] in our case @dbeatty10! While dbt-labs/dbt-adapters#236 successfully gets ephemeral materializations working for projects that use aliases to strip out special characters fixes the identifiers used when referencing a CTE, unit tests the identifiers used when defining a CTE appear to use Compiler._recursively_prepend_ctes() to construct the CTE identifiers that represent input data, and _recursively_prepend_ctes() uses model.name when constructing these CTE identifiers:

new_cte_name = self.add_ephemeral_prefix(cte_model.name)

In practice, this means that with the current main branch of dbt-core installed alongside dbt-labs/dbt-adapters#236, a unit test query [Edited to add: or an ephemeral materialization] can end up looking something like this (adapted from an example in our project that uses dots to namespace model filenames) -- note how the definition of the CTE at the top does not use the alias final_model_raw, while the reference at the bottom does:

with __dbt__cte__model.final_model_raw as (
    -- Fixture for model.final_model_raw
    select try_cast(null as varchar) as year, ...
)
select
    ...
from __dbt__cte__final_model_raw AS fmr

So I think that the current PR title "Use model alias for the CTE identifier generated during ephemeral materialization" is actually misleading; this particular PR is in fact fixing the CTE identifier generated during unit testing, not during ephemeral materialization. I'll go ahead and change that!

Taking a step back, however, it strikes me as a potential code smell that the CTE identifier is generated one way during definition (with {identifier} as (...)) and another way during reference (... from {identifier} as fmr). Perhaps that's an intentional design decision, or perhaps it's a sign that a better fix than this one would be to consolidate the code that generates CTE identifiers into one place. I leave that decision up to you, however, since I don't have as much insight into this code as you do! Just let me know what direction you think is appropriate to go and I'd be happy to take a stab at it.

@jeancochrane jeancochrane changed the title Use model alias for the CTE identifier generated during ephemeral materialization Use model identifier for the CTE identifier generated during unit testing Jul 16, 2024
@jeancochrane jeancochrane changed the title Use model identifier for the CTE identifier generated during unit testing Use model alias for the CTE identifier generated during unit testing Jul 16, 2024
@jeancochrane jeancochrane changed the title Use model alias for the CTE identifier generated during unit testing Use model alias for the CTE identifier generated during ephemeral materialization Jul 16, 2024
@jeancochrane
Copy link
Contributor Author

Actually @dbeatty10, my previous comment was incorrect -- the discrepancy between the CTE identifier used during definition vs. reference that I described above does in fact affect ephemeral materialization as well, not just unit tests. I should have tested more thoroughly prior to commenting! I've reverted the title of this PR and will edit my previous comment to reflect this mistake.

@dbeatty10
Copy link
Contributor

Closing and re-opening to try to restart CI tests.

@dbeatty10 dbeatty10 closed this Jul 16, 2024
@dbeatty10 dbeatty10 reopened this Jul 16, 2024
Copy link

codecov bot commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.36%. Comparing base (88ccc8a) to head (5da895c).
Report is 66 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10290      +/-   ##
==========================================
+ Coverage   88.73%   89.36%   +0.62%     
==========================================
  Files         180      180              
  Lines       22489    25105    +2616     
==========================================
+ Hits        19955    22434    +2479     
- Misses       2534     2671     +137     
Flag Coverage Δ
integration 86.78% <100.00%> (+0.72%) ⬆️
unit 64.03% <0.00%> (+0.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dbeatty10
Copy link
Contributor

the discrepancy between the CTE identifier used during definition vs. reference that I described above does in fact affect ephemeral materialization as well

Okay, makes sense @jeancochrane !

Could you do a couple things?

  1. Update from alias to identifier as-needed to pass CI
  2. Instead of modifying an existing test in tests/functional/schema/fixtures/sql.py and tests/functional/schema/test_custom_schema.py, could you add a new one?

@jeancochrane
Copy link
Contributor Author

jeancochrane commented Jul 17, 2024

Good notes @dbeatty10, this should be ready for another look! I also put up a corresponding docs PR at dbt-labs/docs.getdbt.com#5795.

matthewshaver added a commit to dbt-labs/docs.getdbt.com that referenced this pull request Aug 6, 2024
## What are you changing in this pull request and why?

This PR updates the docs to reflect the bugfix made in
dbt-labs/dbt-core#10290 and
dbt-labs/dbt-adapters#236 to bring the
identifiers used for CTEs in line with the identifiers used for tables
and views.

⚠️ Note that these changes should **not** be deployed until those two
PRs have been merged and released, since these docs will not accurately
reflect the behavior of the packages until they have been patched. I've
included this prerequisite in the checklist below.

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [x] For [docs
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning),
review how to [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [x] Add a checklist item for anything that needs to happen before this
PR is merged, such as "needs technical review" or "change base branch."
- [ ] Merge and release dbt-labs/dbt-core#10290
and dbt-labs/dbt-adapters#236

---------

Co-authored-by: Matt Shaver <[email protected]>
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for contributing @jeancochrane !
This change looks good, I think we need to merge the one in dbt-adapters, release a patch version, bump requirements here then merge.

@jeancochrane
Copy link
Contributor Author

Sounds great, thanks so much @ChenyuLInx! Let me know if you'd like a hand with any of that, otherwise I'll hang tight.

@jeancochrane
Copy link
Contributor Author

@dbeatty10 @ChenyuLInx Seems like these changes didn't get added to the changelog for v1.8.6, any idea why not? I'm assuming I just didn't format the changelog metadata correctly, I'll try to do that right next time!

@dbeatty10
Copy link
Contributor

@jeancochrane I looked into why these changes weren't included in the changelog for v1.8.6.

The changes across dbt-core and dbt-adapters needed to be coordinated, and we accidentally got a little crosswise. This lead to a duplicate solution to yours in #10550 that got backported to v1.8 in #10553. It was missing a changelog entry, so it didn't show up in the changelog.

@jeancochrane
Copy link
Contributor Author

Makes sense, thanks for all your help getting this over the line @dbeatty10!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes community This PR is from a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization
4 participants