Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772
Conversation
@@ -1000,6 +1000,46 @@ Note: This is DuckDB SQL and other dialects will be transpiled accordingly.
- Recursive CTEs (common table expressions) will be used for `Redshift / MySQL / MSSQL`.
- For `MSSQL` in particular, there's a recursion limit of approximately 100. If this becomes a problem, you can add an `OPTION (MAXRECURSION 0)` clause after the date spine macro logic to remove the limit. This applies for long date ranges.
### @PHYSICAL_LOCATION

`@PHYSICAL_LOCATION` is a helper macro intended to be used in situations where you need to gain access to the *components* of the physical table name. It's intended for use in the following situations:
As always, naming things is one of the hardest problems. I considered:
- `@physical_table`: the object being created is not always a table, such as `kind: view`
- `@this_snapshot`: uses the terminology `snapshot`, which I can see in some places is being eschewed for "model version"
- `@this_model_version`: kind of long winded
**common_render_kwargs,
)

rendered_physical_properties = snapshot.model.render_physical_properties(
Perhaps this could be moved inside `apply`, so it doesn't occur when `limit` is `None` and the properties aren't needed. Though this doesn't seem a particularly expensive render, so it's likely not that significant.
Oh, the reason I had it outside `apply` was so that we didn't incur the hit for every index being applied, since we only need to render it once.
I see what you're saying though. Perhaps it could be rendered right after the limit check and passed to `apply` instead? The problem is, `apply` is called in a bunch of places, so all the call sites would need to be updated.
Yes you're right, I hadn't considered that. Maybe then something like this:
rendered_physical_properties = (
    snapshot.model.render_physical_properties(**render_statements_kwargs)
    if limit is None
    else None
)
But arguably I don't think it is that much of a slowdown, so the way you have it now should be fine
sqlmesh/core/macros.py
mode: exp.Literal = exp.Literal.string("literal"),
) -> t.Union[exp.Literal, exp.Table]:
"""
Generates a either a String literal or an exp.Table representing a physical table location, based on rendering the provided template String literal.
Small typo: Generates either a String literal...
Thanks!
@@ -630,6 +630,26 @@ def render_merge_filter(
raise SQLMeshError(f"Expected one expression but got {len(rendered_exprs)}")
return rendered_exprs[0].transform(d.replace_merge_table_aliases)
def render_physical_properties(self, **render_kwargs: t.Any) -> t.Dict[str, exp.Expression]:
Don't we want the same behavior for virtual properties?
Recording the result of our internal discussion here: maybe, but I couldn't think of a use-case and was trying to keep this PR small.
Since the load-time rendering context is a subset of the runtime rendering context, if we were to later find a use-case to render `virtual_properties` at runtime, then we could make the switch without breaking backwards compatibility.
sqlmesh/core/macros.py
Example:
>>> from sqlglot import parse_one, exp
>>> from sqlmesh.core.macros import MacroEvaluator, RuntimeStage
>>> sql = "@physical_location('s3://data-bucket/prod/@{catalog_name}/@{schema_name}/@{table_name}')"
@tobymao does this API look good to you?
sqlmesh/core/macros.py
@@ -1183,6 +1183,59 @@ def date_spine(
return exp.select(alias_name).from_(exploded)

@macro()
def physical_location(
IMHO, the name is somewhat confusing and doesn't accurately reflect what it does. We already have `resolve_table`; can this be `resolve_template`? Or perhaps we can extend `resolve_table` with an optional template instead? If it's provided then we follow the template, if not we return the FQN.
Yep, I don't like the current name either.
The problem with extending `resolve_table` is that it's not a macro per se - it's a lambda that gets passed into the template context. It's also available directly from Python models and is defined to return a `str` instead of an AST node, so I think changing it would cause breakage.
I like `@resolve_template` - it doesn't limit the terminology to tables only (since it's really "physical layer object", which could be a view for `kind: VIEW` models), and it indicates that templating is occurring.
ok this is fine
sqlmesh/core/macros.py
>>> evaluator.transform(parse_one(sql)).sql()
"'s3://data-bucket/prod/test_catalog/sqlmesh__test/test__test_model__2517971505'"
"""
if evaluator.runtime_stage != "loading":
FYI, there's also the "testing" runtime stage. How would these macro calls be handled in the context of unit tests? Can we actually unit test models that use this?
Good call, I'll write a test to verify the behaviour during unit tests.
Theoretically it should be ignored entirely unless `@this_model` is available, so maybe I'll adjust the condition to just check for the presence of that and ignore the runtime stage.
I've adjusted the logic to be dependent on the presence of `@this_model` instead, and written a test to show it works from unit tests.
sqlmesh/core/macros.py
else:
    return exp.Literal.string(result)
[Nit] The `else` is redundant after the early return; suggested change:

    else:
        return exp.Literal.string(result)

becomes

    return exp.Literal.string(result)
sqlmesh/core/macros.py
)

if mode.this.lower() == "table":
    return exp.to_table(result)
Should we pass the dialect in this `to_table` call?
Man, this trap for young players always gets me 🤦 Thanks for noticing it, will fix.
# Discard the potentially half-rendered versions of these properties and replace them with the
# original unrendered versions. They will get rendered properly at evaluation time
meta_fields.update(unrendered_properties)
I think this is breaking because it impacts the data hash. If someone used `@gateway` or other load-time-renderable variables within `physical_properties`, then they should get a diff due to this change.
This is similar to the recent `when_matched` refactor, IIRC.
Ah, yes, I was meant to ask about this.
I'm still not 100% sure how to write migrations and how to test that they're working; they seem to be quite difficult / error-prone to write, with lots of pitfalls.
Also, is there any default upgrade logic that we get "for free" with a sqlglot version change?
@@ -1000,6 +1000,46 @@ Note: This is DuckDB SQL and other dialects will be transpiled accordingly.
- Recursive CTEs (common table expressions) will be used for `Redshift / MySQL / MSSQL`.
- For `MSSQL` in particular, there's a recursion limit of approximately 100. If this becomes a problem, you can add an `OPTION (MAXRECURSION 0)` clause after the date spine macro logic to remove the limit. This applies for long date ranges.
### @RESOLVE_TEMPLATE

`@resolve_template` is a helper macro intended to be used in situations where you need to gain access to the *components* of the physical object name. It's intended for use in the following situations:
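The substitution semantics can be illustrated with a small plain-Python sketch. This is only an illustration, not the actual sqlmesh implementation; the standalone `resolve_template` function and its signature here are hypothetical, though the placeholder names and example values match the PR's doctest:

```python
# Rough illustration (hypothetical helper, not the sqlmesh macro itself):
# substitute the components of the physical table name into a template.
def resolve_template(template: str, catalog: str, schema: str, table: str) -> str:
    # Literal placeholder substitution, mirroring sqlmesh's @{...} syntax.
    return (
        template.replace("@{catalog_name}", catalog)
        .replace("@{schema_name}", schema)
        .replace("@{table_name}", table)
    )

# Example values taken from the doctest in this PR.
location = resolve_template(
    "s3://data-bucket/prod/@{catalog_name}/@{schema_name}/@{table_name}",
    catalog="test_catalog",
    schema="sqlmesh__test",
    table="test__test_model__2517971505",
)
print(location)
# s3://data-bucket/prod/test_catalog/sqlmesh__test/test__test_model__2517971505
```

In the real macro the three components come from splitting `@this_model` (via `exp.to_table`) rather than being passed explicitly.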
why template? i'm not sure i understand the name, seems a bit too generic
Maybe `@render_template` would be more appropriate?
It's essentially rendering an in-line template, with `@this_model` broken out into its components and made available as `@{catalog_name}`, `@{schema_name}` and `@{table_name}` so the user can recombine those as required.
My original implementation added these as variables to the main evaluation context so users could access them directly, something like:
physical_properties (
    location = @'s3://bucket/@{catalog_name}/@{schema_name}/@{table_name}'
)
but there was a preference to create a macro instead?
this_model = exp.to_table(evaluator.locals["this_model"])
template_str: str = template.this
result = (
    template_str.replace("@{catalog_name}", this_model.catalog)
    .replace("@{schema_name}", this_model.db)
    .replace("@{table_name}", this_model.name)
)
Is this just doing string replacement? Is this safe?
In short, yes. It's doing the same thing that this is doing, to keep consistency. It deliberately mimics our macro syntax - is that what you meant by safety?
I was holding off "upgrading" it to actually run the template string through the macro evaluator via `evaluator.evaluate()`, figuring that could be easily added later if the need arose.
sqlmesh/core/macros.py
def resolve_template(
    evaluator: MacroEvaluator,
    template: exp.Literal,
    mode: exp.Literal = exp.Literal.string("literal"),
This can just be a literal string; the type converter will handle it:
mode: str
This PR:
- Moves rendering of `physical_properties` to run time instead of load time, making `@this_model` available
- Adds a new macro `@physical_location` that allows a user to create arbitrary `exp.Literal` or `exp.Table` objects in the AST based on the parts of the physical table name

The motivation is twofold:
- Users need access to snapshot information within `physical_properties`, because `physical_properties` is currently rendered at load time, which means no snapshot info is available
- `@this_model` returns a fully rendered FQN string, which makes it difficult to work with. We can't change this without breaking backwards compatibility, but we can introduce something else that allows users to build new strings based on the components of the FQN, rather than the FQN itself