Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772

erindru · 2025-02-03T03:53:58Z

This PR:

Moves evaluation of physical_properties to run time instead of load time
- This enables the usage of things that only exist at run time, such as @this_model
Adds a macro called @physical_location that allows a user to create arbitrary exp.Literal or exp.Table objects in the AST based on the parts of the physical table name

The motivation is twofold:

Right now, there is no way to set custom locations for tables on engines that separate storage and compute (Athena, Trino, Spark etc). This is true even if you write a custom macro and reference it from physical_properties, because physical_properties is currently rendered at load time which means no snapshot info is available
Right now, @this_model returns a fully rendered FQN string which makes it difficult to work with. We can't change this without breaking backwards compatibility, but we can introduce something else that allows users to build new strings based on the components of the FQN, rather than the FQN itself

erindru · 2025-02-03T03:56:20Z

docs/concepts/macros/sqlmesh_macros.md

@@ -1000,6 +1000,46 @@ Note: This is DuckDB SQL and other dialects will be transpiled accordingly.
 - Recursive CTEs (common table expressions) will be used for `Redshift / MySQL / MSSQL`.
 - For `MSSQL` in particular, there's a recursion limit of approximately 100. If this becomes a problem, you can add an `OPTION (MAXRECURSION 0)` clause after the date spine macro logic to remove the limit. This applies for long date ranges.

+### @PHYSICAL_LOCATION
+
+`@PHYSICAL_LOCATION` is a helper macro intended to be used in situations where you need to gain acccess to the *components* of the physical table name. It's intended for use in the following situations:


As always, naming things is one of the hardest problems. I considered:

@physical_table - the object being created is not always a table, such as kind: view

@this_snapshot - uses the terminology snapshot which I can see in some places is being eschewed for "model version"

@this_model_version - kind of long winded

themisvaltinos · 2025-02-03T21:25:27Z

sqlmesh/core/snapshot/evaluator.py

+            **common_render_kwargs,
+        )
+
+        rendered_physical_properties = snapshot.model.render_physical_properties(


Perhaps this could be moved inside apply, so it doesn't occur when limit is None where the properties aren't needed. Though this doesn't seem a particularly expensive render, so it's likely not that significant

Oh, the reason I had it outside apply was so that we didnt occur the hit for every index being applied since we only need to render it once.

I see what you're saying though. Perhaps it could be rendered right after the limit check and passed to apply instead? The problem is, apply is called in a bunch of places so all the call sites would need to be updated

Yes you're right, I hadn't considered that. Maybe then something like this:

rendered_physical_properties = snapshot.model.render_physical_properties( **render_statements_kwargs ) if limit is None else None

But arguably I don't think it is that much of a slowdown, so the way you have it now should be fine

themisvaltinos · 2025-02-03T21:26:30Z

sqlmesh/core/macros.py

+    mode: exp.Literal = exp.Literal.string("literal"),
+) -> t.Union[exp.Literal, exp.Table]:
+    """
+    Generates a either a String literal or an exp.Table representing a physical table location, based on rendering the provided template String literal.


Small typo: Generates either a String literal...

… physical_properties

izeigerman · 2025-02-04T00:48:04Z

sqlmesh/core/model/definition.py

@@ -630,6 +630,26 @@ def render_merge_filter(
            raise SQLMeshError(f"Expected one expression but got {len(rendered_exprs)}")
        return rendered_exprs[0].transform(d.replace_merge_table_aliases)

+    def render_physical_properties(self, **render_kwargs: t.Any) -> t.Dict[str, exp.Expression]:


Don't we want the same behavior for virtual properties?

Recording the result of our internal discussion here: maybe, but I couldnt think of a use-case and was trying to keep this PR small.

Since the load time rendering context is a subset of the runtime rendering context, if we were to later find a use-case to render virtual_properties at runtime then we could make the switch then without breaking backwards compatibility

izeigerman · 2025-02-04T00:58:45Z

sqlmesh/core/macros.py

+    Example:
+        >>> from sqlglot import parse_one, exp
+        >>> from sqlmesh.core.macros import MacroEvaluator, RuntimeStage
+        >>> sql = "@physical_location('s3://data-bucket/prod/@{catalog_name}/@{schema_name}/@{table_name}')"


@tobymao does this API look good to you?

izeigerman · 2025-02-04T01:11:56Z

sqlmesh/core/macros.py

@@ -1183,6 +1183,59 @@ def date_spine(
    return exp.select(alias_name).from_(exploded)


+@macro()
+def physical_location(


IMHO, the name is somewhat confusing and doesn't accurately reflect what it does. We already have resolve_table can this be resolve_template? Or perhaps we can extend resolve_table with an optional template instead? If it's provided then we follow the template, if not we return fqn.

Yep, I don't like the current name either.

The problem with extending resolve_table is that it's not a macro per se - it's a lambda that gets passed into the template context. And it's also available directly in Python from Python models and is defined to return a str instead of an AST node so I think changing it will cause breakage.

I like @resolve_template - it doesnt limit the terminiology to tables only (since it's really "physical layer object" which could be a view for kind: VIEW models) and it indicates that templating is occurring

ok this is fine

georgesittas · 2025-02-04T18:51:53Z

sqlmesh/core/macros.py

+        >>> evaluator.transform(parse_one(sql)).sql()
+        "'s3://data-bucket/prod/test_catalog/sqlmesh__test/test__test_model__2517971505'"
+    """
+    if evaluator.runtime_stage != "loading":


FYI, there's also the "testing" runtime stage. How would these macro calls be handled in the context of unit tests? Can we actually unit test models that use this?

Good call, i'll write a test to verify the behaviour during unit tests.

Theoretically it should be ignored entirely unless @this_model is available so maybe I adjust the condition to just check for the presence of that and ignore the runtime stage

I've adjusted the logic to be dependent on the presence of @this_model instead and written a test to show it works from unit tests

georgesittas · 2025-02-04T18:53:09Z

sqlmesh/core/macros.py

+        else:
+            return exp.Literal.string(result)


[Nit]

Suggested change

else:

return exp.Literal.string(result)

return exp.Literal.string(result)

georgesittas · 2025-02-04T18:54:25Z

sqlmesh/core/macros.py

+        )
+
+        if mode.this.lower() == "table":
+            return exp.to_table(result)


Should we pass the dialect in this to_table call?

Man this trap for young players always gets me 🤦

Thanks for noticing it, will fix

georgesittas · 2025-02-04T19:00:14Z

sqlmesh/core/model/definition.py

+    # Discard the potentially half-rendered versions of these properties and replace them with the
+    # original unrendered versions. They will get rendered properly at evaluation time
+    meta_fields.update(unrendered_properties)


I think this is breaking because it impacts the data hash. If someone used @gateway or other load-time-renderable variables within physical_properties, then they should get a diff due to this change.

This is similar to the recent when_matched refactor, IIRC.

Ah, yes, I was meant to ask about this.

I'm still not 100% with how to write migrations and how to test they're working, they seem to be quite difficult / error prone to write with lots of pitfalls.

Also, is there any default upgrade logic that we get "for free" with a sqlglot version change?

tobymao · 2025-02-04T22:25:20Z

docs/concepts/macros/sqlmesh_macros.md

@@ -1000,6 +1000,46 @@ Note: This is DuckDB SQL and other dialects will be transpiled accordingly.
 - Recursive CTEs (common table expressions) will be used for `Redshift / MySQL / MSSQL`.
 - For `MSSQL` in particular, there's a recursion limit of approximately 100. If this becomes a problem, you can add an `OPTION (MAXRECURSION 0)` clause after the date spine macro logic to remove the limit. This applies for long date ranges.

+### @RESOLVE_TEMPLATE
+
+`@resolve_template` is a helper macro intended to be used in situations where you need to gain access to the *components* of the physical object name. It's intended for use in the following situations:


why template? i'm not sure i understand the name, seems a bit too generic

maybe @render_template would be more appropriate?

It's essentially rendering an in-line template with @this_model broken out into its components and made available as @{catalog_name}, @{schema_name} and @{table_name} so the user can recombine those as required

My original implementation added these as variables to the main evaluation context so users could access them directly, something like:

physical_properties ( location = @'s3://bucket/@{catalog_name}/@{schema_name}/@{table_name}' )

but there was a preference to create a macro instead?

tobymao · 2025-02-04T22:26:33Z

sqlmesh/core/macros.py

+        this_model = exp.to_table(evaluator.locals["this_model"])
+        template_str: str = template.this
+        result = (
+            template_str.replace("@{catalog_name}", this_model.catalog)


is this just doing string replacement?, is this safe?

In short, yes. It's doing the same thing that this is doing to keep consistency. It deliberately mimics our macro syntax, is that what you meant by safety?

I was holding off "upgrading" it to actually run the template string through the macro evaluator via evaluator.evaluate(), figuring that could be easily added later if the need arose

tobymao · 2025-02-04T22:27:34Z

sqlmesh/core/macros.py

+def resolve_template(
+    evaluator: MacroEvaluator,
+    template: exp.Literal,
+    mode: exp.Literal = exp.Literal.string("literal"),


this can just be a literal string, the type convert will handle it

mode: str

erindru commented Feb 3, 2025

View reviewed changes

erindru force-pushed the erin/macros-in-physical-properties branch 2 times, most recently from 26085a9 to 52624f7 Compare February 3, 2025 04:59

erindru marked this pull request as ready for review February 3, 2025 05:08

erindru force-pushed the erin/macros-in-physical-properties branch from 52624f7 to cbc7fbf Compare February 3, 2025 21:07

themisvaltinos reviewed Feb 3, 2025

View reviewed changes

erindru added 2 commits February 4, 2025 00:42

Feat: Add macro to resolve physical table names and enable its use in…

e239d4b

… physical_properties

Fix typo

0f03c7b

erindru force-pushed the erin/macros-in-physical-properties branch from cbc7fbf to 0f03c7b Compare February 4, 2025 00:44

izeigerman reviewed Feb 4, 2025

View reviewed changes

Rename @physical_location -> @resolve_template

36a86a0

georgesittas reviewed Feb 4, 2025

View reviewed changes

tobymao reviewed Feb 4, 2025

View reviewed changes

Show that the macro works in unit tests

274960e

erindru changed the title ~~Feat: Adjust physical_properties evaluation and add macro to resolve physical table names~~ Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772

Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772

erindru commented Feb 3, 2025 •

edited

Loading

erindru Feb 3, 2025 •

edited

Loading

themisvaltinos Feb 3, 2025

erindru Feb 3, 2025

themisvaltinos Feb 4, 2025

themisvaltinos Feb 3, 2025

erindru Feb 4, 2025

izeigerman Feb 4, 2025

erindru Feb 4, 2025

izeigerman Feb 4, 2025

izeigerman Feb 4, 2025

erindru Feb 4, 2025 •

edited

Loading

tobymao Feb 4, 2025

georgesittas Feb 4, 2025

erindru Feb 4, 2025 •

edited

Loading

erindru Feb 5, 2025

georgesittas Feb 4, 2025 •

edited

Loading

georgesittas Feb 4, 2025

erindru Feb 4, 2025

georgesittas Feb 4, 2025

erindru Feb 4, 2025

tobymao Feb 4, 2025

erindru Feb 4, 2025 •

edited

Loading

tobymao Feb 4, 2025

erindru Feb 4, 2025

tobymao Feb 4, 2025

Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772

Are you sure you want to change the base?

Feat!: Adjust physical_properties evaluation and add macro to resolve physical table names #3772

Conversation

erindru commented Feb 3, 2025 • edited Loading

erindru Feb 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

georgesittas Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru commented Feb 3, 2025 •

edited

Loading

erindru Feb 3, 2025 •

edited

Loading

erindru Feb 4, 2025 •

edited

Loading

erindru Feb 4, 2025 •

edited

Loading

georgesittas Feb 4, 2025 •

edited

Loading

erindru Feb 4, 2025 •

edited

Loading