Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Incorrect Column Name Quoting in Python Model Audit Queries #3621

Closed
rbreejen opened this issue Jan 11, 2025 · 5 comments
Closed

Bug: Incorrect Column Name Quoting in Python Model Audit Queries #3621

rbreejen opened this issue Jan 11, 2025 · 5 comments
Assignees

Comments

@rbreejen
Copy link

Description

When using Python models in SQLMesh, column names in audit configurations are incorrectly handled during SQL generation. While SQL model audits work correctly (where columns are specified without quotes), Python model audits fail because the quoted column names in the Python configuration are not properly converted to SQL identifiers.

Current Behavior

In Python models like:

@model(
    name='table',
    audits=[
        ("not_null", {"columns": ["naam"]}),  # Column name is quoted as string
    ],
)

The generated SQL treats the column name as a string literal instead of a column identifier.

Expected Behavior

  • Python models should convert the quoted string column names into proper SQL identifiers

Example

-- Current (incorrect) SQL generation for Python model:
SELECT * FROM table WHERE 'naam' IS NULL

-- Expected SQL generation:
SELECT * FROM table WHERE naam IS NULL
-- or
SELECT * FROM table WHERE "naam" IS NULL

Technical Details

The issue is specific to Python model definitions where column names are necessarily provided as Python strings. The string values aren't being properly converted to SQL identifiers during the audit query generation process.

Steps to Reproduce

  1. Create a Python model with an audit that references column names:
import typing as t
from datetime import datetime

import pandas as pd
from sqlmesh import ExecutionContext, model

@model(
    "example.basic",
    columns={
        "id": "int",
        "name": "text",
    },
    audits=[
        ("not_null", {"columns": ["id"]}),
    ],
)
def execute(
    context: ExecutionContext,
    start: datetime,
    end: datetime,
    execution_time: datetime,
    **kwargs: t.Any,
) -> pd.DataFrame:

    return pd.DataFrame([
        {"id": None, "name": "name"}
    ])
  1. Run the audit
  2. Observe the generated SQL contains single-quoted column names
    The test should fail but actually succeeds!

Impact

  • Affects only Python model audit configurations
  • SQL model audits work as expected
  • Audit validations for Python models fail because they operate on string literals instead of column values

Suggested Fix

Add proper column identifier conversion specifically for Python model audit configurations, ensuring that string column names are converted to proper SQL identifiers.

@georgesittas georgesittas added the Bug Something isn't working label Jan 13, 2025
@georgesittas georgesittas self-assigned this Jan 13, 2025
@georgesittas
Copy link
Contributor

georgesittas commented Jan 13, 2025

I'm actually unsure if this is a bug or expected behavior. I get why one would expect your example to "just work", but I noticed that the prevalent pattern in all of our examples is to instantiate Columns through to_column explicitly (e.g. here).

@tobymao what do you think? This report is legit -- we're using exp.convert here when loading python models, which in turn produces Literals out of python strings.

I thought about changing it to maybe_parse, but on one hand I'm not sure about its side-effects yet, and on the other hand we may need more logic to convert python lists into Structs, as convert does today.

@rbreejen
Copy link
Author

@georgesittas I got the example explicitly from the docs website. Didn't know there is another way.

@georgesittas
Copy link
Contributor

Ah, interesting, thanks for providing that context.

@georgesittas georgesittas removed the Bug Something isn't working label Jan 13, 2025
@georgesittas
Copy link
Contributor

Hey @rbreejen, I looked into this and I believe this behavior is expected– it's up to the user to provide the right values for each attribute (e.g. Column instances in this case, with the help of to_column). If we were to change the logic and parse the values, that would break other use cases (such as the items.py model I linked above).

I will make sure to fix docs, thanks for flagging it.

@georgesittas georgesittas closed this as not planned Won't fix, can't repro, duplicate, stale Jan 14, 2025
@rbreejen
Copy link
Author

@georgesittas Makes sense! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants