
[Bug] spark_catalog requires a single-part namespace in dbt python incremental model #1300

Open
carlos-veris opened this issue Jul 23, 2024 · 0 comments
Labels
bug Something isn't working python_models

Comments

carlos-veris commented Jul 23, 2024
carlos-veris commented Jul 23, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a dbt Python model with an incremental strategy and using the dbt.this property to reference the current model's relation, the code breaks.

Here's the faulty code:

# Process new rows only
if dbt.is_incremental:
    # only new rows compared to max in current table
    max_from_this = f"select max(created_at) from {dbt.this}"
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])

Here's the error output:

df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Here x is the GCP project name and y is the dataset name.
The project uses the dbt-bigquery adapter (v1.7.2) and submits the Python model through Dataproc.
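The error is consistent with str(dbt.this) rendering a three-part BigQuery identifier (project.dataset.table), while Spark's built-in spark_catalog resolves at most a two-part database.table name. A minimal, dbt-free sketch of the mismatch (all names below are hypothetical):

```python
# str(dbt.this) for a BigQuery model renders roughly like this
# (project, dataset, and table names here are hypothetical):
relation = "`my-project`.`my_dataset`.`my_table`"

# Split into namespace + table the way a catalog would:
parts = [p.strip("`") for p in relation.split(".")]
namespace, table = parts[:-1], parts[-1]

# spark_catalog accepts only a single-part namespace, so the
# two-part namespace ['my-project', 'my_dataset'] is what surfaces
# in the error as "... but got [x, y]".
print(namespace)  # ['my-project', 'my_dataset']
print(table)      # my_table
```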

Expected Behavior

Referencing dbt.this inside an incremental Python model should work, so that the incremental filter can query the existing table.

Steps To Reproduce

  1. Create a Python model using the dbt-bigquery adapter.
  2. Inside the model() function, set the materialized property of dbt.config to "incremental":

def model(dbt, session):
    dbt.config(
        materialized="incremental",
        dataproc_region=<DATAPROC_REGION>,
        submission_method=<SUBMISSION_METHOD>
    )

  3. Use the dbt.this property in a query against the existing table:

max_from_this = f"select max(created_at) from {dbt.this}"
df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])

  4. Run the model with dbt run.
  5. Check the logs of the Dataproc batch in the Google Cloud Console.
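One possible workaround (a sketch, not verified against this exact setup) is to avoid session.sql over the three-part name and instead load the existing table through the Spark BigQuery connector, which accepts a project.dataset.table string directly. The helper below only normalizes the rendered relation; the commented-out session.read call assumes the bigquery data source that Dataproc serverless ships with:

```python
def bq_table_id(rendered_relation: str) -> str:
    """Strip backtick quoting from a rendered dbt relation so the
    result can be passed to the Spark BigQuery connector's 'table'
    option as project.dataset.table."""
    return ".".join(p.strip("`") for p in rendered_relation.split("."))

# Hypothetical usage inside the model body (requires the bigquery
# data source to be available on the Dataproc cluster/batch):
# if dbt.is_incremental:
#     existing = session.read.format("bigquery") \
#         .option("table", bq_table_id(str(dbt.this))).load()
#     max_created = existing.agg({"created_at": "max"}).collect()[0][0]
#     df = df.filter(df.created_at >= max_created)
```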

Relevant log output

Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
Traceback (most recent call last):
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 264, in <module>
    df = model(dbt, spark)
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 165, in model
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Environment

dbt-core: 1.7.2
dbt-bigquery: 1.7.2

Additional Context

References:

@carlos-veris carlos-veris added bug Something isn't working triage labels Jul 23, 2024
@amychen1776 amychen1776 added the python Pull requests that update Python code label Jul 24, 2024
@amychen1776 amychen1776 added python_models and removed python Pull requests that update Python code triage labels Aug 28, 2024