Is this a new bug in dbt-bigquery?
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When running a dbt Python model with an incremental strategy and using the dbt.this property to access the location of the current model, the code breaks.
Here's the faulty code:
```python
# Process new rows only
if dbt.is_incremental:
    # only new rows compared to max in current table
    max_from_this = f"select max(created_at) from {dbt.this}"
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
```
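For context, a minimal illustration (values assumed, not taken from the report) of why the query string trips up Spark: dbt.this renders as a fully qualified, multi-part BigQuery relation, which Spark SQL's default catalog cannot resolve as a table name.

```python
# Assumed illustration: dbt.this stringifies to a backtick-quoted,
# multi-part BigQuery relation such as `x`.`y`.`my_model`
# (project.dataset.table).
print(str(dbt.this))                              # e.g. `x`.`y`.`my_model`
print(f"select max(created_at) from {dbt.this}")  # the query Spark receives
# spark_catalog then rejects the multi-part namespace [x, y].
```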
Here's the error output:
```
df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]
```
Here, x is the project name and y is the dataset name.
The model uses the dbt-bigquery adapter (v1.7.2) and submits the Python model via Dataproc.
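A possible workaround, sketched below under the assumption that the Spark BigQuery connector is available in the Dataproc runtime (it ships with Dataproc Serverless), is to read the existing table through the connector instead of resolving {dbt.this} in Spark SQL. This is not the adapter's documented approach, just a sketch:

```python
# Workaround sketch (assumptions noted above): load the current table via the
# Spark BigQuery connector, then compute the watermark with the DataFrame API
# instead of session.sql() over a multi-part relation name.
if dbt.is_incremental:
    existing = (
        session.read.format("bigquery")
        # strip the backticks from the rendered relation -> project.dataset.table
        .option("table", str(dbt.this).replace("`", ""))
        .load()
    )
    max_created_at = existing.agg({"created_at": "max"}).collect()[0][0]
    df = df.filter(df.created_at >= max_created_at)
```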
Expected Behavior
One should be able to use the dbt.this property inside session.sql() in order to run incremental Python models.
Steps To Reproduce
1. Create a Python model using the dbt-bigquery adapter.
2. In the model() function, set the materialized property of dbt.config to incremental.
3. Reference dbt.this in a query, e.g. `max_from_this = f"select max(created_at) from {dbt.this}"` followed by `df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])` (a full model sketch follows these steps).
4. Run the model using dbt run.

You should then be able to check the logs of the Dataproc batch in the Google Cloud Console.
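A minimal reproduction sketch, assuming a hypothetical upstream model named my_source_table with a created_at column (the model name, file name, and submission settings are placeholders, not part of the original report):

```python
# models/my_incremental_model.py -- hypothetical file name
def model(dbt, session):
    dbt.config(
        materialized="incremental",
        submission_method="serverless",  # submit via Dataproc Serverless
    )
    # "my_source_table" is a placeholder upstream model with a created_at column
    df = dbt.ref("my_source_table")
    if dbt.is_incremental:
        # This is the line that fails: Spark SQL cannot resolve the
        # multi-part BigQuery relation that {dbt.this} renders to.
        max_from_this = f"select max(created_at) from {dbt.this}"
        df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
    return df
```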
Relevant log output
```
Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
Traceback (most recent call last):
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 264, in <module>
    df = model(dbt, spark)
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 165, in model
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]
```