[Bug] Unit tests not working for incremental models #10686

CraigWilson-ZOE · 2024-09-10T15:15:30Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a unit tests for an incremental model I am getting the following error:

14:41:49    Database Error in unit_test unit_test__int_barcode_scanned_events_incremental_mode (models/silver/intermediate/food_logging/tests.yml)
  Syntax error: Unexpected "." at [18:44]

Expected Behavior

The unit tests to pass

Steps To Reproduce

Environment is:

dbt 1.8.6
dbt bigquery 1.8.2
dbt-adapters 1.5.0
dbt-utils 1.3.0

incremental model configuration:

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        unique_key='event_id',
        partition_by={
            "field": "date_day",
            "data_type": "date"
        }
    )
}}

Unit test config:

unit_tests:
  - name: unit_test__int_barcode_scanned_events_incremental_mode
    description: Unit test for int_barcode_scanned_events in incremental mode
    model: int_barcode_scanned_events
    overrides:
      macros:
        is_incremental: true
    given:
        - input: ref('stg_insights_mixpanel_event')
          format: csv
          fixture: raw_mixpanel_events
        - input: this
          format: csv
          fixture: incremental_int_barcode_scanned_events
    expect:
        format: csv
        fixture: expected__int_barcode_scanned_events

Relevant log output

...
15:06:35  3 of 3 ERROR int_barcode_scanned_events::unit_test__int_barcode_scanned_events_incremental_mode  [ERROR in 1.31s]
...
15:06:58    Database Error in unit_test unit_test__int_barcode_scanned_events_incremental_mode (models/silver/intermediate/food_logging/tests.yml)
  Syntax error: Unexpected "." at [18:44]

Environment

- OS: macos 14.3.1
- Python: Python 3.12.2
- dbt: 1.8.6

Which database adapter are you using with dbt?

bigquery

Additional Context

The SQL that seems to be produced and is causing the error is the following:

/* {"app": "dbt", "dbt_version": "1.8.5", "profile_name": "default", "target_name": "development", "node_id": "unit_test.dbt_bq_elt.int_barcode_scanned_events.unit_test__int_barcode_scanned_events_incremental_mode"} */


      
        
            select distinct
                table_schema,
                table_name,
                
            case table_type
                when 'BASE TABLE' then 'table'
                when 'EXTERNAL TABLE' then 'external'
                when 'MATERIALIZED VIEW' then 'materializedview'
                else lower(table_type)
            end as `table_type`


            from `dbt-development-joinzoe`..INFORMATION_SCHEMA.TABLES
            where lower(table_name) like lower ('')
                and lower(table_name) not like lower ('')

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2024-09-11T13:49:16Z

Thanks for reporting this @CraigWilson-ZOE !

If you comment out the unit test completely, does it still give you that error?

The SQL that is creating that syntax error looks like it could be coming from bigquery__get_catalog or bigquery__get_catalog_relations. Did you happen to override either of those macros or any others? If so, which ones?

See below for a simplified version of the situation you described. It worked without error for me, so I'm curious if it works for you or not?

Example

models/int_barcode_scanned_events.sql

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        unique_key='event_id',
        partition_by={
            "field": "date_day",
            "data_type": "date"
        }
    )
}}

select 1 as event_id, cast('2002-02-02' as date) as date_day

models/_unit_tests.yml

unit_tests:
  - name: unit_test__int_barcode_scanned_events_incremental_mode
    description: Unit test for int_barcode_scanned_events in incremental mode
    model: int_barcode_scanned_events
    overrides:
      macros:
        is_incremental: true
    given:
      - input: this
        rows:
          - {event_id: 1, date_day: 2001-01-01}
    expect:
        rows:
          - {event_id: 1, date_day: 2002-02-02}

Commands to run:

dbt build -s int_barcode_scanned_events

Output:

$ dbt build -s int_barcode_scanned_events    
 
13:41:43  Running with dbt=1.8.6
13:42:10  Registered adapter: bigquery=1.8.2
13:42:11  Found 1 model, 473 macros, 1 unit test
13:42:11  
13:42:27  Concurrency: 10 threads (target='blue')
13:42:27  
13:42:27  1 of 2 START unit_test int_barcode_scanned_events::unit_test__int_barcode_scanned_events_incremental_mode  [RUN]
13:42:33  1 of 2 PASS int_barcode_scanned_events::unit_test__int_barcode_scanned_events_incremental_mode  [PASS in 5.48s]
13:42:33  2 of 2 START sql incremental model dbt_dbeatty.int_barcode_scanned_events ...... [RUN]
13:42:40  2 of 2 OK created sql incremental model dbt_dbeatty.int_barcode_scanned_events . [SCRIPT (56.0 Bytes processed) in 7.21s]
13:42:40  
13:42:40  Finished running 1 unit test, 1 incremental model in 0 hours 0 minutes and 29.38 seconds (29.38s).
13:42:40  
13:42:40  Completed successfully
13:42:40  
13:42:40  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

tsafacjo · 2024-09-11T18:41:20Z

can I pick it ?

dbeatty10 · 2024-09-11T21:33:58Z

@tsafacjo We're not yet sure if this is a bug in dbt-core or if the root cause is something else.

If you're looking for a place to contribute to dbt, there's a list of issues we've labeled as "help wanted" here.

tsafacjo · 2024-09-11T21:42:58Z

ok thanks.

CraigWilson-ZOE · 2024-09-30T08:51:30Z

Sorry for the late reply, I have been on vacation.

If you comment out the unit test completely, does it still give you that error?

I have commented out the unit test and I do not get the error anymore.

The SQL that is creating that syntax error looks like it could be coming from bigquery__get_catalog or bigquery__get_catalog_relations. Did you happen to override either of those macros or any others? If so, which ones?

No we do not override any of those macros.

See below for a simplified version of the situation you described. It worked without error for me, so I'm curious if it works for you or not?

I have tried with the super simple version and it passed without error.

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        unique_key='event_id',
        partition_by={
            "field": "date_day",
            "data_type": "date"
        }
    )
}}

select 1 as event_id, cast('2002-02-02' as date) as date_day

  - name: unit_test__int_barcode_test_increment
    description: Unit test for int_barcode_test
    model: int_barcode_test
    overrides:
      macros:
        is_incremental: true
    given:
      - input: this
        rows:
          - {event_id: 1, date_day: 2001-01-01}
    expect:
        rows:
          - {event_id: 1, date_day: 2002-02-02}

NOTE: I created a new model so I could run both at the same time. My original model still fails with the same error.

CraigWilson-ZOE · 2024-09-30T10:00:37Z

Also worth posting here is my packages.yml, I have removed all unecessary packages and down to the bare minimum for the who project:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
  - package: fivetran/mailchimp_source
    version: [">=0.4.0", "<0.6.0"]
  - package: fivetran/mailchimp
    version: [">=0.7.0", "<0.10.0"]
  - package: fivetran/zendesk_source
    version: [">=0.10.0", "<0.14.0"]
  - package: fivetran/zendesk
    version: [">=0.10.0", "<0.14.0"]
  - package: fivetran/google_ads_source
    version: [">=0.9.1", "<0.11.0"]
  - package: fivetran/google_ads
    version: [">=0.9.1", "<0.11.0"]
  - package: fivetran/facebook_ads
    version: [">=0.6.0", "<0.8.0"]
  - package: fivetran/facebook_ads_source
    version: [">=0.6.0", "<0.8.0"]

dbeatty10 · 2024-09-30T17:16:28Z

I have tried with the super simple version and it passed without error.

Okay! The next step is to try to isolate whichever piece(s) of your model that is triggering the syntax error when it is executed via dbt unit tests.

Are you able to tweak that super simple version so that it is closer and closer to your actual model until you find something that breaks the simplified version?

devmessias · 2024-10-16T22:13:40Z

Hi @CraigWilson-ZOE, I understand that you are sharing the compiled SQL rather than the model, correct? Could you please share it with me? I’d like to see if it’s similar to the other issues I was looking into in this PR: #10849. If the model has any kind of var, env_var, or other thins that cames from context this can cause this kind of error during the unit test phase

devmessias · 2024-10-16T22:35:38Z

Why I'm saying that @CraigWilson-ZOE . Your compiled sql is wrong due the double "..", which will be the result of unit test if you have a model like this

select
    count(*) as result
from {{target.database}}.{{env_var('schema')}}.INFORMATION_SCHEMA.TABLES

result

select
    count(*) as result
from dbt.INFORMATION_SCHEMA.TABLES

kangshung · 2024-10-22T13:05:18Z

Are there any updates on this? I have the same problem @dbeatty10

devmessias · 2024-10-22T13:38:09Z

I think it's a good idea if you post the model.sql file you're having issues , @kangshung

dbeatty10 · 2024-10-25T22:35:52Z

Oops, I closed it too hastily!

@kangshung as @devmessias said, what you reported in #10686 (comment) is a duplicate of #10433

But we still don't have steps to reproduce for what @CraigWilson-ZOE originally reported in this issue.

@CraigWilson-ZOE does #10686 (comment) look similar to your model that is having the issues with unit testing?

CraigWilson-ZOE · 2024-10-29T13:53:33Z

Hi All,

Sorry I have been away from my computer for a while.

Now that I am back and had some time to dive into this, it seems like a call to an internal dbt-core macro is failing. The SQL that is being executed is:

            select distinct
                table_schema,
                table_name,
                
            case table_type
                when 'BASE TABLE' then 'table'
                when 'EXTERNAL TABLE' then 'external'
                when 'MATERIALIZED VIEW' then 'materializedview'
                else lower(table_type)
            end as `table_type`


            from `development`..INFORMATION_SCHEMA.TABLES
            where lower(table_name) like lower ('')
                and lower(table_name) not like lower ('')

I believe this is coming from get_relations_by_pattern. We use this to generate some SQL for the incremental model, and it seems like the schema section for this macro isn't getting set within dbt within a unit test. So it isn't directly related to the unit tests, but is a side effect of the overall environment that is setup within dbt during unit test executions.

Is there a reason that the schema wouldn't be set in this instance?

I have checked via logging output of the target and this objects, the this object is empty for all parts of the model information, whereas target object does have the information for the most part.

I do think this is a bug and needs to be fixed otherwise we are saying that some macros cannot be used in a incremental model.

devmessias · 2024-10-29T14:36:25Z

Hi @CraigWilson-ZOE, this seems similar to #10891 and #10139. This is not the raw code for the model, right? Can you share the raw code? As I discuss in #10908 maybe we should allow overriding dynamic variables

devmessias · 2024-10-29T14:48:24Z

In the meantime, testing a simpler model that uses the same macro in a non-incremental model might help. I tend to believe that your issue isn't related to the model being incremental or not

CraigWilson-ZOE · 2024-10-29T15:13:22Z

Hi @CraigWilson-ZOE, this seems similar to #10891 and #10139. This is not the raw code for the model, right? Can you share the raw code? As I discuss in #10908 maybe we should allow overriding dynamic variables

Yes it does look the same as #10891 , thanks!

I am not sure about overriding dynamic variables vs just having them work as expected by many people or provide an alternative.

In the meantime, testing a simpler model that uses the same macro in a non-incremental model might help. I tend to believe that your issue isn't related to the model being incremental or not

I agree that it isn't related to it being incremental or not, it is due to the above issue.

devmessias · 2024-10-29T16:09:07Z

I don't think there is an easy way to do that, as pointed in #8499. The CTE approach will solve some limitations but will impact the performance , and will create somo additional problems according to #8499 (comment)
Overriding dynamic variables using the yaml would be easier to implement as a new feature, and we could still keep the performance for unit tests, similar to the mocked behavior in Python's unittest.

But in your case, if you have just one macro call could you try to override the macro?

dbeatty10 · 2024-10-29T22:48:32Z

@CraigWilson-ZOE could you share your line of code that you are using for dbt_utils.get_relations_by_pattern so we can see the schema argument?

I'm trying to reproduce the same error you are getting, but I'm not able when I use target.schema for the schema within dbt_utils.get_relations_by_pattern.

devmessias · 2024-10-31T13:01:27Z

There are so many issues whose root cause is this limitation, and users are incorrectly associating it with other things. Wouldn't it be helpful to improve the unit test doc to emphasize this limitation, @dbeatty10?

CraigWilson-ZOE · 2024-10-31T15:21:09Z

@dbeatty10

Here is a breakdown of the SQL that I have and what is causing the issue. This is a change I made to ensure that getting the latest partition data in BigQuery is much more efficient than doing a query on the target table. This has saved a lot of processing on our large tables.

Macro to get the latest partition info:

{% macro get_latest_partition_value(table, partition_data_type=none) %}
    {% if not execute %}
        {{ return(modules.datetime.date.fromisoformat('2023-01-01')) }}
    {% endif %}
    {% if not dbt_utils.get_relations_by_pattern(table.schema, table.name) %}
        {{ return(modules.datetime.date.fromisoformat('1900-01-01')) }}
    {% endif %}
    {% set node = graph.nodes.get("model." + project_name + "." + table.name) %}
    {% if node is none %}
        {{ return(none) }}
    {% endif %}
    {% if not partition_data_type %}
        {% set raw_partition_by = config.get('partition_by', none) %}
        {% if not raw_partition_by %}
            {% do exceptions.raise_compiler_error("partition_by must be set in the model config to get latest partition value") %}
        {% endif %}
        {% set partition_by = adapter.parse_partition_by(raw_partition_by) %}
        {% set partition_data_type = partition_by.data_type %}
    {% endif %}

    {%- set query -%}
        SELECT MAX(
        {% if partition_data_type == 'date' %}
            PARSE_DATE("%Y%m%d",partition_id)
        {% elif partition_data_type == 'timestamp'%}
            PARSE_TIMESTAMP("%Y%m%d",partition_id)
        {% elif partition_data_type == 'datetime' %}
            PARSE_DATETIME("%Y%m%d",partition_id)
        {% elif partition_data_type == 'int64' %}
            CAST(partition_id AS INT64)
        {% else %}
            {% do exceptions.raise_compiler_error("partition_by data type must be date, timestamp, datetime or int64") %}
        {% endif %}
        )
        FROM {{table.database}}.{{table.schema}}.INFORMATION_SCHEMA.PARTITIONS
        WHERE 
            table_name = '{{table.name}}' 
            AND partition_id IS NOT NULL 
            AND partition_id != '__NULL__' 
            AND partition_id != '__UNPARTITIONED__'
            AND total_rows > 0
    {% endset %}

    {% set results = run_query(query) %}

    {% set partition_max_value = results.columns[0].values()[0] %}
    {{ return(partition_max_value) }}

{% endmacro %}

And it is used like this (example):

{% set partition_date = get_latest_partition_value(this) %}
{{
    config(
        materialized='incremental',
        unique_key='correlation_id',
        partition_by={
            "field": "search_date",
            "data_type": "date"
        },
        cluster_by=["message_type"],
    )
}}

WITH

all_search_logs AS
(
    SELECT DISTINCT
        correlation_id,
        timestamp AS search_time,
        username,
        payload,
        origin_service,
        message_type,
        DATE(timestamp) AS search_date,
        COALESCE(CAST(JSON_EXTRACT_SCALAR(metadata, '$.with_scores') AS BOOL), TRUE) AS with_scores
    FROM
        {{ source('pubsub_dataflow', 'pubsub_to_bigquery_partitioned') }}
    {% if target.name != 'production' %}
        WHERE
            timestamp >= (CAST(DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AS TIMESTAMP))
    {% else %}
        {% if is_incremental() and partition_date is not none %}
            WHERE
                timestamp >= TIMESTAMP('{{ partition_date }}')
        {% endif %}
    {% endif %}
)

SELECT * FROM all_search_logs

The table parameter in the macro during the unit test execution is NULL/NONE whereas in normal running or compilation it is correctly set as the target table.

devmessias · 2024-10-31T16:52:15Z

This will not work with the current limitations of unit tests. However, you can override the macro since it is being called only once. In that case, having a feature that allows to override a dynamic variable is not necessary.

dbeatty10 · 2024-11-01T21:31:11Z

Thanks for providing those details @CraigWilson-ZOE ! I was able to get the same error as you with a simplified version of your scenario. And I was also able to resolve it by providing an override for the get_latest_partition_value macro in the unit test definition.

Explanation

It looks like these lines are the ones that won't work the way you want when you execute them in a unit test:

        FROM {{table.database}}.{{table.schema}}.INFORMATION_SCHEMA.PARTITIONS
        WHERE 
            table_name = '{{table.name}}'

There's an explanation in #10891 (comment).

The TLDR is that within the context of unit tests, this is a str rather than a Relation, so things like {{table.schema}} will be empty.

Workaround

Could you try providing an override for your get_latest_partition_value macro like suggested by @devmessias ?

It would look something like this:

  - name: unit_test__int_barcode_scanned_events_incremental_mode
    description: Unit test for int_barcode_scanned_events in incremental mode
    model: int_barcode_scanned_events
    overrides:
      macros:
        is_incremental: true
        get_latest_partition_value: SOMETHING_HERE

CraigWilson-ZOE · 2024-11-04T08:25:24Z

Ah that makes a lot of sense, and thanks for the example and explanation. Happy for this to be closed as I think a lot of it is covered in other issues as linked above.

dbeatty10 · 2024-11-04T22:10:10Z

Thanks again for raising this and helping home in on the root causes @CraigWilson-ZOE. Going to close this in favor of #10891 and other relevant issues linked above.

CraigWilson-ZOE added bug Something isn't working triage labels Sep 10, 2024

dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Sep 10, 2024

dbeatty10 added awaiting_response and removed triage labels Sep 11, 2024

github-actions bot added triage and removed awaiting_response labels Sep 11, 2024

dbeatty10 added awaiting_response and removed triage labels Sep 11, 2024

github-actions bot added triage and removed awaiting_response labels Sep 11, 2024

dbeatty10 added awaiting_response and removed triage labels Sep 12, 2024

github-actions bot added triage and removed awaiting_response labels Sep 30, 2024

dbeatty10 added awaiting_response and removed triage labels Sep 30, 2024

github-actions bot added triage and removed awaiting_response labels Oct 16, 2024

dbeatty10 added the duplicate This issue or pull request already exists label Oct 25, 2024

dbeatty10 reopened this Oct 25, 2024

dbeatty10 added awaiting_response and removed duplicate This issue or pull request already exists labels Oct 25, 2024

github-actions bot added triage and removed awaiting_response labels Oct 29, 2024

dbeatty10 added awaiting_response and removed triage labels Oct 29, 2024

github-actions bot added triage and removed awaiting_response labels Oct 31, 2024

dbeatty10 added awaiting_response and removed triage labels Nov 1, 2024

github-actions bot added triage and removed awaiting_response labels Nov 4, 2024

dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 4, 2024

dbeatty10 added duplicate This issue or pull request already exists and removed triage labels Nov 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Unit tests not working for incremental models #10686

[Bug] Unit tests not working for incremental models #10686

CraigWilson-ZOE commented Sep 10, 2024

dbeatty10 commented Sep 11, 2024

Example

tsafacjo commented Sep 11, 2024

dbeatty10 commented Sep 11, 2024

tsafacjo commented Sep 11, 2024

CraigWilson-ZOE commented Sep 30, 2024

CraigWilson-ZOE commented Sep 30, 2024

dbeatty10 commented Sep 30, 2024

devmessias commented Oct 16, 2024

devmessias commented Oct 16, 2024 •

edited

Loading

kangshung commented Oct 22, 2024

devmessias commented Oct 22, 2024 •

edited

Loading

dbeatty10 commented Oct 25, 2024

CraigWilson-ZOE commented Oct 29, 2024

devmessias commented Oct 29, 2024 •

edited

Loading

devmessias commented Oct 29, 2024

CraigWilson-ZOE commented Oct 29, 2024

devmessias commented Oct 29, 2024

dbeatty10 commented Oct 29, 2024

devmessias commented Oct 31, 2024

CraigWilson-ZOE commented Oct 31, 2024 •

edited

Loading

devmessias commented Oct 31, 2024 •

edited

Loading

dbeatty10 commented Nov 1, 2024

CraigWilson-ZOE commented Nov 4, 2024

dbeatty10 commented Nov 4, 2024

[Bug] Unit tests not working for incremental models #10686

[Bug] Unit tests not working for incremental models #10686

Comments

CraigWilson-ZOE commented Sep 10, 2024

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Sep 11, 2024

Example

tsafacjo commented Sep 11, 2024

dbeatty10 commented Sep 11, 2024

tsafacjo commented Sep 11, 2024

CraigWilson-ZOE commented Sep 30, 2024

CraigWilson-ZOE commented Sep 30, 2024

dbeatty10 commented Sep 30, 2024

devmessias commented Oct 16, 2024

devmessias commented Oct 16, 2024 • edited Loading

kangshung commented Oct 22, 2024

devmessias commented Oct 22, 2024 • edited Loading

dbeatty10 commented Oct 25, 2024

CraigWilson-ZOE commented Oct 29, 2024

devmessias commented Oct 29, 2024 • edited Loading

devmessias commented Oct 29, 2024

CraigWilson-ZOE commented Oct 29, 2024

devmessias commented Oct 29, 2024

dbeatty10 commented Oct 29, 2024

devmessias commented Oct 31, 2024

CraigWilson-ZOE commented Oct 31, 2024 • edited Loading

devmessias commented Oct 31, 2024 • edited Loading

dbeatty10 commented Nov 1, 2024

Explanation

Workaround

CraigWilson-ZOE commented Nov 4, 2024

dbeatty10 commented Nov 4, 2024

devmessias commented Oct 16, 2024 •

edited

Loading

devmessias commented Oct 22, 2024 •

edited

Loading

devmessias commented Oct 29, 2024 •

edited

Loading

CraigWilson-ZOE commented Oct 31, 2024 •

edited

Loading

devmessias commented Oct 31, 2024 •

edited

Loading