
incremental models failing in 1.9.0 and above versions #921

Open
SamrtCookie opened this issue Jan 29, 2025 · 12 comments
Labels
bug Something isn't working

Comments

SamrtCookie commented Jan 29, 2025

Describe the bug

I have a running model with incremental materialization on Databricks:

{{ config(materialized='incremental',
        unique_key='key',
        incremental_strategy = 'merge',
        on_schema_change= 'sync_all_columns') }}
and I am executing this query:
 select new.*,
    case when d.IS_OBSOLETE = 1 then FALSE else TRUE end as ACTIVE
    from
        {{ source('schemaA', 'tableB') }} as new
    left join {{ source('schemaB', 'tableC') }} as d
    on d.id = new.id
{% if is_incremental() %}

    LEFT JOIN
        {{ this }} AS old
    ON
        old.key = new.key

{% endif %}
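(For context: with incremental_strategy='merge' and unique_key='key', dbt-databricks compiles each incremental run into roughly the following MERGE. This is a simplified sketch; the actual temp relation name and aliases differ.)

merge into <existing_table> as DBT_INTERNAL_DEST
using <model_query_results> as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.key = DBT_INTERNAL_DEST.key
when matched then update set *
when not matched then insert *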

It keeps failing since I upgraded dbt-databricks to version 1.9.1.

Steps To Reproduce

I define my source schemas as variables and the target schemas for each model directory in dbt_project.yml, and the catalogs in profiles.yml.
I am using compute with the configuration below.
(I first tried lower Databricks runtime versions such as 13, and then, as shown below, 16, with no difference in the results.)

job_clusters:
        - job_cluster_key: Compute_cluster
          new_cluster:
            node_type_id: Standard_D4ds_v5
            num_workers: 1
            spark_version: 16.0.x-scala2.12 #15.4.x-scala2.12
            autotermination_minutes: 10
            spark_env_vars:
              "DATABRICKS_HOST": "{{secrets/A/DATABRICKS_HOST}}"
              "DATABRICKS_TOKEN": "{{secrets/A/${bundle.target}_DATABRICKS_TOKEN}}"
              "DATABRICKS_SQL_WAREHOUSE": /sql/1.0/warehouses/${var.DATABRICKS_SQL_WAREHOUSE}
            spark_conf:
              "spark.databricks.unityCatalog.enabled": "true"
              "spark.databricks.delta.autoOptimize.optimizeWrite": "true"
              "spark.databricks.delta.autoOptimize.autoCompact": "true"
              "spark.sql.adaptive.enabled": "true"
              "spark.sql.adaptive.coalescePartitions.enabled": "true"
              "spark.sql.adaptive.localShuffleReader.enabled": "true"
              "spark.sql.adaptive.skewJoin.enabled": "true"

With dbt-databricks 1.9.1 I am facing this error:

Compilation Error in model dim_market_delivery (models/A/mymodel.sql)
  'dbt.adapters.databricks.relation.DatabricksRelation object' has no attribute 'catalog'

Expected behavior

As soon as I downgrade dbt-databricks to version 1.8.7, everything runs error-free.

Screenshots and log output

 Completed with 1 error, 0 partial successes, and 0 warnings:
13:36:02  
13:36:02    Compilation Error in model dim_market_delivery (models/A/mymodel.sql)
  'dbt.adapters.databricks.relation.DatabricksRelation object' has no attribute 'catalog'
13:36:02  

System information

The output of dbt --version:

13:35:41  Running with dbt=1.9.1
13:35:41  Updating lock file in file path: /tmp/tmp-dbt-run-320706426674297/DBT/Nielsen/package-lock.yml
13:35:41  Installing dbt-labs/dbt_utils
13:35:42  Installed from version 1.3.0
13:35:42  Up to date!

+ dbt run --select nielsen_to_sp --target=dev --profiles-dir=./../
13:35:43  Running with dbt=1.9.1

The output of python --version:
3.10


SamrtCookie added the bug label Jan 29, 2025
benc-db (Collaborator) commented Jan 29, 2025

Thanks for reporting, investigating.

benc-db (Collaborator) commented Jan 29, 2025

Btw, I'm not sure this is relevant to the bug you are reporting, but we do not support job clusters for SQL execution.
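For reference, SQL statements run through the connection defined in profiles.yml, which should point at a SQL warehouse. A minimal sketch (host, warehouse ID, catalog, schema, and the token handling here are all placeholders):

my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: my_catalog                               # placeholder
      schema: my_schema                                 # placeholder
      host: adb-1234567890123456.7.azuredatabricks.net  # placeholder
      http_path: /sql/1.0/warehouses/<warehouse_id>     # a SQL warehouse path, not a job cluster
      token: "{{ env_var('DATABRICKS_TOKEN') }}"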

benc-db (Collaborator) commented Jan 29, 2025

Can you share the stack trace found in the dbt.log for this error? Having trouble finding where we even attempt to call catalog on relation.

SamrtCookie (Author) commented:
@benc-db
Yes, you are right: in Databricks Asset Bundles we cannot execute models with job compute alone; we need a SQL warehouse as well, and I am using one. I will copy the part of my asset bundle job that is related to this model. It uses job compute and a SQL warehouse at the same time.

- task_key: execute_mymodel
  dbt_task:
    source: GIT
    # Project directory is the path from the current file to the folder with dbt_project.yml
    project_directory: DBT/Myproject
    # --profiles-dir is the path from the project directory defined above to the directory with profiles.yml.
    # This cannot be omitted even when the profiles file is in the same location as the project directory,
    # or dbt will search for a default profiles file in the root folder (which is tmp during execution on Databricks).
    commands:
      - "dbt deps --target=${bundle.target} --profiles-dir=./../"
      - "dbt run --select my_models --target=${bundle.target} --profiles-dir=./../"
    warehouse_id: ${var.DATABRICKS_SQL_WAREHOUSE}
  libraries:
    - pypi:
        package: "dbt-databricks==1.8.7"
  job_cluster_key: Compute_cluster
  depends_on:
    - task_key: previous_task

I also looked at dbt.log but did not find a more specific error. I have attached it below.

dbt.log

benc-db (Collaborator) commented Jan 30, 2025

Very strange...

benc-db (Collaborator) commented Feb 6, 2025

Is this still happening? I'm wondering if there is some upstream change, since I know they are working to introduce catalog (what we in Databricks call a metastore) as a type. Would it be possible to somehow print out all of the installed python libraries (particularly the versions of the dbt libraries)? Not sure how easy it is to do this in a Databricks Workflow.
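One way, as a minimal sketch: run a short Python snippet in a notebook cell (or an extra task) on the same cluster, using only the standard library:

# list installed dbt- and databricks-related packages with their versions
import importlib.metadata

for dist in importlib.metadata.distributions():
    name = (dist.metadata["Name"] or "").lower()
    if "dbt" in name or "databricks" in name:
        print(dist.metadata["Name"], dist.version)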

benc-db (Collaborator) commented Feb 6, 2025

I think it's related to this: dbt-labs/dbt-adapters#286

If only we had a more helpful stack trace.

benc-db (Collaborator) commented Feb 6, 2025

@colin-rogers-dbt do you have any idea how to track this down?

benc-db (Collaborator) commented Feb 6, 2025

@SamrtCookie do you have any custom macros by any chance?

SamrtCookie (Author) commented:
@benc-db Yes, I do. I have some macros. For example:

-- Macro to overwrite default schema name generation in order to keep naming uniform
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}

        {{ default_schema }}

    {%- else -%}

        {{custom_schema_name}}

    {%- endif -%}

{%- endmacro %}

Another example:

{% macro is_incremental() %}
    {% if not execute %}
        {{ return(False) }}
    {% else %}
        {% set relation = adapter.get_relation(this.catalog, this.schema, this.table) %}
        {{ return(relation is not none
                  and relation.type == 'table'
                  and model.config.materialized == 'incremental'
                  and not should_full_refresh()) }}
    {% endif %}
{% endmacro %}

SamrtCookie (Author) commented:
For now, as a workaround, I am using dbt-databricks==1.8.7 in my workflows. But yes, the error persists.

benc-db (Collaborator) commented Feb 12, 2025

Could you try changing:

{% macro is_incremental() %}
    {% if not execute %}
        {{ return(False) }}
    {% else %}
        {% set relation = adapter.get_relation(this.catalog, this.schema, this.table) %}
        {{ return(relation is not none
                  and relation.type == 'table'
                  and model.config.materialized == 'incremental'
                  and not should_full_refresh()) }}
    {% endif %}
{% endmacro %}

to

{% macro is_incremental() %}
    {% if not execute %}
        {{ return(False) }}
    {% else %}
        {% set relation = adapter.get_relation(this.database, this.schema, this.table) %}
        {{ return(relation is not none
                  and relation.type == 'table'
                  and model.config.materialized == 'incremental'
                  and not should_full_refresh()) }}
    {% endif %}
{% endmacro %}

?
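If that compiles, running the model twice should confirm the incremental path (the first run creates the table, the second actually exercises is_incremental()). The selector and flags below are just the ones from earlier in this thread:

dbt run --select my_models --target=dev --profiles-dir=./../
dbt run --select my_models --target=dev --profiles-dir=./../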
