dbt-databricks 1.6.x (Release TBD)

Features

Follow up: re-implement fix for issue where the show tables extended command is limited to 2048 characters. (#326). Set DBT_DESCRIBE_TABLE_2048_CHAR_BYPASS to true to enable this behaviour.

Under the hood

Dropping the databricks_sql_endpoint test profile as not truly testing different behavior than databricks_uc_sql_endpoint profile (#417)

dbt-databricks 1.6.1 (August 2, 2023)

Fixes

Revert change from #326 as it breaks DESCRIBE table in cases where the dbt API key does not have access to all tables in the schema

dbt-databricks 1.6.0 (August 2, 2023)

Features

Support for dbt-core==1.6
Added support for materialized_view and streaming_table materializations
Support dbt clone operation
Support new dbt limit command-line flag

Fixes

Fix issue where the show tables extended command is limited to 2048 characters. (#326)
Extend python model support to cover the same config options as SQL (#379)

Features

Add liquid_clustered_by config to enable Liquid Clustering for Delta-based dbt models.

Other

Drop support for Python 3.7
Support for revamped dbt debug

dbt-databricks 1.5.5 (July 7, 2023)

Fixes

Fixed issue where starting a terminated cluster in the python path would never return

Features

Include log events from databricks-sql-connector in dbt logging output.
Adapter now populates the query_id field in run_results.json with Query History API query ID.

dbt-databricks 1.5.4 (June 9, 2023)

Features

Added support for model contracts (#336)

dbt-databricks 1.5.3 (June 8, 2023)

Fixes

Pins dependencies to minor versions
Sets default socket timeout to 180s

dbt-databricks 1.5.2 (May 17, 2023)

Fixes

Sets databricks sdk dependency to 0.1.6 to avoid SDK breaking changes

dbt-databricks 1.5.1 (May 9, 2023)

Fixes

Add explicit dependency to protobuf >4 to work around dbt-core issue

dbt-databricks 1.5.0 (May 2, 2023)

Features

Added support for OAuth (SSO and client credentials) (#327)

Fixes

Fix integration tests (#316)

Dependencies

Updated dbt-spark from >=1.4.1 to >= 1.5.0 (#316)

Under the hood

Throw an error if a model has an enforced contract. (#322)

dbt-databricks 1.4.3 (April 19, 2023)

Fixes

fix database not found error matching (#281)
Auto start cluster for Python models (#306)
databricks-sql-connector to 2.5.0 (#311)

Features

Adding replace_where incremental strategy (#293) (#310)
[feat] Support ZORDER as a model config (#292) (#297)

Dependencies

Added keyring>=23.13.0 for oauth token cache
Added databricks-sdk>=0.1.1 for oauth flows
Updated databricks-sql-connector from >=2.4.0 to >= 2.5.0

Under the hood

Throw an error if a model has an enforced contract. (#322)

dbt-databricks 1.4.2 (February 17, 2023)

Fixes

Fix test_grants to use the error class to check the error. (#273)
Raise exception on unexpected error of list relations (#270)

dbt-databricks 1.4.1 (January 31, 2023)

Fixes

Ignore case sensitivity in relation matches method. (#265)

dbt-databricks 1.4.0 (January 25, 2023)

Breaking changes

Raise an exception when schema contains '.'. (#222)
- Containing a catalog in schema is not allowed anymore.
- Need to explicitly use catalog instead.

Features

Support Python 3.11 (#233)
Support incremental_predicates (#161)
Apply connection retry refactor, add defaults with exponential backoff (#137)
Quote by Default (#241)
Avoid show table extended command. (#231)
Use show table extended with table name list for get_catalog. (#237)
Add support for a glob pattern in the databricks_copy_into macro (#259)

dbt-databricks 1.3.2 (November 9, 2022)

Fixes

Fix copy into macro when passing expression_list. (#223)
Partially revert to fix the case where schema config contains uppercase letters. (#224)

dbt-databricks 1.3.1 (November 1, 2022)

Under the hood

Show and log a warning when schema contains '.'. (#221)

dbt-databricks 1.3.0 (October 14, 2022)

Features

Support python model through run command API, currently supported materializations are table and incremental. (dbt-labs/dbt-spark#377, #126)
Enable Pandas and Pandas-on-Spark DataFrames for dbt python models (dbt-labs/dbt-spark#469, #181)
Support job cluster in notebook submission method (dbt-labs/dbt-spark#467, #194)
- In all_purpose_cluster submission method, a config http_path can be specified in Python model config to switch the cluster where Python model runs.
```
def model(dbt, _):
    dbt.config(
        materialized='table',
        http_path='...'
    )
    ...
```
Use builtin timestampadd and timestampdiff functions for dateadd/datediff macros if available (#185)
Implement testing for a test for various Python models (#189)
Implement testing for type_boolean in Databricks (dbt-labs/dbt-spark#471, #188)
Add a macro to support COPY INTO (#190)

Under the hood

Apply "Initial refactoring of incremental materialization" (#148)
- Now dbt-databricks uses adapter.get_incremental_strategy_macro instead of dbt_spark_get_incremental_sql macro to dispatch the incremental strategy macro. The overwritten dbt_spark_get_incremental_sql macro will not work anymore.
Better interface for python submission (dbt-labs/dbt-spark#452, #178)

dbt-databricks 1.2.3 (September 26, 2022)

Fixes

Fix cancellation (#173)
http_headers should be dict in the profile (#174)

dbt-databricks 1.2.2 (September 8, 2022)

Fixes

Data is duplicated on reloading seeds that are using an external table (#114, #149)

Under the hood

Explicitly close cursors (#163)
Upgrade databricks-sql-connector to 2.0.5 (#166)
Embed dbt-databricks and databricks-sql-connector versions to SQL comments (#167)

dbt-databricks 1.2.1 (August 24, 2022)

Features

Support Python 3.10 (#158)

dbt-databricks 1.2.0 (August 16, 2022)

Features

Add grants to materializations (dbt-labs/dbt-spark#366, dbt-labs/dbt-spark#381)
Add connection_parameters for databricks-sql-connector connection parameters (#135)
- This can be used to customize the connection by setting additional parameters.
- The full parameters are listed at Databricks SQL Connector for Python.
- Currently, the following parameters are reserved for dbt-databricks. Please use the normal credential settings instead.
  - server_hostname
  - http_path
  - access_token
  - session_configuration
  - catalog
  - schema

Fixes

Incremental materialization updated to not drop table first if full refresh for delta lake format, as it already runs create or replace table (dbt-labs/dbt-spark#286, dbt-labs/dbt-spark#287)

Under the hood

Update SparkColumn.numeric_type to return decimal instead of numeric, since SparkSQL exclusively supports the former (dbt-labs/dbt-spark#380)
Make minimal changes to support dbt Core incremental materialization refactor (dbt-labs/dbt-spark#402, dbt-labs/dbt-spark#394, #136)
Add new basic tests TestDocsGenerateDatabricks and TestDocsGenReferencesDatabricks (#134)
Set upper bound for databricks-sql-connector when Python 3.10 (#154)
- Note that databricks-sql-connector does not officially support Python 3.10 yet.

Contributors

@grindheim (dbt-labs/dbt-spark#287)

dbt-databricks 1.1.1 (July 19, 2022)

Features

Support for Databricks CATALOG as a DATABASE in DBT compilations (#95, #89, #94, #105)
- Setting an initial catalog with session_properties is deprecated and will not work in the future release. Please use catalog or database to set the initial catalog.
- When using catalog, spark_build_snapshot_staging_table macro will not be used. If trying to override the macro, databricks_build_snapshot_staging_table should be overridden instead.

Fixes

Block taking jinja2.runtime.Undefined into DatabricksAdapter (#98)
Avoid using Cursor.schema API when database is None (#100)

Under the hood

Drop databricks-sql-connector 1.0 (#108)

dbt-databricks 1.1.0 (May 11, 2022)

Features

Add support for Delta constraints (#71)

Under the hood

Port testing framework changes from dbt-labs/dbt-spark#299 and dbt-labs/dbt-spark#314 (#70)

dbt-databricks 1.0.3 (April 26, 2022)

Fixes

Make internal macros use macro dispatch pattern (#72)

dbt-databricks 1.0.2 (March 31, 2022)

Features

Support for setting table properties as part of a model configuration (#33, #49)
Get the session_properties map to work (#57)
Bump up databricks-sql-connector to 1.0.1 and use the Cursor APIs (#50)

dbt-databricks 1.0.1 (February 8, 2022)

Features

Inherit from dbt-spark for backward compatibility with spark-utils and other dbt packages (#32, #35)
Add SQL Endpoint specific integration tests (#45, #46)

Fixes

Close the connection properly (#34, #37)

dbt-databricks 1.0.0 (December 6, 2021)

Features

Make the connection use databricks-sql-connector (#3, #7)
Make the default file format 'delta' (#14, #16)
Make the default incremental strategy 'merge' (#23)
Remove unnecessary stack trace (#10)

dbt-spark 1.0.0 (December 3, 2021)

Fixes

Incremental materialization corrected to respect full_refresh config, by using should_full_refresh() macro (#260, #262)

Contributors

@grindheim (#262)

dbt-spark 1.0.0rc2 (November 24, 2021)

Features

Add support for Apache Hudi (hudi file format) which supports incremental merge strategies (#187, #210)

Under the hood

Refactor seed macros: remove duplicated code from dbt-core, and provide clearer logging of SQL parameters that differ by connection method (#249, #250)
Replace sample_profiles.yml with profile_template.yml, for use with new dbt init (#247)

Contributors

@vingov (#210)

dbt-spark 1.0.0rc1 (November 10, 2021)

Under the hood

Remove official support for python 3.6, which is reaching end of life on December 23, 2021 (dbt-core#4134, #253)
Add support for structured logging (#251)

dbt-spark 0.21.1 (Release TBD)

dbt-spark 0.21.1rc1 (November 3, 2021)

Fixes

Fix --store-failures for tests, by suppressing irrelevant error in comment_clause() macro (#232, #233)
Add support for on_schema_change config in incremental models: ignore, fail, append_new_columns. For sync_all_columns, removing columns is not supported by Apache Spark or Delta Lake (#198, #226, #229)
Add persist_docs call to incremental model (#224, #234)

Contributors

@binhnefits (#234)

dbt-spark 0.21.0 (October 4, 2021)

Fixes

Enhanced get_columns_in_relation method to handle a bug in open source deltalake which doesnt return schema details in show table extended in databasename like '*' query output. This impacts dbt snapshots if file format is open source deltalake (#207)
Parse properly columns when there are struct fields to avoid considering inner fields: Issue (#202)

Under the hood

Add unique_field to better understand adapter adoption in anonymous usage tracking (#211)

Contributors

@harryharanb (#207)
@SCouto (#204)

dbt-spark 0.21.0b2 (August 20, 2021)

Fixes

Add pyodbc import error message to dbt.exceptions.RuntimeException to get more detailed information when running dbt debug (#192)
Add support for ODBC Server Side Parameters, allowing options that need to be set with the SET statement to be used (#201)
Add retry_all configuration setting to retry all connection issues, not just when the _is_retryable_error function determines (#194)

Contributors

@JCZuurmond (#192)
@jethron (#201)
@gregingenii (#194)

dbt-spark 0.21.0b1 (August 3, 2021)

dbt-spark 0.20.1 (August 2, 2021)

dbt-spark 0.20.1rc1 (August 2, 2021)

Fixes

Fix get_columns_in_relation when called on models created in the same run (#196, #197)

Contributors

@ali-tny (#197)

dbt-spark 0.20.0 (July 12, 2021)

dbt-spark 0.20.0rc2 (July 7, 2021)

Features

Add support for merge_update_columns config in merge-strategy incremental models (#183, #184)

Fixes

Fix column-level persist_docs on Delta tables, add tests (#180)

dbt-spark 0.20.0rc1 (June 8, 2021)

Features

Allow user to specify use_ssl (#169)
Allow setting table OPTIONS using config (#171)
Add support for column-level persist_docs on Delta tables (#84, #170)

Fixes

Cast table_owner to string to avoid errors generating docs (#158, #159)
Explicitly cast column types when inserting seeds (#139, #166)

Under the hood

Parse information returned by list_relations_without_caching macro to speed up catalog generation (#93, #160)
More flexible host passing, https:// can be omitted (#153)

Contributors

@friendofasquid (#159)
@franloza (#160)
@Fokko (#165)
@rahulgoyal2987 (#169)
@JCZuurmond (#171)
@cristianoperez (#170)

dbt-spark 0.19.1 (April 2, 2021)

dbt-spark 0.19.1b2 (February 26, 2021)

Under the hood

Update serialization calls to use new API in dbt-core 0.19.1b2 (#150)

dbt-spark 0.19.0.1 (February 26, 2021)

Fixes

Fix package distribution to include incremental model materializations (#151, #152)

dbt-spark 0.19.0 (February 21, 2021)

Breaking changes

Incremental models have incremental_strategy: append by default. This strategy adds new records without updating or overwriting existing records. For that, use merge or insert_overwrite instead, depending on the file format, connection method, and attributes of your underlying data. dbt will try to raise a helpful error if you configure a strategy that is not supported for a given file format or connection. (#140, #141)

Fixes

Capture hard-deleted records in snapshot merge, when invalidate_hard_deletes config is set (#109, #126)

dbt-spark 0.19.0rc1 (January 8, 2021)

Breaking changes

Users of the http and thrift connection methods need to install extra requirements: pip install dbt-spark[PyHive] (#109, #126)

Under the hood

Enable CREATE OR REPLACE support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history.
Add changelog, issue templates (#119, #120)

Fixes

Handle case of 0 retries better for HTTP Spark Connections (#132)

Contributors

@danielvdende (#132)
@Fokko (#125)

dbt-spark 0.18.1.1 (November 13, 2020)

Fixes

Fix extras_require typo to enable pip install dbt-spark[ODBC] ((#121), (#122))

dbt-spark 0.18.1 (November 6, 2020)

Features

Allows users to specify auth and kerberos_service_name (#107)
Add support for ODBC driver connections to Databricks clusters and endpoints (#116)

Under the hood

Updated README links (#115)
Support complete atomic overwrite of non-partitioned incremental models (#117)
Update to support dbt-core 0.18.1 (#110, #118)

Contributors

@danielhstahl (#107)
@collinprather (#115)
@charlottevdscheun (#117)
@Fokko (#117)

dbt-spark 0.18.0 (September 18, 2020)

Under the hood

Make a number of changes to support dbt-adapter-tests (#103)
Update to support dbt-core 0.18.0. Run CI tests against local Spark, Databricks (#105)

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

dbt-databricks 1.6.x (Release TBD)

Features

Under the hood

dbt-databricks 1.6.1 (August 2, 2023)

Fixes

dbt-databricks 1.6.0 (August 2, 2023)

Features

Fixes

Features

Other

dbt-databricks 1.5.5 (July 7, 2023)

Fixes

Features

dbt-databricks 1.5.4 (June 9, 2023)

Features

dbt-databricks 1.5.3 (June 8, 2023)

Fixes

dbt-databricks 1.5.2 (May 17, 2023)

Fixes

dbt-databricks 1.5.1 (May 9, 2023)

Fixes

dbt-databricks 1.5.0 (May 2, 2023)

Features

Fixes

Dependencies

Under the hood

dbt-databricks 1.4.3 (April 19, 2023)

Fixes

Features

Dependencies

Under the hood

dbt-databricks 1.4.2 (February 17, 2023)

Fixes

dbt-databricks 1.4.1 (January 31, 2023)

Fixes

dbt-databricks 1.4.0 (January 25, 2023)

Breaking changes

Features

dbt-databricks 1.3.2 (November 9, 2022)

Fixes

dbt-databricks 1.3.1 (November 1, 2022)

Under the hood

dbt-databricks 1.3.0 (October 14, 2022)

Features

Under the hood

dbt-databricks 1.2.3 (September 26, 2022)

Fixes

dbt-databricks 1.2.2 (September 8, 2022)

Fixes

Under the hood

dbt-databricks 1.2.1 (August 24, 2022)

Features

dbt-databricks 1.2.0 (August 16, 2022)

Features

Fixes

Under the hood

Contributors

dbt-databricks 1.1.1 (July 19, 2022)

Features

Fixes

Under the hood

dbt-databricks 1.1.0 (May 11, 2022)

Features

Under the hood

dbt-databricks 1.0.3 (April 26, 2022)

Fixes

dbt-databricks 1.0.2 (March 31, 2022)

Features

dbt-databricks 1.0.1 (February 8, 2022)

Features

Fixes

dbt-databricks 1.0.0 (December 6, 2021)

Features

dbt-spark 1.0.0 (December 3, 2021)