
Add Databricks plugin for Unity Catalog + SQL Warehouse compute #236

Open · wants to merge 4 commits into main
Conversation

@FlorianSchroevers commented Oct 23, 2023

Description & motivation

resolves: #269

This change adds a Databricks plugin that supports SQL Warehouse compute with Unity Catalog enabled, which is not possible with the spark plugin. I have used this fork of the repository successfully within my organization for several months now. It also moves the `alter table ... recover partitions` definition from the spark plugin to common, since it's named 'default'.

Differences between the databricks plugin and the spark plugin:

  • It removes the recover_partitions step from get_external_build_plan, since it does not work with SQL Warehouse compute (and is also not necessary there).
  • It adds the 'database' parameter to the old_relation lookup in get_external_build_plan, since Unity Catalog adds another layer to the hierarchy (catalogs).
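The two differences above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual diff: the macro and helper names (`databricks__get_external_build_plan`, `dbt_external_tables.dropif`, etc.) follow dbt-external-tables' plugin conventions, but the exact signatures in the fork may differ.

```jinja
{% macro databricks__get_external_build_plan(source_node) %}

    {% set build_plan = [] %}

    {# Unity Catalog adds a catalog level, so pass 'database' explicitly #}
    {% set old_relation = adapter.get_relation(
        database = source_node.database,
        schema = source_node.schema,
        identifier = source_node.identifier
    ) %}

    {% set create_or_replace = (old_relation is none or var('ext_full_refresh', false)) %}

    {% if create_or_replace %}
        {% set build_plan = [
            dbt_external_tables.create_external_schema(source_node),
            dbt_external_tables.dropif(source_node),
            dbt_external_tables.create_external_table(source_node)
        ] %}
        {# Unlike the spark plugin: no recover_partitions step here --
           it is unsupported (and unnecessary) on SQL Warehouse compute #}
    {% else %}
        {% set build_plan = dbt_external_tables.refresh_external_table(source_node) %}
    {% endif %}

    {% do return(build_plan) %}

{% endmacro %}
```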

I'd be happy to make changes to my fork if necessary.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added an integration test for my fix/feature (if applicable)

```jinja
{%- set external = source_node.external -%}
{%- set partitions = external.partitions -%}
{%- set options = external.options -%}
```


Love that you've created this PR. Been thinking about doing it myself for a while, but haven't gotten further than overriding the databricks__create_external_table and databricks__get_external_build_plan macros locally.

One idea you could take into account is adding support for liquid clustering: [screenshot omitted]
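For context on the liquid clustering suggestion: on Databricks, liquid clustering replaces Delta partitioning with a `CLUSTER BY` clause at table creation. A minimal illustrative DDL sketch (table and column names are made up):

```sql
-- Delta table using liquid clustering instead of partitioning
CREATE TABLE main.analytics.events (
    event_id   STRING,
    event_date DATE
)
CLUSTER BY (event_date);
```

Supporting this in the plugin would presumably mean emitting `CLUSTER BY` from the create-table macro when the source config requests it, instead of a `PARTITIONED BY` clause.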

@dataders dataders modified the milestone: 1.0.0 Apr 4, 2024
@CodeGeek1212

Would love to use this one! Any chance this is going to main?

@dataders (Collaborator)

@FlorianSchroevers @CodeGeek1212 @grindheim have y'all been using these macros already by chance by adding to your dbt project's macros/ directory?

I haven't yet had the chance to test them or this PR, but I would like to understand how "ready for primetime" they are, because setting up the CI infrastructure is a non-trivial amount of work in and of itself.
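For anyone wanting to try the macros locally before this is merged: dropping the plugin's macro files into your project's macros/ directory works because dbt's dispatch mechanism searches your own project before the package. A sketch of the relevant dbt_project.yml config (the project name `my_project` is a placeholder):

```yaml
# dbt_project.yml -- make dbt look in your own project's macros/
# before falling back to the dbt_external_tables package
dispatch:
  - macro_namespace: dbt_external_tables
    search_order: ['my_project', 'dbt_external_tables']
```

With that in place, a local `databricks__`-prefixed macro override takes precedence over the package's built-in implementations.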

Successfully merging this pull request may close these issues.

Add back support for Databricks
4 participants