
Running RECOVER PARTITIONS without defining partitions #126

Closed
ferdyh opened this issue Jan 5, 2022 · 6 comments · May be fixed by #136
Labels: enhancement (New feature or request), Stale

Comments

ferdyh commented Jan 5, 2022

Describe the feature

In Databricks you can recover partitions from existing parquet files even when a table is created without explicit partition definitions. In dbt, defining the partitions on a source also requires defining the table's schema. If you define neither schema nor partitions, recovering partitions still works on the Databricks side, but dbt_external_tables skips the recover-partitions step because no partitions are defined in the source.
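For reference, the manual recovery on Databricks / Spark looks like this (table name is illustrative):

ALTER TABLE analytics.events RECOVER PARTITIONS;
-- or, equivalently:
MSCK REPAIR TABLE analytics.events;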

Describe alternatives you've considered

Running it manually afterwards. A nicer fix would be to supply it as a parameter, or the ability to run it from the CLI afterwards.

Additional context

I assume this is only relevant to Spark / Databricks.

Who will this benefit?

Anyone using partitioned parquet sources who doesn't want to supply a schema in the dbt source file.

ferdyh added the enhancement label Jan 5, 2022
jarno-r commented Feb 9, 2022

Running ALTER TABLE RECOVER PARTITIONS or MSCK REPAIR TABLE on a table that does not have partitions causes an error. So always running it is not an option.

I've created a quick fix for this issue that runs ALTER TABLE RECOVER PARTITIONS if external.recover_partitions is true. This means you can do this in your sources.yml:

external:
  recover_partitions: true
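In context, a full source entry might look something like this (a sketch only; the names, location, and file format are illustrative):

sources:
  - name: raw
    schema: analytics
    tables:
      - name: events
        external:
          location: "s3://my-bucket/events/"
          using: parquet
          recover_partitions: true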

An even better alternative would be to use DESCRIBE TABLE (or something similar) to determine whether the table has partitions, and run ALTER TABLE RECOVER PARTITIONS accordingly. This would require changes to dbt-spark.
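Roughly, the check could look like this (a sketch only; table name illustrative, and the actual plumbing would have to live in dbt-spark):

DESCRIBE TABLE analytics.events;
-- For a partitioned table, the output includes a "# Partition Information"
-- section listing the partition columns. Only when that section is present
-- would the adapter go on to issue:
ALTER TABLE analytics.events RECOVER PARTITIONS;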

I can create a PR for my quick fix, if that is sufficient.

jtcohen6 removed the triage label Feb 28, 2022
jtcohen6 (Collaborator) commented

@jarno-r Definitely open to a PR for this one!

I like the idea of an explicit option for specifying that dbt + Databricks should recover/infer the partitions, when partitions is not itself defined.

I'm not strictly opposed to the cleverer approach, where dbt uses DESCRIBE TABLE to determine this on the user's behalf... but an explicit config feels in keeping with the approach on other databases that can infer partitions. As a general rule, I try to keep this package as a lightweight lens into each database's capabilities, without too many magic tricks behind the scenes.

jarno-r commented Mar 1, 2022

I've created a PR. Link above.

github-actions bot commented
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions bot added the Stale label Jul 29, 2023
github-actions bot commented Aug 6, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions bot closed this as not planned Aug 6, 2023
jarno-r commented Aug 9, 2023

@jtcohen6 This is still relevant for us. We've been running our own version of this package just to have this feature. It is cumbersome to maintain. Could the PR be merged?
