Running RECOVER PARTITIONS without defining partitions #126
Running `ALTER TABLE ... RECOVER PARTITIONS` or `MSCK REPAIR TABLE` on a table that has no partitions causes an error, so always running it is not an option. I've created a quick fix for this issue: run `ALTER TABLE ... RECOVER PARTITIONS` only when `external.recover_partitions` is true. That means you can do this in your sources.yml:

```yaml
external:
  recover_partitions: true
```

An even better alternative would be to use `DESCRIBE TABLE` (or something similar) to determine whether the table has partitions, and run `ALTER TABLE ... RECOVER PARTITIONS` accordingly. I can create a PR of my quick fix, if that is sufficient.
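For context, a fuller sources.yml under the proposed option might look like the sketch below. The source name, table name, and location are illustrative, and `recover_partitions` is the new flag being proposed here, not an existing option of the package:

```yaml
version: 2

sources:
  - name: raw_events            # illustrative source name
    schema: landing
    tables:
      - name: clickstream       # illustrative table name
        external:
          location: "s3://my-bucket/clickstream/"  # hypothetical path
          using: parquet
          recover_partitions: true  # proposed flag: run RECOVER PARTITIONS after staging
```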
@jarno-r Definitely open to a PR for this one! I like the idea of an explicit option for specifying that dbt + Databricks should recover/infer the partitions, while I'm not strictly opposed to the cleverer approach, where dbt uses `DESCRIBE TABLE` to detect partitioning first.
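A sketch of that cleverer approach in Spark SQL, assuming the table name is illustrative: the macro would first inspect the table's metadata, and issue the repair statement only when a partition spec is present.

```sql
-- Inspect table metadata; for partitioned tables, Spark's output includes a
-- "# Partition Information" section listing the partition columns
-- (illustrative table name).
DESCRIBE TABLE EXTENDED landing.clickstream;

-- Only if that output shows partition columns, recover them:
ALTER TABLE landing.clickstream RECOVER PARTITIONS;
```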
I've created a PR. Link above.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on it; otherwise it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; just add a comment to notify the maintainers.
@jtcohen6 This is still relevant for us. We've been running our own version of this package just to have this feature. It is cumbersome to maintain. Could the PR be merged?
Describe the feature
In Databricks you can recover partitions from existing Parquet files even when a table is created without declared partitions. However, when you define partitions for a source in dbt, you also need to define the table's schema. If you define neither a schema nor partitions, recovering partitions still works on the Databricks side, but dbt_external_tables skips the recover-partitions step (because no partitions are declared in the source config).
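The manual workflow the feature would automate can be sketched as follows, assuming (as the issue describes) that Databricks can infer partition columns from the storage layout. Table name and location are illustrative:

```sql
-- Create an external table over partitioned Parquet data without declaring
-- the partition columns up front (illustrative names/path).
CREATE TABLE landing.clickstream
USING PARQUET
LOCATION 's3://my-bucket/clickstream/';

-- Register the partitions found in the underlying directory layout:
ALTER TABLE landing.clickstream RECOVER PARTITIONS;
-- or, equivalently on Spark/Hive:
MSCK REPAIR TABLE landing.clickstream;
```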
Describe alternatives you've considered
Run it manually afterwards. A nicer fix would be to supply it as a parameter, or the ability to run it from the CLI afterwards.
Additional context
I assume this is only Spark / Databricks related.
Who will this benefit?
Anyone using partitioned Parquet sources who doesn't want to supply a schema in the dbt source file.