[CT-975] [Feature] Parity on create_table_as for python and SQL model #414

Open
ChenyuLInx opened this issue Aug 2, 2022 · 7 comments
Labels
enhancement New feature or request python_models issues related to python model

Comments

@ChenyuLInx
Contributor

ChenyuLInx commented Aug 2, 2022

Describe the feature

The spark__create_table_as macro supports options including partition_by, clustered_by, file_format, location_root, and further options defined in options_clause.

Right now, Python models just save everything in Delta format with the default settings. We should reach parity with SQL models where possible, and raise a clear error when a model is run with options that are not supported.
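
To make the request concrete, here is a minimal sketch of how the SQL-side options could be mapped onto PySpark `DataFrameWriter` calls. It is not the adapter's actual implementation. The helper name `apply_write_options` is hypothetical, and the `<location_root>/<identifier>` path layout is an assumption about how the location would be derived.

```python
from typing import Any, Mapping


def apply_write_options(writer: Any, config: Mapping[str, Any], identifier: str) -> Any:
    """Chain DataFrameWriter-style calls for the options present in `config`.

    `writer` is expected to behave like pyspark.sql.DataFrameWriter
    (for example `df.write`); the config keys mirror the SQL macro's options.
    """
    # Default mirrors the current behaviour described in this issue: Delta format.
    writer = writer.format(config.get("file_format", "delta"))

    partition_by = config.get("partition_by")
    if partition_by:
        # Accept either a single column name or a list of names.
        cols = [partition_by] if isinstance(partition_by, str) else list(partition_by)
        writer = writer.partitionBy(*cols)

    location_root = config.get("location_root")
    if location_root:
        # Assumption: the table is stored at <location_root>/<identifier>.
        writer = writer.option("path", f"{location_root.rstrip('/')}/{identifier}")

    return writer
```

With a real DataFrame this could be used as `apply_write_options(df.write.mode("overwrite"), config, "my_table").saveAsTable("my_schema.my_table")`, where the table and schema names are placeholders.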

Motivation:

Users would be able to optimize the storage format based on how they use the table.

Acceptance criteria

Python models would materialize the table with the correct options, and raise an error when an unsupported option is specified.
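
One possible shape for the "raise a clear error" half of the criteria is sketched below. The split between supported and SQL-only options is purely illustrative (the real set would be decided in the PR), and `check_python_model_options` is a hypothetical name.

```python
from typing import Any, Mapping

# Hypothetical split, for illustration only.
SUPPORTED_OPTIONS = frozenset({"file_format", "partition_by", "location_root"})
SQL_ONLY_OPTIONS = frozenset({"clustered_by"})


def check_python_model_options(config: Mapping[str, Any]) -> None:
    """Raise a descriptive error if the config sets an option that is SQL-only."""
    # Only options that are actually set (truthy) count as "specified".
    unsupported = sorted(
        key for key, value in config.items() if key in SQL_ONLY_OPTIONS and value
    )
    if unsupported:
        raise ValueError(
            "Python models do not support these options: "
            + ", ".join(unsupported)
            + ". Supported options are: "
            + ", ".join(sorted(SUPPORTED_OPTIONS))
        )
```

Failing before anything is written keeps a misconfigured model from silently producing a table with default settings.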

Tests for the PR

You should add integration tests that run the table materialization with supported options, then check that the table has the intended properties; for example, SHOW PARTITIONS table can be used to check partitions. You should also add tests to make sure we raise an error on unsupported options.
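
As a rough idea of what the partition check could look like in a test, the helper below reads the partition column names back from `SHOW PARTITIONS`. It assumes a Spark session object with a `sql()` method and a table format for which `SHOW PARTITIONS` is available; `partition_columns` is a hypothetical helper name.

```python
from typing import Any, List


def partition_columns(spark: Any, table: str) -> List[str]:
    """Return the partition column names reported by SHOW PARTITIONS.

    Returns an empty list when the table has no partitions yet.
    """
    rows = spark.sql(f"SHOW PARTITIONS {table}").collect()
    if not rows:
        return []
    # Each row holds one string such as "year=2022/month=8".
    first_spec = rows[0][0]
    return [part.split("=", 1)[0] for part in first_spec.split("/")]
```

An integration test could then assert that the returned list equals the model's configured partition_by columns.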

@ChenyuLInx ChenyuLInx added enhancement New feature or request triage labels Aug 2, 2022
@github-actions github-actions bot changed the title [Feature] Parity on create_table_as for python and SQL model [CT-975] [Feature] Parity on create_table_as for python and SQL model Aug 2, 2022
@ChenyuLInx ChenyuLInx added python_models issues related to python model and removed triage labels Aug 2, 2022
@BulyginMaksim

Hi!
After setting up my first Python model in dbt, I found that partitioning is not supported in dbt Python models, and then found this issue.
Are there any updates on this feature?

@xg1990

xg1990 commented Mar 23, 2023

I tried updating https://github.com/xg1990/dbt-spark/blob/feature/partition-for-py-model/dbt/include/spark/macros/materializations/table.sql
but got the following error with dbt-core-1.4.5 and dbt-spark-1.4.1:

07:12:24    '_MISSING_TYPE' object is not callable
07:12:24    
07:12:24    > in macro py_script_postfix (macros/python_model/python.sql)
07:12:24    > called by model py_part (models/raw/aks_logs/py_part.py)
07:12:24

@talperetz1

I created my first Python model with dbt, and it looks like the configs for location_root and partition_by do nothing. I found this issue; are there any updates or progress on this?
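
For readers landing here, a model along the following lines illustrates the kind of configuration being described. This is only a sketch: the column name, path, and upstream model name are hypothetical placeholders, not taken from the reports above.

```python
def model(dbt, session):
    # These are the settings reported in this thread as having no effect
    # on Python models.
    dbt.config(
        materialized="table",
        partition_by=["event_date"],          # hypothetical column name
        location_root="/mnt/lake/analytics",  # hypothetical storage path
    )
    # "upstream_events" is a hypothetical upstream model name.
    return dbt.ref("upstream_events")
```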

@AlbertoRguezConesa

Do you happen to have any updates on the location_root and partition_by issues?

@srggrs

srggrs commented Sep 15, 2023

Same here! +1 to this, as it seems that at least the Spark session adapter would have the partitionBy method: https://sparkbyexamples.com/pyspark/pyspark-partitionby-example/

Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 14, 2024
@ConstantinoSchillebeeckx

Any movement on this?

@github-actions github-actions bot removed the Stale label Mar 15, 2024

7 participants