Describe the bug
When using `s3.to_parquet` to update a parquet file that is partitioned by a time interval or a timestamp "attribute" (such as year, month, hour, etc.), the function fails, because for this mode the implementation assumes that the values of `partition_cols` are names of the parquet / table columns, and it does not find something like `hour(column)` among the dataframe columns.
I think the problem is this line, which uses the function `delete_from_iceberg_table`, which expects column names.
How to Reproduce
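No reproduction was attached, so here is a hypothetical minimal sketch. The database, table, and bucket names are placeholders, and the Iceberg write path via `wr.athena.to_iceberg` is assumed from the mention of `delete_from_iceberg_table` in the report; running it requires AWS credentials and an existing Glue database.

```python
import datetime as dt

# A partition transform that works with mode="append" or mode="overwrite",
# but fails with mode="overwrite_partitions" because "hour(ts)" is a
# partition transform expression, not a dataframe column name:
partition_cols = ["hour(ts)"]


def reproduce():
    # Not executed here -- needs AWS credentials and an existing Glue database.
    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame(
        {
            "ts": [dt.datetime(2024, 1, 1, h) for h in range(3)],
            "value": [1, 2, 3],
        }
    )
    wr.athena.to_iceberg(
        df=df,
        database="my_db",                  # placeholder
        table="my_table",                  # placeholder
        temp_path="s3://my-bucket/tmp/",   # placeholder
        partition_cols=partition_cols,
        # Fails: the implementation looks for a column literally named
        # "hour(ts)" in df when deleting the affected partitions.
        mode="overwrite_partitions",
    )
```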
Expected behavior
I expect the `partition_cols` option to accept anything that can be used to partition a parquet. In particular, it should accept anything that is accepted when the `mode` argument is `append` or `overwrite` instead of `overwrite_partitions`.
Your project
No response
Screenshots
No response
OS
Ubuntu 22.04
Python version
3.10
AWS SDK for pandas version
3.7.3
Additional context
No response
Unfortunately, because this implementation of `to_iceberg` relies on a mesh of Pandas and Athena queries, we can't currently support using a partition transform function with `mode="overwrite_partitions"`. However, we are exploring other APIs for refactoring `to_iceberg`, such as PyIceberg or other AWS Glue APIs, which would allow us to support this in the future.
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.