Description
Describe the bug
wr.athena.to_iceberg(
    df=df,
    database='test_database',
    table='my_table2',
    table_location='s3://bucket-testing1/my_table2/',
    temp_path='s3://bucket-testing1/temp_path/',
    keep_files=True
)
For parallel writing, keep_files=True results in duplicates. I tried appending a nanosecond timestamp to the temporary path so that each invocation's path is unique, but now I get "ICEBERG_COMMIT_ERROR" instead (what we tried is sketched below).
With keep_files=False, ingesting Iceberg data in parallel fails with "HIVE_CANNOT_OPEN_SPLIT NoSuchKey Error".
We observed that with keep_files=False the library removes the entire temp_path from S3, which is what produces the error above.
As it stands, writing to an Iceberg table from Lambda using awswrangler is not workable in parallel.
So, how can we overcome the above issues when writing to an Iceberg table in parallel from Lambda using awswrangler?
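For reference, this is roughly what we tried for the unique temporary path (a sketch only: we actually appended a nanosecond timestamp, the uuid suffix below is just illustrative, and the wr.s3.delete_objects cleanup at the end is our own addition, not something to_iceberg does):

import uuid

import awswrangler as wr

# Give each Lambda invocation its own staging prefix so parallel
# writers never share temp files (intended to avoid the duplicates
# seen with a shared temp_path and keep_files=True).
unique_temp = f's3://bucket-testing1/temp_path/{uuid.uuid4().hex}/'

wr.athena.to_iceberg(
    df=df,
    database='test_database',
    table='my_table2',
    table_location='s3://bucket-testing1/my_table2/',
    temp_path=unique_temp,
    keep_files=True,  # keep_files=False would delete the whole temp_path
)

# Clean up only this invocation's staging prefix afterwards.
wr.s3.delete_objects(path=unique_temp)

Even with a unique path like this, concurrent invocations still fail with "ICEBERG_COMMIT_ERROR" when they commit at the same time.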
How to Reproduce
wr.athena.to_iceberg(
    df=df,
    database='test_database',
    table='my_table2',
    table_location='s3://bucket-testing1/my_table2/',
    temp_path='s3://bucket-testing1/temp_path/',
    keep_files=False
)
We observed that with keep_files=False the library removed the entire temp_path from S3, which resulted in "HIVE_CANNOT_OPEN_SPLIT NoSuchKey Error".
If the library removed only the particular parquet files it had written, instead of removing the entire temp_path from S3, I think it would avoid the above error; a sketch of that idea follows.
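To illustrate the suggestion (a sketch only: the staged_files list is hypothetical, standing in for the paths the library tracks internally for a single to_iceberg call):

import awswrangler as wr

# Hypothetical list of the parquet objects one to_iceberg call staged
# under temp_path (the library knows these paths internally).
staged_files = [
    's3://bucket-testing1/temp_path/part-0001.parquet',
    's3://bucket-testing1/temp_path/part-0002.parquet',
]

# Delete only this call's staged files; objects staged by other
# concurrent invocations under the same temp_path stay intact.
wr.s3.delete_objects(path=staged_files)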
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Windows
Python version
3.8
AWS SDK for pandas version
12
Additional context
No response