awswrangler.athena.to_iceberg does not support synchronous/parallel Lambda instances #2651

Closed as not planned
@B161851

Description

Describe the bug

wr.athena.to_iceberg(
    df=df,
    database='test_database',
    table='my_table2',
    table_location='s3://bucket-testing1/my_table2/',
    temp_path='s3://bucket-testing1/temp_path/',
    keep_files=True,
)

When writing in parallel with keep_files=True, the table ends up with duplicate rows. I tried appending a nanosecond timestamp to temp_path so that each writer uses a unique path, but then I get "ICEBERG_COMMIT_ERROR".
With keep_files=False, ingesting Iceberg data in parallel fails with "HIVE_CANNOT_OPEN_SPLIT NoSuchKey Error". We observed that with keep_files=False the library removes the entire temp_path from S3, which leads to this error.

Writing to an Iceberg table with awswrangler from parallel Lambda invocations does not appear to be supported.
How can we overcome the above issues when writing to an Iceberg table in parallel from Lambda using awswrangler?
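
One direction that might be worth trying is sketched below. It is not confirmed by the maintainers and is untested against a real table: each invocation gets its own temp_path subfolder, so cleanup with keep_files=False cannot delete another writer's staged files, and the write is retried with backoff when the commit conflicts. The helpers `unique_temp_path`, `write_with_retry` and `write_df` are hypothetical names, and detecting the conflict by looking for "ICEBERG_COMMIT_ERROR" in the exception text is an assumption about how the error surfaces.

```python
import random
import time
import uuid


def unique_temp_path(base: str) -> str:
    """Return a per-attempt subfolder under `base` (hypothetical helper)."""
    return f"{base.rstrip('/')}/{uuid.uuid4().hex}/"


def write_with_retry(write_fn, base_temp_path, max_attempts=5,
                     base_delay=0.5, sleep=time.sleep):
    """Call write_fn(temp_path) with a fresh temp_path on every attempt.

    Retries only when the error text contains ICEBERG_COMMIT_ERROR
    (assumption: the Athena error code appears in the exception message).
    """
    for attempt in range(1, max_attempts + 1):
        temp_path = unique_temp_path(base_temp_path)
        try:
            return write_fn(temp_path)
        except Exception as exc:
            retryable = "ICEBERG_COMMIT_ERROR" in str(exc)
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so parallel writers spread out.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))


def write_df(df):
    """Example wiring for the call from this issue (needs awswrangler)."""
    import awswrangler as wr  # imported lazily; only needed for real writes

    return write_with_retry(
        lambda temp_path: wr.athena.to_iceberg(
            df=df,
            database='test_database',
            table='my_table2',
            table_location='s3://bucket-testing1/my_table2/',
            temp_path=temp_path,
            keep_files=False,
        ),
        's3://bucket-testing1/temp_path/',
    )
```

Because every attempt stages its files in a new folder, a retry does not reuse files left behind by a failed attempt. Whether retries are enough under heavy concurrency is unknown; serializing the writers (for example through a queue) may still be needed.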

How to Reproduce

wr.athena.to_iceberg(
    df=df,
    database='test_database',
    table='my_table2',
    table_location='s3://bucket-testing1/my_table2/',
    temp_path='s3://bucket-testing1/temp_path/',
    keep_files=False,
)

We observed that with keep_files=False the library removes the entire temp_path from S3, which results in "HIVE_CANNOT_OPEN_SPLIT NoSuchKey Error".
If the library removed only the particular Parquet file it wrote, instead of the entire temp_path, I think the above error might be avoided.

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Win

Python version

3.8

AWS SDK for pandas version

12

Additional context

No response
