Athena to iceberg method not writing data to columns that are new in the schema #2997
Comments
Hi @lautarortega thanks for opening this!
Just to double-check that I understand the issue: since you are appending data, any existing data in the table would not have values for the new columns. Are you appending or overwriting? Or is the table empty when you append? Note: a fix for a related issue, regarding how Iceberg treats new columns, was merged in #2982. I recommend upgrading to AWS SDK for pandas 3.10.0.
Hi @kukushking, thanks for reaching out. I am appending data to the table. The problem is with the new data.
Local df
Athena post append:
Expected table:
I did some tests, and I think it might be related to the fact that the table was created with a Glue job. When creating a table from scratch with AWS Wrangler, and only using Wrangler, it seems to work just fine. My current workaround is to write once. That write will have missing data, but it will update the schema. I then delete that last batch of data from the Athena console and write the data again. I think it works on the second write because the schema no longer has to evolve.
Hi @lautarortega thanks - which version of AWS SDK for pandas are you using? The pull request that I linked above fixes the representation of the current Iceberg columns, and version 3.10.0 should display data for all columns. Additionally, please verify that the latest Glue schema contains the new column.
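One way to run the suggested Glue schema check is to compare the DataFrame's columns with the column types registered in the Glue catalog. The following is only a minimal sketch: it assumes `wr.catalog.get_table_types` returns a column-to-type mapping for the table, and the helper names `missing_from_catalog` and `check_glue_schema`, as well as the database and table arguments, are hypothetical.

```python
from typing import Dict, Iterable, List


def missing_from_catalog(df_columns: Iterable[str], catalog_types: Dict[str, str]) -> List[str]:
    """Return the DataFrame columns the Glue catalog does not list (case-insensitive)."""
    known = {name.lower() for name in catalog_types}
    return [col for col in df_columns if col.lower() not in known]


def check_glue_schema(df, database: str, table: str) -> List[str]:
    """Fetch the table's column types from Glue and report columns missing there.

    Hypothetical helper: requires AWS credentials and an existing table.
    """
    import awswrangler as wr  # imported lazily so the pure helper above works offline

    # get_table_types may return None when the table is not found.
    catalog_types = wr.catalog.get_table_types(database=database, table=table) or {}
    return missing_from_catalog(df.columns, catalog_types)
```

An empty list from `check_glue_schema` would suggest the new column reached the Glue schema, which would point the investigation at the written data rather than at the schema update.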
I was running 3.9.0. Today I tested 3.10.1, and it fails in a new way that 3.9.0 did not. _utils.py line 41: These are the parameters my fields are getting, so nothing related to iceberg.field.current
Describe the bug
I have a table that was created by a Glue job. I want to append data to that table using AWS Wrangler. The writing process seems to work fine, but when I check in Athena, the columns that were not there before have been added but appear completely empty, even though there were no nulls in my dataframe.
If I delete the rows I appended and write the data again using AWS Wrangler, the table is updated correctly, since the columns are not new anymore.
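The delete-and-rewrite workaround could be scripted roughly as follows. This is a sketch under stated assumptions, not what was actually run: the delete was done by hand in the Athena console, and here it is replaced by `wr.athena.start_query_execution`. The helper names, the `predicate` argument used to identify the appended rows, and the use of `schema_evolution=True` are all hypothetical.

```python
def build_delete_sql(table: str, predicate: str) -> str:
    """Build a DELETE statement that removes the freshly appended rows."""
    return f'DELETE FROM "{table}" WHERE {predicate}'


def rewrite_batch(df, database: str, table: str, temp_path: str, predicate: str) -> None:
    """Write once to evolve the schema, delete that batch, then write it again.

    Hypothetical helper: requires AWS credentials; `predicate` must match only
    the rows of the batch being rewritten.
    """
    import awswrangler as wr  # imported lazily; only needed when talking to AWS

    kwargs = dict(df=df, database=database, table=table,
                  temp_path=temp_path, schema_evolution=True)
    # First write: new columns are added to the table but may come out empty.
    wr.athena.to_iceberg(**kwargs)
    # Remove the incomplete batch (done from the Athena console in the report).
    wr.athena.start_query_execution(
        sql=build_delete_sql(table, predicate), database=database, wait=True
    )
    # Second write: the columns already exist, so no schema change is needed.
    wr.athena.to_iceberg(**kwargs)
```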
How to Reproduce
I tried replicating the issue using just AWS Wrangler and I could not do it.
Try having a Glue job create an Iceberg table, and then try to update this table with an extra column using Wrangler.
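The Wrangler side of that reproduction might look like the sketch below. It assumes the table already exists (created by a Glue job, which is not shown) and that the append uses `wr.athena.to_iceberg` with `schema_evolution=True`; the report does not state the exact arguments used, so the helper names and all argument values are hypothetical.

```python
from typing import Any, Dict


def build_append_kwargs(database: str, table: str, temp_path: str) -> Dict[str, Any]:
    """Keyword arguments for a write that lets Wrangler add new columns to the table."""
    return {
        "database": database,
        "table": table,
        "temp_path": temp_path,      # S3 prefix used for staging files
        "schema_evolution": True,    # allow columns missing from the table to be added
    }


def append_with_extra_column(df, database: str, table: str, temp_path: str) -> None:
    """Write a DataFrame that has one more column than the existing table.

    Hypothetical helper: requires AWS credentials and an existing Iceberg table.
    """
    import awswrangler as wr  # imported lazily; only needed when talking to AWS

    wr.athena.to_iceberg(df=df, **build_append_kwargs(database, table, temp_path))
```

After the call, querying the table in Athena and checking whether the new column holds values for the appended rows would show whether the behavior reproduces.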
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.10
AWS SDK for pandas version
3.9.0
Additional context
No response