[Feature] Support configurable management of Table Optimisers for Iceberg tables #627
Is this your first time submitting a feature request?
Describe the feature
https://aws.amazon.com/blogs/aws/aws-glue-data-catalog-now-supports-automatic-compaction-of-apache-iceberg-tables/
Honestly I have not fully thought through how it would work, hoping to spark some discussion in thread.
Perhaps just another config variable for this? E.g.
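As a rough sketch of what such a variable could trigger behind the scenes: the helper names below are hypothetical and not part of the adapter, and the request shape follows my understanding of boto3's Glue create_table_optimizer / update_table_optimizer calls. The assumption that an existing optimiser surfaces as AlreadyExistsException should be checked against the Glue docs before relying on it.

```python
from typing import Any, Dict


def build_compaction_optimizer_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    enabled: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments shared by the Glue create/update optimiser calls."""
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        # role_arn is the IAM role Glue would assume to run compaction.
        "TableOptimizerConfiguration": {"roleArn": role_arn, "enabled": enabled},
    }


def ensure_compaction_optimizer(glue_client: Any, **request: Any) -> str:
    """Create the optimiser, falling back to an update if one already exists.

    Returns "created" or "updated". Assumes the client raises
    AlreadyExistsException when an optimiser is already attached to the table.
    """
    try:
        glue_client.create_table_optimizer(**request)
        return "created"
    except glue_client.exceptions.AlreadyExistsException:
        glue_client.update_table_optimizer(**request)
        return "updated"
```

With boto3 installed, this could be called as ensure_compaction_optimizer(boto3.client("glue"), **build_compaction_optimizer_request(...)). Keeping the request builder separate from the API call makes the config-to-request mapping easy to unit test without AWS credentials.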
When use_glue_automatic_compaction is specified, we would use the Glue CreateTableOptimizer and UpdateTableOptimizer API operations to create the optimiser for compaction.
Describe alternatives you've considered
You can just OPTIMIZE {{ this.schema }}.{{ this.identifier }} ... in your post_hook, yes, but on a full-refresh of a very large table (e.g. one requiring insert_by_period) this may fail due to a timeout or the Iceberg "not finished, please run compaction again" message. Regardless, I think it would be good to let AWS just handle it.
Caveat: I haven't actually tried the automatic compaction feature, so I have no idea how it performs in practice. Maybe it just scans your entire table once a day and you get charged for 100 DPUs 😂.
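For comparison, the manual OPTIMIZE route can be sketched as a retry loop that re-issues the statement until the engine stops asking for another pass. This is only an illustration: run_query is a hypothetical callable that executes SQL and raises RuntimeError on failure, the REWRITE DATA USING BIN_PACK clause is my assumption about what the elided part of the statement looks like on Athena, and the error marker is assumed to be the code that accompanies the "run compaction again" message.

```python
from typing import Callable

# Assumed marker for the "not finished, please run compaction again" error;
# adjust it to whatever your engine version actually returns.
MORE_RUNS_MARKER = "ICEBERG_OPTIMIZE_MORE_RUNS_NEEDED"


def optimize_until_done(
    run_query: Callable[[str], None],
    schema: str,
    identifier: str,
    max_runs: int = 10,
) -> int:
    """Re-run OPTIMIZE until it succeeds; return how many runs were needed."""
    sql = f"OPTIMIZE {schema}.{identifier} REWRITE DATA USING BIN_PACK"
    for attempt in range(1, max_runs + 1):
        try:
            run_query(sql)
            return attempt
        except RuntimeError as exc:
            # Only retry the "more runs needed" case; surface anything else.
            if MORE_RUNS_MARKER not in str(exc):
                raise
    raise RuntimeError(f"compaction still unfinished after {max_runs} runs")
```

A loop like this does nothing about query timeouts, and it is extra orchestration that would go away if Glue managed compaction itself.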
Who will this benefit?
Anybody with large datasets in Iceberg. I would think there is quite a lot of overlap with users of insert_by_period.
Are you interested in contributing this feature?
maybe, depends how much work it would be
Anything else?
#514 somewhat related, in the realm of "table optimisation management"