Support Snowflake-Managed Iceberg Tables via SnowflakeCatalog #1834
Closes #685.
Rationale for this change
Reopens the closed PR #687, which adds a Snowflake Catalog. I addressed the review comments and made some additional changes to fix errors found when using the catalog with the Bodo data processing library.
One way Snowflake supports Iceberg is through managed tables, where Snowflake has both read and write access to the tables. They are essentially regular Snowflake tables backed by Iceberg. Outside of Snowflake, these tables are read-only. To work with them, this catalog wraps a set of Snowflake SQL calls in the Catalog API.
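As a rough illustration of what "wrapping SQL calls in a Catalog API" can look like, the sketch below resolves a table's current Iceberg metadata file through Snowflake's `SYSTEM$GET_ICEBERG_TABLE_INFORMATION` function. The class name `SnowflakeCatalogSketch`, the injected `run_query` callable, and the method name `metadata_location` are hypothetical and not the code in this PR; a real catalog would send the query over a Snowflake connection instead of a plain callable.

```python
import json
from typing import Callable, List, Sequence

# Hypothetical type: runs one SQL statement and returns the result rows.
QueryFn = Callable[[str], List[Sequence]]


class SnowflakeCatalogSketch:
    """Illustrative only: maps one catalog operation onto Snowflake SQL."""

    def __init__(self, run_query: QueryFn) -> None:
        self._run_query = run_query

    @staticmethod
    def _quote_literal(value: str) -> str:
        # Snowflake string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    def metadata_location(self, table_name: str) -> str:
        sql = (
            "SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION("
            + self._quote_literal(table_name)
            + ")"
        )
        rows = self._run_query(sql)
        # The function returns a single JSON string in the first column.
        info = json.loads(rows[0][0])
        if "metadataLocation" not in info:
            raise RuntimeError(f"No metadata location for {table_name}: {info}")
        return info["metadataLocation"]
```

Because these tables are read-only outside Snowflake, the returned location could then be opened without a writable catalog, for example with PyIceberg's `StaticTable.from_metadata`.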
I skipped some of the less commonly used APIs; they can be filled in later.
Are these changes tested?
Tested manually, both standalone and with the Bodo library, on AWS and Azure. Some of the Azure tests don't currently work because Snowflake uses path prefixes like `wasb://`, `wasbs://`, etc.; these are waiting on the other PR that adds Azure support to PyArrowFileIO. I also copied the mock tests from the original PR.
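Until that FileIO support lands, one option is to detect these Azure-style locations up front, for example to skip the affected tests. A minimal sketch, assuming a hypothetical helper `uses_azure_scheme` and covering only the `wasb://` and `wasbs://` prefixes named above:

```python
from urllib.parse import urlparse

# Prefixes Snowflake is described as using for Azure locations.
# Assumption: extend this set for any other Azure schemes encountered.
AZURE_SCHEMES = frozenset({"wasb", "wasbs"})


def uses_azure_scheme(location: str) -> bool:
    """Return True if a metadata or data location uses an Azure wasb-style scheme."""
    return urlparse(location).scheme.lower() in AZURE_SCHEMES
```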
Are there any user-facing changes?
Users can read and query Snowflake-managed Iceberg tables; only a minimal set of write operations is supported.