iceberg table format support for filesystem destination (#2067)
* add pyiceberg dependency and upgrade mypy

- mypy upgrade needed to solve this issue: apache/iceberg-python#768
- uses <1.13.0 requirement on mypy because 1.13.0 gives error
- new lint errors arising due to version upgrade are simply ignored

* extend pyiceberg dependencies

* remove redundant delta annotation

* add basic local filesystem iceberg support

* add active table format setting

* disable merge tests for iceberg table format

* restore non-redundant extra info

* refactor to in-memory iceberg catalog

* add s3 support for iceberg table format

* add schema evolution support for iceberg table format

* extract _register_table function

* add partition support for iceberg table format

* update docstring

* enable child table test for iceberg table format

* enable empty source test for iceberg table format

* make iceberg catalog namespace configurable and default to dataset name

* add optional typing

* fix typo

* improve typing

* extract logic into dedicated function

* add iceberg read support to filesystem sql client

* remove unused import

* add todo

* extract logic into separate functions

* add azure support for iceberg table format

* generalize delta table format tests

* enable get tables function test for iceberg table format

* remove ignores

* undo table directory management change

* enable test_read_interfaces tests for iceberg

* fix active table format filter

* use mixin for object store rs credentials

* generalize catalog typing

* extract pyiceberg scheme mapping into separate function

* generalize credentials mixin test setup

* remove unused import

* add centralized fallback to append when merge is not supported

* Revert "add centralized fallback to append when merge is not supported"

This reverts commit 54cd0bc.

* fall back to append if merge is not supported on filesystem

* fix test for s3-compatible storage

* remove obsolete code path

* exclude gcs read interface tests for iceberg

* add gcs support for iceberg table format

* switch to UnsupportedAuthenticationMethodException

* add iceberg table format docs

* use shorter pipeline name to prevent too long sql identifiers

* add iceberg catalog note to docs

* black format

* use shorter pipeline name to prevent too long sql identifiers

* correct max id length for sqlalchemy mysql dialect

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit 6cce03b.

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit ef29aa7.

* replace show with execute to prevent useless print output

* add abfss scheme to test

* remove az support for iceberg table format

* remove iceberg bucket test exclusion

* add note to docs on azure scheme support for iceberg table format

* exclude iceberg from duckdb s3-compatibility test

* disable pyiceberg info logs for tests

* extend table format docs and move into own page

* upgrade adlfs to enable account_host attribute

* Merge branch 'devel' of https://github.com/dlt-hub/dlt into feat/1996-iceberg-filesystem

* fix lint errors

* re-add pyiceberg dependency

* enabled iceberg in dbt-duckdb

* upgrade pyiceberg version

* remove pyiceberg mypy errors across python version

* does not install airflow group for dev

* fixes gcp oauth iceberg credentials handling

* fixes ca cert bundle duckdb azure on ci

* allow for airflow dep to be present during type check

---------

Co-authored-by: Marcin Rudolf <[email protected]>
2 people authored and donotpush committed Dec 11, 2024
1 parent c14e744 commit c0598b0
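The commit message mentions falling back to append when merge is not supported on the filesystem destination. A minimal sketch of that idea follows; the function name `resolve_write_disposition` and the `supports_merge` flag are hypothetical and are not taken from the dlt codebase.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_write_disposition(requested: str, supports_merge: bool) -> str:
    """Return the write disposition to use for a load.

    Hypothetical helper: `supports_merge` stands in for whatever capability
    check the destination performs for the active table format.
    """
    if requested == "merge" and not supports_merge:
        # Degrade gracefully instead of failing the load.
        logger.warning("merge is not supported here, falling back to append")
        return "append"
    return requested


# A format without merge support degrades to append; other dispositions pass through.
fallback = resolve_write_disposition("merge", supports_merge=False)
unchanged = resolve_write_disposition("replace", supports_merge=False)
```

Passing the capability in as a plain boolean keeps the sketch independent of any particular table format.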
Showing 3 changed files with 12 additions and 4 deletions.
9 changes: 7 additions & 2 deletions .github/workflows/test_destinations.yml
@@ -77,8 +77,13 @@ jobs:
# key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-redshift

- name: Install dependencies
# if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --no-interaction -E redshift -E postgis -E postgres -E gs -E s3 -E az -E parquet -E duckdb -E cli -E filesystem --with sentry-sdk --with pipeline,ibis -E deltalake
run: poetry install --no-interaction -E redshift -E postgis -E postgres -E gs -E s3 -E az -E parquet -E duckdb -E cli -E filesystem --with sentry-sdk --with pipeline,ibis -E deltalake -E pyiceberg

- name: enable certificates for azure and duckdb
run: sudo mkdir -p /etc/pki/tls/certs && sudo ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

- name: Upgrade sqlalchemy
run: poetry run pip install sqlalchemy==2.0.18 # minimum version required by `pyiceberg`

- name: create secrets.toml
run: pwd && echo "$DLT_SECRETS_TOML" > tests/.dlt/secrets.toml
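The "enable certificates for azure and duckdb" step links the Debian-style CA bundle to the RHEL-style path `/etc/pki/tls/certs/ca-bundle.crt`, presumably because a client in the test run looks for the bundle there. Note that a plain `ln -s` fails if the link already exists. The sketch below shows an idempotent equivalent in Python; the helper name `link_ca_bundle` is hypothetical, and the demonstration runs against a temporary directory rather than the real `/etc` tree.

```python
import os
import tempfile
from pathlib import Path


def link_ca_bundle(source: Path, target: Path) -> Path:
    """Point `target` at `source`, replacing any existing link or file."""
    target.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    if target.is_symlink() or target.exists():
        target.unlink()  # unlike plain `ln -s`, tolerate a pre-existing link
    os.symlink(source, target)
    return target


# Demonstrate inside a throwaway directory that mimics the two layouts.
with tempfile.TemporaryDirectory() as root:
    source = Path(root, "etc/ssl/certs/ca-certificates.crt")
    source.parent.mkdir(parents=True)
    source.write_text("dummy bundle\n")

    target = Path(root, "etc/pki/tls/certs/ca-bundle.crt")
    link_ca_bundle(source, target)
    link_ca_bundle(source, target)  # a second call succeeds instead of erroring
    linked_content = target.read_text()
    target_is_link = target.is_symlink()
```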
5 changes: 4 additions & 1 deletion .github/workflows/test_local_destinations.yml
@@ -95,7 +95,10 @@ jobs:
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-local-destinations

- name: Install dependencies
run: poetry install --no-interaction -E postgres -E postgis -E duckdb -E parquet -E filesystem -E cli -E weaviate -E qdrant -E sftp --with sentry-sdk --with pipeline,ibis -E deltalake
run: poetry install --no-interaction -E postgres -E postgis -E duckdb -E parquet -E filesystem -E cli -E weaviate -E qdrant -E sftp --with sentry-sdk --with pipeline,ibis -E deltalake -E pyiceberg

- name: Upgrade sqlalchemy
run: poetry run pip install sqlalchemy==2.0.18 # minimum version required by `pyiceberg`

- name: Start SFTP server
run: docker compose -f "tests/load/filesystem_sftp/docker-compose.yml" up -d
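Both workflows pin `sqlalchemy==2.0.18`, which the inline comment describes as the minimum version required by `pyiceberg`. A small sketch of how such a minimum could be checked against a version string is shown below; the helper names are hypothetical, and the parser only looks at the leading numeric release segment, so pre-release suffixes are ignored.

```python
import re

MINIMUM_SQLALCHEMY = "2.0.18"  # value taken from the workflow's pin comment


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '2.0.18' -> (2, 0, 18)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_SQLALCHEMY) -> bool:
    """Compare release tuples element-wise; suffixes such as 'b1' are ignored."""
    return release_tuple(installed) >= release_tuple(minimum)


pinned_ok = meets_minimum("2.0.18")
older_ok = meets_minimum("1.4.54")
```

For real dependency checks, a full version parser such as the one in the `packaging` library would handle pre-releases and local versions more faithfully than this sketch.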
2 changes: 1 addition & 1 deletion poetry.lock

Some generated files are not rendered by default.
