Use version-hint.text for StaticTable #1887

Merged: 3 commits, merged Apr 17, 2025

arnaudbriche
Contributor

@arnaudbriche arnaudbriche commented Apr 7, 2025

This change allows StaticTable to use the version-hint.text file when it is instantiated with a metadata_location that does not end with '.metadata.json'.
Users can point to the table location, and the metadata file path will be read from version-hint.text.
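The resolution rule described above can be sketched with the standard library alone. This is a hypothetical helper, `resolve_metadata_location`, not the code merged in this PR: it assumes a local filesystem and the Hadoop-table layout `<table>/metadata/version-hint.text`, where the hint holds either a version number N (resolving to `vN.metadata.json`) or a metadata file name.

```python
import os


def resolve_metadata_location(location: str) -> str:
    """Map a StaticTable-style location to a concrete metadata JSON path.

    Hypothetical helper for illustration, not PyIceberg's implementation.
    Assumes a local filesystem and the Hadoop-table layout
    <table>/metadata/version-hint.text.
    """
    # A location that already names a metadata file is used unchanged.
    if location.endswith(".metadata.json"):
        return location

    metadata_dir = os.path.join(location, "metadata")
    hint_path = os.path.join(metadata_dir, "version-hint.text")
    with open(hint_path, encoding="utf-8") as f:
        hint = f.read().strip()  # tolerate a trailing newline

    if hint.endswith(".metadata.json"):
        # The hint names the metadata file directly.
        return os.path.join(metadata_dir, hint)
    if hint.isdigit():
        # Hadoop convention: the hint holds the current version number N.
        return os.path.join(metadata_dir, f"v{hint}.metadata.json")
    # Fallback assumption: the hint is a file name without the suffix.
    return os.path.join(metadata_dir, f"{hint}.metadata.json")
```

Called as `resolve_metadata_location("/data/my_table")` with a hint file containing `3`, this returns `/data/my_table/metadata/v3.metadata.json`. Reading from S3 would need a FileIO-style abstraction instead of the built-in `open`.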

Closes #763

Rationale for this change

version-hint.text is useful in contexts where you do not want or need a full-fledged catalog.
Our use case is sharing datasets publicly as Iceberg tables on S3.
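On the publishing side, a catalog-less table only needs the hint to be updated after each new metadata file is written. The sketch below shows that ordering for a Hadoop-style layout on a local filesystem. `publish_metadata` is a hypothetical helper, not part of PyIceberg, and it is only safe with a single writer, since nothing here arbitrates concurrent commits the way a catalog would. On S3 there is no rename; a plain overwrite of the hint object, which a PUT replaces as a whole, would play the role of the `os.replace` step.

```python
import json
import os
import tempfile


def publish_metadata(table_dir: str, version: int, metadata: dict) -> str:
    """Write metadata/v<version>.metadata.json, then point version-hint.text at it.

    Hypothetical publisher-side helper: local filesystem only, a single
    writer assumed, and `metadata` is written as-is (real Iceberg table
    metadata has many required fields this sketch does not build).
    """
    metadata_dir = os.path.join(table_dir, "metadata")
    os.makedirs(metadata_dir, exist_ok=True)

    # Write the new metadata file first so the hint never names a missing file.
    metadata_path = os.path.join(metadata_dir, f"v{version}.metadata.json")
    with open(metadata_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f)

    # Swap the hint in one step: write a temp file, then rename it over the old hint.
    fd, tmp_path = tempfile.mkstemp(dir=metadata_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(str(version))
    os.replace(tmp_path, os.path.join(metadata_dir, "version-hint.text"))
    return metadata_path
```

Writing the metadata file before touching the hint means a reader resolving the hint at any moment sees either the previous version or the new one, never a dangling reference.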

Are these changes tested?

Not yet.

Are there any user-facing changes?

Yes. Users can now point StaticTable at the table location rather than at a specific metadata version file.

Contributor

@Fokko Fokko left a comment

Thanks for raising this @arnaudbriche 🙌

How would you feel about:

  • Adding a test to make sure that this works, and we don't break it in the future
  • Adding a line or two to the docs so people know that it is here

@arnaudbriche
Contributor Author

arnaudbriche commented Apr 8, 2025

Sure.
I just haven't managed to run the tests locally yet.

I'm not doing much Python these days, so I must be missing something.

I tried:

poetry install
poetry run pytest

But I got the following errors:

poetry run pytest
=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.13.2, pytest-7.4.4, pluggy-1.5.0
rootdir: /Users/arnaudbriche/Desktop/code/iceberg-python
configfile: pyproject.toml
plugins: checkdocs-2.13.0, mock-3.14.0, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 2777 items / 12 errors

================================================================================================= ERRORS ==================================================================================================
________________________________________________________________________________ ERROR collecting tests/test_transforms.py ________________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/test_transforms.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_transforms.py:27: in <module>
    import pytz
E   ModuleNotFoundError: No module named 'pytz'
_______________________________________________________________________________ ERROR collecting tests/catalog/test_base.py _______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/catalog/test_base.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/catalog/test_base.py:29: in <module>
    from pyiceberg.catalog.memory import InMemoryCatalog
pyiceberg/catalog/memory.py:18: in <module>
    from pyiceberg.catalog.sql import SqlCatalog
pyiceberg/catalog/sql.py:27: in <module>
    from sqlalchemy import (
E   ModuleNotFoundError: No module named 'sqlalchemy'
_______________________________________________________________________________ ERROR collecting tests/catalog/test_glue.py _______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/catalog/test_glue.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/catalog/test_glue.py:25: in <module>
    from pyiceberg.catalog.glue import GlueCatalog
pyiceberg/catalog/glue.py:33: in <module>
    from mypy_boto3_glue.client import GlueClient
E   ModuleNotFoundError: No module named 'mypy_boto3_glue'
_______________________________________________________________________________ ERROR collecting tests/catalog/test_hive.py _______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/catalog/test_hive.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/catalog/test_hive.py:24: in <module>
    from hive_metastore.ttypes import (
vendor/hive_metastore/ttypes.py:27: in <module>
    from thrift.protocol.TProtocol import TProtocolException
E   ModuleNotFoundError: No module named 'thrift'
_______________________________________________________________________________ ERROR collecting tests/catalog/test_sql.py ________________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/catalog/test_sql.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/catalog/test_sql.py:26: in <module>
    from sqlalchemy import Engine, create_engine, inspect
E   ModuleNotFoundError: No module named 'sqlalchemy'
_______________________________________________________________________________ ERROR collecting tests/cli/test_console.py ________________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/cli/test_console.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/cli/test_console.py:28: in <module>
    from pyiceberg.catalog.memory import InMemoryCatalog
pyiceberg/catalog/memory.py:18: in <module>
    from pyiceberg.catalog.sql import SqlCatalog
pyiceberg/catalog/sql.py:27: in <module>
    from sqlalchemy import (
E   ModuleNotFoundError: No module named 'sqlalchemy'
________________________________________________________________________ ERROR collecting tests/integration/test_inspect_table.py _________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/integration/test_inspect_table.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_inspect_table.py:24: in <module>
    import pytz
E   ModuleNotFoundError: No module named 'pytz'
____________________________________________________________________________ ERROR collecting tests/integration/test_reads.py _____________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/integration/test_reads.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_reads.py:29: in <module>
    from hive_metastore.ttypes import LockRequest, LockResponse, LockState, UnlockRequest
vendor/hive_metastore/ttypes.py:27: in <module>
    from thrift.protocol.TProtocol import TProtocolException
E   ModuleNotFoundError: No module named 'thrift'
______________________________________________________________________ ERROR collecting tests/integration/test_writes/test_writes.py ______________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/integration/test_writes/test_writes.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_writes/test_writes.py:27: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'
________________________________________________________________________________ ERROR collecting tests/io/test_pyarrow.py ________________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/io/test_pyarrow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/io/test_pyarrow.py:106: in <module>
    from tests.catalog.test_base import InMemoryCatalog
tests/catalog/test_base.py:29: in <module>
    from pyiceberg.catalog.memory import InMemoryCatalog
pyiceberg/catalog/memory.py:18: in <module>
    from pyiceberg.catalog.sql import SqlCatalog
pyiceberg/catalog/sql.py:27: in <module>
    from sqlalchemy import (
E   ModuleNotFoundError: No module named 'sqlalchemy'
_______________________________________________________________________________ ERROR collecting tests/table/test_upsert.py _______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/table/test_upsert.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/table/test_upsert.py:34: in <module>
    from tests.catalog.test_base import InMemoryCatalog, Table
tests/catalog/test_base.py:29: in <module>
    from pyiceberg.catalog.memory import InMemoryCatalog
pyiceberg/catalog/memory.py:18: in <module>
    from pyiceberg.catalog.sql import SqlCatalog
pyiceberg/catalog/sql.py:27: in <module>
    from sqlalchemy import (
E   ModuleNotFoundError: No module named 'sqlalchemy'
______________________________________________________________________________ ERROR collecting tests/utils/test_datetime.py ______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/utils/test_datetime.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/utils/test_datetime.py:20: in <module>
    import pytz
E   ModuleNotFoundError: No module named 'pytz'
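All twelve collection errors above are ModuleNotFoundErrors for third-party packages (pytz, sqlalchemy, mypy_boto3_glue, thrift, pandas), which suggests the environment behind `poetry run` was missing the project's optional and development dependencies rather than anything being wrong with the tests. A quick way to check this before rerunning pytest is to ask the same interpreter which of those modules it can find. The script below is a hypothetical helper; run it through the same environment as pytest (for example `poetry run python check_imports.py`, where the file name is arbitrary).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules that failed during test collection in the log above.
TEST_IMPORTS = ["pytz", "sqlalchemy", "mypy_boto3_glue", "thrift", "pandas"]

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    missing = missing_modules(TEST_IMPORTS)
    print("missing:", ", ".join(missing) if missing else "none")
```

Printing `sys.executable` alongside the result helps spot the common case where pytest runs under a different interpreter than the one the dependencies were installed into.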

@Fokko
Contributor

Fokko commented Apr 8, 2025

@arnaudbriche That's no problem. Can you try:

make install
make test

We use a Makefile to run the correct commands.

@arnaudbriche
Contributor Author

On OSX:

Without virtualenv:

 make install
/bin/sh: pip: command not found
Poetry version  does not match required version 2.0.1. Updating...
/bin/sh: pip: command not found
make: *** [install-poetry] Error 127

With virtualenv:

make install
WARNING: Package(s) not found: poetry
Poetry version  does not match required version 2.0.1. Updating...

[notice] A new release of pip is available: 25.0 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
make: *** [install-poetry] Error 1

@arnaudbriche
Contributor Author

OK, so I installed the right Poetry version by hand.

But now, while trying to run the tests, I get:

make test
poetry run pytest tests/ -m "(unmarked or parametrize) and not integration"
/Users/arnaudbriche/Desktop/code/iceberg-python/venv/lib/python3.13/site-packages/_pytest/config/__init__.py:331: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: helpconfig, Hook: pytest_cmdline_parse
ConftestImportFailure: ModuleNotFoundError: No module named 'pyiceberg' (from /Users/arnaudbriche/Desktop/code/iceberg-python/tests/conftest.py)
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
  config = pluginmanager.hook.pytest_cmdline_parse(
ImportError while loading conftest '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/conftest.py'.
tests/conftest.py:51: in <module>
    from pyiceberg.catalog import Catalog, load_catalog
E   ModuleNotFoundError: No module named 'pyiceberg'
make: *** [test] Error 4

@arnaudbriche
Contributor Author

And now:

PYTHONPATH=".:$PYTHONPATH" make test
poetry run pytest tests/ -m "(unmarked or parametrize) and not integration"
=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.13.2, pytest-7.4.4, pluggy-1.5.0
rootdir: /Users/arnaudbriche/Desktop/code/iceberg-python
configfile: pyproject.toml
plugins: checkdocs-2.13.0, mock-3.14.0, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 3705 items / 3 errors / 884 deselected / 2821 selected

================================================================================================= ERRORS ==================================================================================================
_______________________________________________________________________________ ERROR collecting tests/catalog/test_hive.py _______________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/catalog/test_hive.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/catalog/test_hive.py:24: in <module>
    from hive_metastore.ttypes import (
E   ModuleNotFoundError: No module named 'hive_metastore'
____________________________________________________________________________ ERROR collecting tests/integration/test_reads.py _____________________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/integration/test_reads.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_reads.py:29: in <module>
    from hive_metastore.ttypes import LockRequest, LockResponse, LockState, UnlockRequest
E   ModuleNotFoundError: No module named 'hive_metastore'
______________________________________________________________________ ERROR collecting tests/integration/test_writes/test_writes.py ______________________________________________________________________
ImportError while importing test module '/Users/arnaudbriche/Desktop/code/iceberg-python/tests/integration/test_writes/test_writes.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/__init__.py:88: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_writes/test_writes.py:40: in <module>
    from pyiceberg.catalog.hive import HiveCatalog
pyiceberg/catalog/hive.py:35: in <module>
    from hive_metastore.ThriftHiveMetastore import Client
E   ModuleNotFoundError: No module named 'hive_metastore'
========================================================================================= short test summary info =========================================================================================
ERROR tests/catalog/test_hive.py
ERROR tests/integration/test_reads.py
ERROR tests/integration/test_writes/test_writes.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================================================================== 884 deselected, 3 errors in 15.41s ====================================================================================
make: *** [test] Error 2
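With PYTHONPATH=".", the conftest import of pyiceberg now succeeds, and the three remaining errors all name hive_metastore. In the first run, that module was loaded from vendor/hive_metastore/ttypes.py, i.e. from the repository's vendor/ directory rather than the repo root. One plausible explanation, not confirmed in this thread: adding "." to PYTHONPATH only exposes top-level packages such as pyiceberg/, while packages under vendor/ stay invisible unless that directory is on the path too (or the project is installed in a way that includes them). The sketch below reproduces that path behaviour in a throwaway directory; `importable_with`, `check_layout`, and the package names `fake_pkg` and `fake_vendored` are hypothetical stand-ins for pyiceberg/ and vendor/hive_metastore/.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def importable_with(paths, module):
    """Return True if `module` can be found with `paths` prepended to sys.path.

    This mimics what PYTHONPATH does; sys.path is restored afterwards.
    """
    saved = list(sys.path)
    try:
        sys.path[:0] = paths
        importlib.invalidate_caches()
        return importlib.util.find_spec(module) is not None
    finally:
        sys.path[:] = saved


def check_layout():
    """Build a throwaway repo with a top-level and a vendored package, then probe both."""
    results = {}
    with tempfile.TemporaryDirectory() as repo:
        # Hypothetical stand-ins for pyiceberg/ and vendor/hive_metastore/.
        for rel in ("fake_pkg", os.path.join("vendor", "fake_vendored")):
            os.makedirs(os.path.join(repo, rel))
            open(os.path.join(repo, rel, "__init__.py"), "w").close()
        vendor = os.path.join(repo, "vendor")
        for label, paths in (("root", [repo]), ("root+vendor", [repo, vendor])):
            results[label] = (
                importable_with(paths, "fake_pkg"),
                importable_with(paths, "fake_vendored"),
            )
    return results


if __name__ == "__main__":
    print(check_layout())
```

Running it prints `{'root': (True, False), 'root+vendor': (True, True)}`: the vendored stand-in only becomes importable once vendor/ is added. By analogy, something like `PYTHONPATH=".:vendor"` might get past these collection errors, but that was never tried here; the setup that did work was the Docker container suggested later in the thread.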

@arnaudbriche
Contributor Author

Hi @Fokko,

Do you know if running the test suite on OSX is supported? Or should I switch to working from a Linux box?

@Fokko
Contributor

Fokko commented Apr 9, 2025

Hey @arnaudbriche, OSX should work fine (I use it myself). We vendor the hive_metastore modules, so I'm not sure what's going on there. Which Python version are you using? Do you have a way to reproduce it?

docker run -v `pwd`:/pyiceberg -t -i python:3.12 bash
root@bdb0eca99544:/# cd /pyiceberg/
root@bdb0eca99544:/pyiceberg# pip install poetry --force
root@bdb0eca99544:/pyiceberg# make install
...
root@bdb0eca99544:/pyiceberg# make lint
...
root@bdb0eca99544:/pyiceberg# make test
...

@arnaudbriche arnaudbriche force-pushed the use-version-hint-for-static-table branch from 9b28cb2 to 9f7aa22 on April 16, 2025 14:33
@arnaudbriche
Contributor Author

arnaudbriche commented Apr 16, 2025

Hi @Fokko

I have Python 3.13.2 installed.
Anyway, I used the Docker image as you suggested, and the tests ran fine.
I added a small test for this feature, and a bit of documentation too.

Contributor

@Fokko Fokko left a comment

Thanks @arnaudbriche This looks great to me! 👍

@Fokko Fokko merged commit 00c548a into apache:main Apr 17, 2025
8 checks passed
Development: merging this pull request closes the following issue:

[feat] Ability to read table using version-hint.txt
2 participants