Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of Field / FieldCI / ColumnInfo #63

Open
thomasaarholt opened this issue Apr 6, 2024 · 1 comment
Open

Refactor of Field / FieldCI / ColumnInfo #63

thomasaarholt opened this issue Apr 6, 2024 · 1 comment

Comments

@thomasaarholt
Copy link
Collaborator

thomasaarholt commented Apr 6, 2024

@brendancooley (and others), I want to refactor the Field function in order to:

  • Have a statically (pyright) parseable docstring
  • Explicitly pass arguments like gt, dtype, constraints etc rather than use args/kwargs
  • Properly be able to serialize and deserialize types, satisfying any pydantic type requirements.

The suggestion

I'm considering encoding all the relevant Field parameters inside ColumnInfo, and ensuring that ColumnInfo can serialize (using field_serializer) and deserialize (through validators) all parameters.

This means that we can type-safely pass all these arguments to pydantic's Field using pydantic.fields.Field(json_schema_extra=column_info.model_dump()).

Then, at validation time, we would reconstruct the ColumnInfo object for each column using ColumnInfo.model_validate(some_patito_model.model_fields["some_field"]), and be able to relatively easily use these objects for validation.

However

The only things that I'm a bit unsure about:

  • Should we use ColumnInfo for fields like gt, which already exist in pydantic's Field? Or do as we currently do and pop them off? I'm leaning towards keeping them within the ColumnInfo just to have all the logic in one place.
  • If we do use ColumnInfo like I suggest in the previous point, are there cases where we shouldn't do that?

Let me know if this seems unclear! Writing this while alternating entertaining a 2.5 year old and a 3 month old 😅

@brendancooley
Copy link
Contributor

brendancooley commented Apr 7, 2024

Certainly agree that this interface could use a refactor. A few concerns about replicating pydantic args onto ColumnInfo (or another patito-side interface):

  1. keeping up with pydantic's signature will likely require some maintenance work, and it's possible that the signature might change on a minor version. Some pydantic field args (e.g. max_items) are scheduled for deprecation. How do we intend to handle these?
  2. Which pydantic metadata does patito intend to support? Should we compile some of the constraints to polars expressions and append them to constraints (e.g. Ge(0) -> pl.col("foo") >= 0). How should patito handle strict?

Overall, being able to take an existing pydantic model and quickly it into a patito model while retaining object-level validation from pydantic is a very nice feature. But not having our own Field definition makes us reactive to changes on the pydantic side.

Maybe we should start by defining more concretely what constitutes a patito Field (i.e. which elements are required to perform tabular validation and schema specification), and then we can work on the conversion/serialization of a pydantic Field to a patito Field.

pydantic's FieldInfo, for reference: https://github.com/pydantic/pydantic/blob/8aeac1a4c61b084ebecf61b38bb8d3e80884dc33/pydantic/fields.py#L89

My one year old is napping and giving me a chance to catch up on all of this great work and thinking! :)

@lmmx lmmx added this to Planner Sep 9, 2024
@lmmx lmmx moved this to 🔜 Soon in Planner Sep 9, 2024
@lmmx lmmx removed this from Planner Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants