You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@brendancooley (and others), I want to refactor the Field function in order to:
Have a statically (pyright) parseable docstring
Explicitly pass arguments like gt, dtype, constraints etc rather than use args/kwargs
Properly be able to serialize and deserialize types, satisfying any pydantic type requirements.
The suggestion
I'm considering encoding all the relevant Field parameters inside ColumnInfo, and ensuring that ColumnInfo can serialize (using field_serializer) and deserialize (through validators) all parameters.
This means that we can type-safely pass all these arguments to pydantic's Field using pydantic.fields.Field(json_schema_extra=column_info.model_dump()).
Then, at validation time, we would reconstruct the ColumnInfo object for each column using ColumnInfo.model_validate(some_patito_model.model_fields["some_field"]), and be able to relatively easily use these objects for validation.
However
The only things that I'm a bit unsure about:
Should we use ColumnInfo for fields like gt, which already exist in pydantic's Field? Or do as we currently do and pop them off? I'm leaning towards keeping them within the ColumnInfo just to have all the logic in one place.
If we do use ColumnInfo like I suggest in the previous point, are there cases where we shouldn't do that?
Let me know if this seems unclear! Writing this while alternating entertaining a 2.5 year old and a 3 month old 😅
The text was updated successfully, but these errors were encountered:
Certainly agree that this interface could use a refactor. A few concerns about replicating pydantic args onto ColumnInfo (or another patito-side interface):
keeping up with pydantic's signature will likely require some maintenance work, and it's possible that the signature might change on a minor version. Some pydantic field args (e.g. max_items) are scheduled for deprecation. How do we intend to handle these?
Which pydantic metadata does patito intend to support? Should we compile some of the constraints to polars expressions and append them to constraints (e.g. Ge(0) -> pl.col("foo") >= 0). How should patito handle strict?
Overall, being able to take an existing pydantic model and quickly it into a patito model while retaining object-level validation from pydantic is a very nice feature. But not having our own Field definition makes us reactive to changes on the pydantic side.
Maybe we should start by defining more concretely what constitutes a patito Field (i.e. which elements are required to perform tabular validation and schema specification), and then we can work on the conversion/serialization of a pydantic Field to a patito Field.
@brendancooley (and others), I want to refactor the Field function in order to:
gt
,dtype
,constraints
etc rather than use args/kwargsThe suggestion
I'm considering encoding all the relevant
Field
parameters insideColumnInfo
, and ensuring thatColumnInfo
can serialize (usingfield_serializer
) and deserialize (through validators) all parameters.This means that we can type-safely pass all these arguments to pydantic's
Field
usingpydantic.fields.Field(json_schema_extra=column_info.model_dump())
.Then, at validation time, we would reconstruct the
ColumnInfo
object for each column usingColumnInfo.model_validate(some_patito_model.model_fields["some_field"])
, and be able to relatively easily use these objects for validation.However
The only things that I'm a bit unsure about:
ColumnInfo
for fields likegt
, which already exist in pydantic'sField
? Or do as we currently do and pop them off? I'm leaning towards keeping them within theColumnInfo
just to have all the logic in one place.ColumnInfo
like I suggest in the previous point, are there cases where we shouldn't do that?Let me know if this seems unclear! Writing this while alternating entertaining a 2.5 year old and a 3 month old 😅
The text was updated successfully, but these errors were encountered: