-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pydantic contrib module #757
base: main
Are you sure you want to change the base?
Conversation
9cc5725
to
fe57d5b
Compare
34d1ae2
to
9f5b771
Compare
# constructor. Return the field value. | ||
if len(args) == 1: | ||
return args[0] | ||
return super().__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: pydantic is calling the _RestrictedProxy
constructor with a single (non-wrapped) datetime.datetime
argument. Returning args[0]
doesn't seem correct, but does work. I need to better understand exactly why pydantic is making this call. The stack looks like
-> o1, _ = make_pydantic_objects()
/Users/dan/src/temporalio/sdk-python/tests/contrib/test_pydantic.py(41)make_pydantic_objects()
-> MyPydanticModel(
/Users/dan/src/temporalio/sdk-python/.venv/lib/python3.9/site-packages/pydantic/main.py(214)__init__()
-> validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
> /Users/dan/src/temporalio/sdk-python/temporalio/worker/workflow_sandbox/_restrictions.py(958)__new__()
Pydantic v2 "validation" (i.e. model instance construction) is implemented in Rust, and since intervening stack frames are missing, I think that pydantic/main.py(214)
is calling into pydantic's Rust layer. But I would like to understand why pydantic is calling the constructor with the datetime value as the single argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to better understand exactly why pydantic is making this call
Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__
and/or the __init__
looking a certain way may make Pydantic think it it's a wrapper type.
When pydantic instantiates a model containing a field whose type annotation is proxied
With other create_model
code in this PR, pydantic should never be instantiating a model with proxied annotations right?
fa2c8b5
to
a1cdf1b
Compare
@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any: | |||
|
|||
def _is_restrictable(v: Any) -> bool: | |||
return v is not None and not isinstance( | |||
v, (bool, int, float, complex, str, bytes, bytearray) | |||
v, (bool, int, float, complex, str, bytes, bytearray, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: As things stand this means that users will be able to do my_datetime.now()
. But datetime.datetime.now()
will still be protected against, which is the usual way to call it. One option is to make this change only when conditional on HAVE_PYDANTIC
.
If we get rid of this change and allow datetime
instances to be restricted, then Pydantic validation fails. This failure is in Pydantic's Rust layer: https://github.com/pydantic/pydantic-core/blob/main/src/errors/types.rs#L706-L707
datetime_field
Input should be a valid datetime [type=datetime_type, input_value=datetime.datetime(2000, 1, 2, 3, 4, 5), input_type=_RestrictedProxy]
For further information visit https://errors.pydantic.dev/2.10/v/datetime_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date
, datetime.time
, and datetime.timedelta
objects, do we need explicit support for them too (at least the former)?
# building their validators causing it to use date instead | ||
# TODO(cretz): https://github.com/temporalio/sdk-python/issues/207 | ||
if pydantic.compiled: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pydantic.compiled
is not present in v2
if pydantic.compiled: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, date) | ||
else: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tested in test_pydantic.py
json_payload_converter = JSONPlainPayloadConverter( | ||
encoder=PydanticJSONEncoder, | ||
custom_type_converters=[PydanticModelTypeConverter()], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our pydantic sample was implementing to_payload
. This is slightly incorrect because it means that our AdvancedJSONEncoder
was not being used. In practice, pydantic provides a function (called to_jsonable_python
in v2) with similar functionality to our AdvancedJSONEncoder
, but nevertheless, we should ensure that we fall back to AdvancedJSONEncoder
in case we add to it in the future. That's what we do here, by using JSONPlainPayloadConverter
directly.
@@ -325,6 +319,14 @@ This notably doesn't include any `date`, `time`, or `datetime` objects as they m | |||
Users are strongly encouraged to use a single `dataclass` for parameter and return types so fields with defaults can be | |||
easily added without breaking compatibility. | |||
|
|||
To use pydantic model instances (or python objects containing pydantic model instances), use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make this a whole subsection, e.g. ##### Pydantic
@@ -35,6 +35,7 @@ grpcio = {version = "^1.48.2", optional = true} | |||
opentelemetry-api = { version = "^1.11.1", optional = true } | |||
opentelemetry-sdk = { version = "^1.11.1", optional = true } | |||
protobuf = ">=3.20" | |||
pydantic = { version = "^2.10.6", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that pydantic v1 users aren't bound by our version constraint here if they never add the extra? Maybe the easiest way to confirm is somehow have a sample or test that confirms existing pydantic support?
# pydantic v1 | ||
from pydantic.json import pydantic_encoder as to_jsonable_python # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we saying this contrib library works even with pydantic v1? If so, we should change our version constraint to be >=
whatever the absolute lowest we support is. If not, we should probably error if they try to import this.
if temporalio.workflow.unsafe.in_sandbox(): | ||
# Unwrap proxied model field types so that Pydantic can call their constructors | ||
model = pydantic.create_model( | ||
model.__name__, | ||
**{ # type: ignore | ||
name: (RestrictionContext.unwrap_if_proxied(f.annotation), f) | ||
for name, f in model.model_fields.items() | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions here just due to my lack of knowledge...
- Arguably models should not be defined/loaded in the sandbox right? That means they are reloaded for every workflow run. They should be imported via passthrough. I wonder if we should have a warning if we detect any model fields are proxied? Or maybe a warning if we see Pydantic models being loaded in the sandbox instead of passed through? EDIT: Seeing test I can see situations where people define their models in the sandbox, even if we discourage it.
- Is
create_model
cheap? Meaning is this what Pydantic often uses anyways to instantiate a model instance, or do they somehow cache this definition and manually doingcreate_model
isn't meant for each time you need the instance? If it is not cheap, we should both consider caching and consider not calling it if none of the types are proxied.
}, | ||
) | ||
if hasattr(model, "model_validate"): | ||
return model.model_validate(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this raises a Pydantic-specific exception that a user could add to workflow failure exception types if they wanted? (if not, we should consider a usability improvement to have an option to wrap this in application error)
@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any: | |||
|
|||
def _is_restrictable(v: Any) -> bool: | |||
return v is not None and not isinstance( | |||
v, (bool, int, float, complex, str, bytes, bytearray) | |||
v, (bool, int, float, complex, str, bytes, bytearray, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date
, datetime.time
, and datetime.timedelta
objects, do we need explicit support for them too (at least the former)?
if HAVE_PYDANTIC and __name == "__get_pydantic_core_schema__": | ||
return object.__getattribute__(self, "__get_pydantic_core_schema__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, trying to think if we even want this generic to void pydantic code inside our sandbox. I guess all ways to do it would require that the contrib library is at least imported by the user somewhere in the process, but maybe that is ok? If that is ok, I wonder if a side-effect of importing the contrib is ideal over altering the sandbox even if pydantic
is even available? Though I admit I kinda like the idea of this patch if pydantic
is even available regardless of contrib lib usage, so maybe this should remain as is.
|
||
class MyPydanticModel(BaseModel): | ||
ip_field: IPv4Address | ||
datetime_field: datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add datetime.date
, datetime.time
, and datetime.timedelta
in here? And maybe uuid.UUID
while we're at it since we have restrictions in that module too.
ShortSequence = Annotated[SequenceType, Len(max_length=2)] | ||
|
||
|
||
class MyPydanticModel(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say for many users they will be importing this model from a different module on passthrough. This test confirms the model works when defined in the sandbox, but I wonder if we also need at least a little coverage for models defined out of the sandbox (and passed through).
# constructor. Return the field value. | ||
if len(args) == 1: | ||
return args[0] | ||
return super().__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to better understand exactly why pydantic is making this call
Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__
and/or the __init__
looking a certain way may make Pydantic think it it's a wrapper type.
When pydantic instantiates a model containing a field whose type annotation is proxied
With other create_model
code in this PR, pydantic should never be instantiating a model with proxied annotations right?
Fixes #621, #496
temporalio.contrib.pydantic
.v2
, but we do retain the existing code that should allowv1
to still be used.datetime.datetime
class, anddatetime.datetime
instances, with proxy objects, there are a few places where we have to unwrap proxied objects to allow pydantic to work correctly.datetime.datetime
sandboxing when pydantic is being used.See also #627, #207
TODO:
v1
usage