Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pydantic contrib module #757

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Pydantic contrib module #757

wants to merge 21 commits into from

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Jan 31, 2025

Fixes #621, #496

  • Provide official pydantic support in a new module temporalio.contrib.pydantic.
  • This support is based on v2, but we do retain the existing code that should allow v1 to still be used.
  • Due to our wrapping of the datetime.datetime class, and datetime.datetime instances, with proxy objects, there are a few places where we have to unwrap proxied objects to allow pydantic to work correctly.
  • Some of these changes are essentially "hacks", and it is an open question whether we should retain datetime.datetime sandboxing when pydantic is being used.

See also #627, #207

TODO:

  • test v1 usage

@dandavison dandavison force-pushed the pydantic branch 4 times, most recently from 9cc5725 to fe57d5b Compare February 3, 2025 02:11
@dandavison dandavison force-pushed the pydantic branch 3 times, most recently from 34d1ae2 to 9f5b771 Compare February 4, 2025 04:55
# constructor. Return the field value.
if len(args) == 1:
return args[0]
return super().__new__(cls)
Copy link
Contributor Author

@dandavison dandavison Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: pydantic is calling the _RestrictedProxy constructor with a single (non-wrapped) datetime.datetime argument. Returning args[0] doesn't seem correct, but does work. I need to better understand exactly why pydantic is making this call. The stack looks like

-> o1, _ = make_pydantic_objects()
  /Users/dan/src/temporalio/sdk-python/tests/contrib/test_pydantic.py(41)make_pydantic_objects()
-> MyPydanticModel(
  /Users/dan/src/temporalio/sdk-python/.venv/lib/python3.9/site-packages/pydantic/main.py(214)__init__()
-> validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
> /Users/dan/src/temporalio/sdk-python/temporalio/worker/workflow_sandbox/_restrictions.py(958)__new__()

Pydantic v2 "validation" (i.e. model instance construction) is implemented in Rust, and since intervening stack frames are missing, I think that pydantic/main.py(214) is calling into pydantic's Rust layer. But I would like to understand why pydantic is calling the constructor with the datetime value as the single argument.

Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to better understand exactly why pydantic is making this call

Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__ and/or the __init__ looking a certain way may make Pydantic think it it's a wrapper type.

When pydantic instantiates a model containing a field whose type annotation is proxied

With other create_model code in this PR, pydantic should never be instantiating a model with proxied annotations right?

@dandavison dandavison force-pushed the pydantic branch 2 times, most recently from fa2c8b5 to a1cdf1b Compare February 4, 2025 14:39
@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any:

def _is_restrictable(v: Any) -> bool:
return v is not None and not isinstance(
v, (bool, int, float, complex, str, bytes, bytearray)
v, (bool, int, float, complex, str, bytes, bytearray, datetime)
Copy link
Contributor Author

@dandavison dandavison Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: As things stand this means that users will be able to do my_datetime.now(). But datetime.datetime.now() will still be protected against, which is the usual way to call it. One option is to make this change only when conditional on HAVE_PYDANTIC.

If we get rid of this change and allow datetime instances to be restricted, then Pydantic validation fails. This failure is in Pydantic's Rust layer: https://github.com/pydantic/pydantic-core/blob/main/src/errors/types.rs#L706-L707

datetime_field
  Input should be a valid datetime [type=datetime_type, input_value=datetime.datetime(2000, 1, 2, 3, 4, 5), input_type=_RestrictedProxy]
    For further information visit https://errors.pydantic.dev/2.10/v/datetime_type

Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date, datetime.time, and datetime.timedelta objects, do we need explicit support for them too (at least the former)?

# building their validators causing it to use date instead
# TODO(cretz): https://github.com/temporalio/sdk-python/issues/207
if pydantic.compiled:
assert isinstance(PydanticMessage(content=workflow.now()).content, date)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic.compiled is not present in v2

if pydantic.compiled:
assert isinstance(PydanticMessage(content=workflow.now()).content, date)
else:
assert isinstance(PydanticMessage(content=workflow.now()).content, datetime)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tested in test_pydantic.py

json_payload_converter = JSONPlainPayloadConverter(
encoder=PydanticJSONEncoder,
custom_type_converters=[PydanticModelTypeConverter()],
)
Copy link
Contributor Author

@dandavison dandavison Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our pydantic sample was implementing to_payload. This is slightly incorrect because it means that our AdvancedJSONEncoder was not being used. In practice, pydantic provides a function (called to_jsonable_python in v2) with similar functionality to our AdvancedJSONEncoder, but nevertheless, we should ensure that we fall back to AdvancedJSONEncoder in case we add to it in the future. That's what we do here, by using JSONPlainPayloadConverter directly.

@dandavison dandavison marked this pull request as ready for review February 4, 2025 15:24
@dandavison dandavison requested a review from a team as a code owner February 4, 2025 15:24
@@ -325,6 +319,14 @@ This notably doesn't include any `date`, `time`, or `datetime` objects as they m
Users are strongly encouraged to use a single `dataclass` for parameter and return types so fields with defaults can be
easily added without breaking compatibility.

To use pydantic model instances (or python objects containing pydantic model instances), use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this a whole subsection, e.g. ##### Pydantic

@@ -35,6 +35,7 @@ grpcio = {version = "^1.48.2", optional = true}
opentelemetry-api = { version = "^1.11.1", optional = true }
opentelemetry-sdk = { version = "^1.11.1", optional = true }
protobuf = ">=3.20"
pydantic = { version = "^2.10.6", optional = true }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that pydantic v1 users aren't bound by our version constraint here if they never add the extra? Maybe the easiest way to confirm is somehow have a sample or test that confirms existing pydantic support?

Comment on lines +25 to +26
# pydantic v1
from pydantic.json import pydantic_encoder as to_jsonable_python # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we saying this contrib library works even with pydantic v1? If so, we should change our version constraint to be >= whatever the absolute lowest we support is. If not, we should probably error if they try to import this.

Comment on lines +56 to +64
if temporalio.workflow.unsafe.in_sandbox():
# Unwrap proxied model field types so that Pydantic can call their constructors
model = pydantic.create_model(
model.__name__,
**{ # type: ignore
name: (RestrictionContext.unwrap_if_proxied(f.annotation), f)
for name, f in model.model_fields.items()
},
)
Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions here just due to my lack of knowledge...

  1. Arguably models should not be defined/loaded in the sandbox right? That means they are reloaded for every workflow run. They should be imported via passthrough. I wonder if we should have a warning if we detect any model fields are proxied? Or maybe a warning if we see Pydantic models being loaded in the sandbox instead of passed through? EDIT: Seeing test I can see situations where people define their models in the sandbox, even if we discourage it.
  2. Is create_model cheap? Meaning is this what Pydantic often uses anyways to instantiate a model instance, or do they somehow cache this definition and manually doing create_model isn't meant for each time you need the instance? If it is not cheap, we should both consider caching and consider not calling it if none of the types are proxied.

},
)
if hasattr(model, "model_validate"):
return model.model_validate(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this raises a Pydantic-specific exception that a user could add to workflow failure exception types if they wanted? (if not, we should consider a usability improvement to have an option to wrap this in application error)

@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any:

def _is_restrictable(v: Any) -> bool:
return v is not None and not isinstance(
v, (bool, int, float, complex, str, bytes, bytearray)
v, (bool, int, float, complex, str, bytes, bytearray, datetime)
Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date, datetime.time, and datetime.timedelta objects, do we need explicit support for them too (at least the former)?

Comment on lines +994 to +995
if HAVE_PYDANTIC and __name == "__get_pydantic_core_schema__":
return object.__getattribute__(self, "__get_pydantic_core_schema__")
Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, trying to think if we even want this generic to void pydantic code inside our sandbox. I guess all ways to do it would require that the contrib library is at least imported by the user somewhere in the process, but maybe that is ok? If that is ok, I wonder if a side-effect of importing the contrib is ideal over altering the sandbox even if pydantic is even available? Though I admit I kinda like the idea of this patch if pydantic is even available regardless of contrib lib usage, so maybe this should remain as is.


class MyPydanticModel(BaseModel):
ip_field: IPv4Address
datetime_field: datetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add datetime.date, datetime.time, and datetime.timedelta in here? And maybe uuid.UUID while we're at it since we have restrictions in that module too.

ShortSequence = Annotated[SequenceType, Len(max_length=2)]


class MyPydanticModel(BaseModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say for many users they will be importing this model from a different module on passthrough. This test confirms the model works when defined in the sandbox, but I wonder if we also need at least a little coverage for models defined out of the sandbox (and passed through).

# constructor. Return the field value.
if len(args) == 1:
return args[0]
return super().__new__(cls)
Copy link
Member

@cretz cretz Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to better understand exactly why pydantic is making this call

Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__ and/or the __init__ looking a certain way may make Pydantic think it it's a wrapper type.

When pydantic instantiates a model containing a field whose type annotation is proxied

With other create_model code in this PR, pydantic should never be instantiating a model with proxied annotations right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Create official Pydantic contrib module
2 participants