Use Annotated types for the output types #1299

Open
rlouf opened this issue Nov 30, 2024 · 0 comments

rlouf commented Nov 30, 2024

We are currently using classes to define the output types on the v1.0 branch. For instance, to define a Json output type we use the outlines.types.Json class:

from outlines import Generator
from outlines.types import Json

model = ...
generator = Generator(model, Json(...))

The behavior of the model is then determined by dispatching on the class of the object passed to Generator. However, this introduces complexity and ambiguity, as the following sections show.
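To make the dispatch mechanism concrete, here is a minimal sketch using `functools.singledispatch`. The `Json` and `Regex` classes and the `describe_output` function are hypothetical stand-ins written for illustration; they are not the actual classes or dispatch code on the v1.0 branch.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Json:
    """Hypothetical stand-in for a class-based JSON output type."""
    definition: object


@dataclass
class Regex:
    """Hypothetical stand-in for a class-based regex output type."""
    pattern: str


@singledispatch
def describe_output(output_type):
    # Fallback: anything that is not an instance of a registered class.
    raise TypeError(f"Unsupported output type: {type(output_type).__name__}")


@describe_output.register(Json)
def _(output_type):
    return f"json processor for {output_type.definition}"


@describe_output.register(Regex)
def _(output_type):
    return f"regex processor for {output_type.pattern}"


print(describe_output(Json({"type": "object"})))
print(describe_output(Regex(r"[0-9]+")))
```

This works as long as every output type is an instance of a known class. Anything else falls through to the `TypeError` branch.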

Passing custom types to initialize the Generator

We recently introduced custom types to abstract common regular expressions, for instance the regular expression that represents US phone numbers. It is defined as:

from pydantic import WithJsonSchema
from typing_extensions import Annotated

US_PHONE_NUMBER = r"(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}"


USPhoneNumber = Annotated[
    str,
    WithJsonSchema({"type": "string", "pattern": US_PHONE_NUMBER}),
]

We use Python's Annotated type so that this custom type can be used in Pydantic models. We would naturally want to be able to build a Generator object by passing USPhoneNumber directly, as follows:

from outlines import Generator

model = ...
generator = Generator(model, USPhoneNumber)

But then the objects that we pass to Generator are of different kinds: sometimes a class instance, sometimes a type annotation. This is a design disaster waiting to happen, and will introduce a lot of complexity to the code down the line.
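The mismatch can be seen with the standard library alone. In the sketch below, `Pattern` and `Json` are hypothetical stand-ins (`Pattern` plays the role that `WithJsonSchema` plays above), and `kind_of` is an illustrative helper, not an outlines function. It shows that code receiving both forms has to branch on whether it got an instance or an `Annotated` alias.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin


@dataclass(frozen=True)
class Pattern:
    """Hypothetical metadata marker carrying a regular expression."""
    regex: str


@dataclass(frozen=True)
class Json:
    """Hypothetical stand-in for a class-based output type."""
    definition: str


US_PHONE_NUMBER = r"(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}"
USPhoneNumber = Annotated[str, Pattern(US_PHONE_NUMBER)]


def kind_of(output_type):
    """Classify what was passed: an Annotated alias, a class, or an instance."""
    if get_origin(output_type) is Annotated:
        return "annotation"
    if isinstance(output_type, type):
        return "class"
    return "instance"


print(kind_of(Json("{}")))       # instance
print(kind_of(USPhoneNumber))    # annotation

# The metadata of an Annotated alias is reached through get_args,
# not through attribute access as with a class instance.
base, marker = get_args(USPhoneNumber)
print(base, marker.regex == US_PHONE_NUMBER)
```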

Use Python type annotations such as List and Union

Defining the output type as List[Json(...)] or Union[Json(...)] will raise an exception, since these parametric types can only be parametrized by types, and Json(...) is an instance of the Json class. This is very confusing from a user perspective; it is not clear why these types would not be supported.

Use Annotated types everywhere

I thus suggest that we use type constructors that generate an Annotated type. For instance:

from typing import List
from pydantic import BaseModel
from outlines.types import Json


class Foo(BaseModel):
    bar: str


json_type = Json(Foo)
print(json_type)
# typing.Annotated[str, JsonType(definition=Foo, whitespace_pattern=' ')]

list_type = List[Json(Foo)]
print(list_type)
# typing.List[typing.Annotated[str, JsonType(definition=Foo, whitespace_pattern=' ')]]

model = ...
generator = build_generator(model, Json(Foo))

generator = build_generator(model, List[Json(Foo)])

This way we can also support Tuple, Union, and other composite types. As a bonus, we can configure them so that they can be serialized by Pydantic into the correct JSON Schema.
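A minimal, standard-library sketch of such a type constructor might look like the following. The names `JsonType` and `Json` mirror the example above, but the bodies are assumptions made for illustration, as is the `collect_json_types` helper; `Foo` is a plain dataclass here rather than a Pydantic model so that the snippet has no dependencies.

```python
from dataclasses import dataclass
from typing import Annotated, Any, List, Union, get_args, get_origin


@dataclass(frozen=True)
class JsonType:
    """Hypothetical metadata marker attached to the Annotated type."""
    definition: Any
    whitespace_pattern: str = " "


def Json(definition, whitespace_pattern=" "):
    """Hypothetical type constructor: returns a type, not an instance."""
    return Annotated[str, JsonType(definition, whitespace_pattern)]


@dataclass
class Foo:
    """Stand-in for the Pydantic model used in the example above."""
    bar: str


def collect_json_types(tp):
    """Return every JsonType marker found in a possibly nested annotation."""
    found = []
    if get_origin(tp) is Annotated:
        base, *metadata = get_args(tp)
        found.extend(m for m in metadata if isinstance(m, JsonType))
        found.extend(collect_json_types(base))
    else:
        # Composite types such as List[...] or Union[...] expose their
        # parameters through get_args; plain types yield an empty tuple.
        for arg in get_args(tp):
            found.extend(collect_json_types(arg))
    return found


# Because Json(Foo) is a type, it composes with parametric types.
list_type = List[Json(Foo)]
union_type = Union[Json(Foo), int]

print(collect_json_types(list_type) == [JsonType(Foo)])
print(collect_json_types(union_type) == [JsonType(Foo)])
```

With this shape, a generator builder only ever receives type annotations and can walk them uniformly, instead of special-casing instances.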

@rlouf rlouf added this to the 1.0 milestone Nov 30, 2024
@rlouf rlouf self-assigned this Dec 8, 2024