Thank you for contributing to the VLM Run Hub! To maintain consistency and adhere to industry best practices, please follow these guidelines when creating a new schema.
- Use Pydantic’s BaseModel: All schemas must inherit from Pydantic’s `BaseModel`.

      from pydantic import BaseModel

      class ExampleSchema(BaseModel):
          ...
- Strongly-Typed Fields: Define each field with precise, strongly-typed annotations (e.g., `str`, `int`, `float`, `list`, `dict`).
- Optional Fields: Use `| None` in the annotation, with `None` as the default, for optional fields. This is critical because some fields may be absent from the data, and schema validation should not fail when that happens.
- Descriptive Field Names: Use clear, descriptive, `snake_case` field names, along with a short `description` that explains the field's purpose. This is critical for the model to work out what content each field should be mapped from.

  Good example:

      from pydantic import BaseModel, Field

      class CustomerInvoice(BaseModel):
          invoice_number: str = Field(..., description="The invoice number, typically represented as a string of alphanumeric characters.")

  Bad example:

      class CustomerInvoice(BaseModel):
          invoice_number: str = Field(..., description="The invoice number.")
- Field Metadata:
  - Use the `Field` class to provide:
    - `default`: If applicable (e.g., `Field(None, ...)`).
    - `description`: Include a short, clear explanation of the field’s purpose (e.g., `Field(..., description="The invoice number, typically represented as a string of alphanumeric characters.")`).
    - Other constraints: For validation (e.g., `max_length`, or `regex`, which Pydantic v2 renames to `pattern`).
  - Validation: Add custom validators where necessary to enforce domain-specific rules.
- Nested Models: Use nested Pydantic models for complex structures (e.g., lists of dictionaries).

      class CustomerInvoice(BaseModel):
          invoice_number: str = Field(..., description="The invoice number, typically represented as a string of alphanumeric characters.")
          # Item is itself a Pydantic model (definition not shown here)
          items: list[Item] = Field(..., description="A list of items in the invoice.")
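The snippet above leaves `Item` undefined. A self-contained sketch might look like the following; the fields on `Item` (`name`, `quantity`) are assumptions made for illustration and are not taken from the guidelines. Pydantic v2 is assumed.

```python
from pydantic import BaseModel, Field


class Item(BaseModel):
    # Hypothetical fields; a real schema would mirror the source document.
    name: str = Field(..., description="The name of the item as printed on the invoice.")
    quantity: int = Field(..., description="The number of units of the item that were billed.")


class CustomerInvoice(BaseModel):
    invoice_number: str = Field(
        ...,
        description="The invoice number, typically represented as a string of alphanumeric characters.",
    )
    items: list[Item] = Field(..., description="A list of items in the invoice.")


# Nested dictionaries are converted into Item instances during validation.
invoice = CustomerInvoice.model_validate(
    {
        "invoice_number": "INV-001",
        "items": [{"name": "Widget", "quantity": 2}],
    }
)
```

Declaring `Item` as its own model, rather than using `list[dict]`, keeps each nested field typed and described, and lets the same model be reused by other schemas.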
- Enums: Use enums or `Literal` for fixed choices.

  Using `Enum`:

      from enum import Enum

      class Status(Enum):
          pending = "pending"
          paid = "paid"
          cancelled = "cancelled"

      class CustomerInvoice(BaseModel):
          ...
          status: Status = Field(..., description="The status of the invoice, which can be one of 'pending', 'paid', or 'cancelled'.")

  Using `Literal`:

      from typing import Literal

      class CustomerInvoice(BaseModel):
          status: Literal["pending", "paid", "cancelled"] = Field(..., description="The status of the invoice, which can be one of 'pending', 'paid', or 'cancelled'.")
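- Examples: Include a `Config.schema_extra` with example data for each schema (Pydantic v2 renames this setting to `json_schema_extra`, set through `model_config`). The generated JSON schema can then be inspected:

      import json

      class CustomerInvoice(BaseModel):
          ...

      # model_json_schema() returns a dict; json.dumps handles the indentation
      print(json.dumps(CustomerInvoice.model_json_schema(), indent=2))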
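One possible way to attach example data, assuming Pydantic v2, is through `json_schema_extra` in `model_config`. This is a sketch rather than the project's required layout: the example payload and the choice of an `"examples"` key are illustrative assumptions.

```python
import json

from pydantic import BaseModel, ConfigDict, Field


class CustomerInvoice(BaseModel):
    # Pydantic v2 counterpart of v1's `Config.schema_extra`.
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"invoice_number": "INV-001"}  # hypothetical example payload
            ]
        }
    )

    invoice_number: str = Field(
        ...,
        description="The invoice number, typically represented as a string of alphanumeric characters.",
    )


schema = CustomerInvoice.model_json_schema()

# The example data is merged into the generated JSON schema.
example = schema["examples"][0]

# A useful sanity check: the example itself should validate against the schema.
validated = CustomerInvoice.model_validate(example)

print(json.dumps(schema, indent=2))
```

Validating the example against its own model, as in the last step, helps keep the example from drifting out of sync when fields are later renamed or added.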
Before submitting your schema:
- Field Types: Ensure all fields are strongly-typed.
- Field Metadata: Check that all fields include descriptions and constraints where applicable.
- Examples: Include clear, complete examples in Config.schema_extra.
- Validation: Add custom validators for domain-specific rules.
- Reusability: Use nested models for complex types and avoid redundancy.
- Tests: Provide unit tests to validate the schema against valid and invalid data.
- Create a new schema file: Create a new file in the `schemas/contrib` directory, under the appropriate industry and use case (e.g., `schemas/contrib/retail/ecommerce_product_caption.py`). Follow the Schema Guidelines to write the schema.
- Add sample image, prompt, and test: Add a sample image, text, or other data that the schema can be applied to, and add a pytest test for the schema in `tests/test_schemas.py`.