JSON and TypeAdapters produce unwanted values or empty list #1069

Open
AlbanPerli opened this issue Oct 29, 2024 · 3 comments

@AlbanPerli

Hi @hudson-ai!

Concerning TypeAdapter-constrained generation, here are some examples of the issue mentioned here:

from guidance import models, capture
from guidance import json as jj
from pydantic import BaseModel, TypeAdapter
import json
from Noema.cfg import *

lm = models.LlamaCpp(
    "../Models/Mistral-NeMo-Minitron-8B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=512*8,
    echo=False
)

lm.reset()
lm += "Generate a list of 3 integers between 1 and 4: " + capture(G.arrayOf(G.num()), name="generated_object")
print(lm["generated_object"])
# Output: ["1", "2", "3"]

lm.reset()
schema = TypeAdapter(list[int])
lm += "Generate a list of 3 integers between 0 and 4: " + jj(name="generated_object", schema=schema)
print(json.loads(lm["generated_object"]))
# Output: []

lm.reset()
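# The French prompt reads: "Create a list of the different steps described here: This morning I left early, then I bought apples, and finally I went to the restaurant."
lm += "Créé une liste des différentes étapes décrites ici: Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant." + capture(G.arrayOf(G.sentence()), name="generated_object")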
print(lm["generated_object"])
# Output: ["Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant."]

lm.reset()
schema = TypeAdapter(list[str])
lm += "Créé une liste des différentes étapes décrites ici: Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant." + jj(name="generated_object", schema=schema)
print(json.loads(lm["generated_object"]))
# Output: []

The file containing the custom CFG is here.

This is just a workaround, but it helps produce a non-empty list.

Concerning the JSON:

lm.reset()
class Schema(BaseModel):
    weather: str
lm += "What is the weather today? " + jj(name="generated_object", schema=Schema)
print(json.loads(lm["generated_object"]))
# Output using Minitron 8B: {'weather': ', '}
# Output using llama3 instruct: {'weather': ':sunny:'}

I'm not sure I understand what the expected generation is, but it seems that characters from the format are interfering with the generated content.

@hudson-ai
Collaborator

Hi @AlbanPerli, sorry for the late reply here :)

I think that part of what you are encountering here is that lists aren't forced to be non-empty by default (I think your custom grammar definitions enforce a minimum length of one). If you want to enforce this behavior with TypeAdapters, you can use typing.Annotated and annotated_types.MinLen like so:

from typing import Annotated
from annotated_types import MinLen
from pydantic import TypeAdapter

ta = TypeAdapter(Annotated[list[int], MinLen(1)])
ta.json_schema()
# Output: {'items': {'type': 'integer'}, 'minItems': 1, 'type': 'array'}
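
To make the difference concrete, here is a minimal sketch that checks both adapters against an empty list using pydantic's own validation. It does not call guidance at all; it only illustrates that `[]` is a valid instance of the unconstrained schema, which is why a constrained generator is free to produce it. The variable names (`unconstrained`, `non_empty`, `empty_accepted`) are just illustrative.

```python
from typing import Annotated

from annotated_types import MinLen
from pydantic import TypeAdapter, ValidationError

# Plain list[int]: the schema carries no minItems, so [] is a valid instance.
unconstrained = TypeAdapter(list[int])
assert "minItems" not in unconstrained.json_schema()
assert unconstrained.validate_python([]) == []

# With MinLen(1), the schema gains minItems and [] is rejected.
non_empty = TypeAdapter(Annotated[list[int], MinLen(1)])
assert non_empty.json_schema()["minItems"] == 1

try:
    non_empty.validate_python([])
    empty_accepted = True
except ValidationError:
    empty_accepted = False

print(empty_accepted)
```

Passing the second adapter as the schema should then rule out the empty-list output, assuming the generator honors `minItems`.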

You can of course get this behavior by just writing the JSON schema directly, or if you're using a pydantic.BaseModel, you can do these annotations a bit more ergonomically with the pydantic.Field descriptor.
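
As a rough sketch of those two alternatives: the first part uses `pydantic.Field(min_length=1)` on a model field, the second writes the equivalent array schema by hand. The model name `IntList` and field name `values` are made up for illustration. Note that a `BaseModel` wraps the list in a JSON object, so the generated value would be shaped like `{"values": [...]}` rather than a bare array.

```python
from pydantic import BaseModel, Field


class IntList(BaseModel):  # hypothetical model, for illustration only
    # min_length on a list field becomes minItems in the JSON schema
    values: list[int] = Field(min_length=1)


model_schema = IntList.model_json_schema()
print(model_schema["properties"]["values"])

# The same constraint written directly as a JSON schema for a bare array
raw_schema = {"type": "array", "items": {"type": "integer"}, "minItems": 1}
```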

If this doesn't address the core issue you're seeing, just let me know and we can figure it out :)

@AlbanPerli
Author

Hi @hudson-ai, my turn to apologize for the response time! :)

The issue was indeed the minimum length; I wasn't aware of this parameter for TypeAdapter.

Thank you!

@hudson-ai
Collaborator

Ok, good to know that works for you! Let us know if you hit any other unexpected or unintuitive behaviors :)
