Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: remove unsupported keywords from pydantic schema #29611

Closed
wants to merge 7 commits into from

Conversation

Aaryia
Copy link

@Aaryia Aaryia commented Feb 5, 2025

Description

This modified the _rm_titles function to allow using json_schema method of structure output with datetime.date . The set of unsupported openai keywords includes the title key, as well as all the unsupported items listed here : https://platform.openai.com/docs/guides/structured-outputs#some-type-specific-keywords-are-not-yet-supported

Issue

Resolves #29604

Resolves 29604

This modified the _rm_titles function to allow using `json_schema` method of structure output with `datetime.date` . The set of unsupported openai keywords includes the `title` key, as well as all the unsupported items listed here : https://platform.openai.com/docs/guides/structured-outputs#some-type-specific-keywords-are-not-yet-supported
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 5, 2025
Copy link

vercel bot commented Feb 5, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Feb 5, 2025 3:13pm

@dosubot dosubot bot added langchain Related to the langchain package 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 5, 2025
@Aaryia Aaryia marked this pull request as draft February 5, 2025 14:55
@Aaryia Aaryia marked this pull request as ready for review February 5, 2025 15:22
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Feb 5, 2025
Copy link
Collaborator

@ccurme ccurme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I don't think we want to silently remove these from schemas:

  • If OpenAI, Mistral, etc. begin supporting them, it will be a breaking change for LangChain to include them as well.
  • I believe this change would impact all tool-calling functionality for all integrations (not just structured output and not just for Mistral / OpenAI).

I think it's preferable to raise an informative error message in the Mistral/OpenAI integrations directing users to use a different method for structured output, such as "function_calling".

Let me know what you think.

@ccurme ccurme closed this Feb 5, 2025
@Aaryia
Copy link
Author

Aaryia commented Feb 6, 2025

The function_calling method for structured output is not as constraining as json_schema from what I gathered, making validation fail occasionally. I tried finding the best way to use this json_schema to validate pydantic models. I agree with the problem of silently removing them. If there is another method you would prefer me to try to implement, I could. I simply went for the minimal library change !

Regarding the second point, this method (_convert_pydantic_to_openai_function) is only used for api that are using the specifications from OpenAI, as far as I saw. This means that all models using the openai function model are subject to this bug, not simply mistralAI. To me, it seems logical then that it should impact all tools using the _convert_pydantic_to_openai_function and not only MistralAI.

What I believe could be done better than what I did :

  • using a new method, specifically for removal of openai unsupported keywords, separating concerns between title and the rest
  • adding a parameter that allows for removal of those keywords if specified by the user

Why I think it's better to allow this instead of simply raising an informative error :
I use langchain with pydantic to build an AI agent for a legal tech company. We are trying to adhere to strict dev practices and the fact that function_calling is flaky regarding validation is a major problem to me, especially as we found out that json_schema seems to be the exact answer we need. Not being able to fully use the pydantic + langchain + mistral-ai stack feels bad because it works so well in all other regards.

I am ready to rework this PR as something more robust, but only if it seems like there is an option that could be accepted. What would you advise on this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature langchain Related to the langchain package size:M This PR changes 30-99 lines, ignoring generated files.
Projects
Status: Closed
2 participants