
Using ChatMistralAI with structured output: Pydantic model with a datetime.date value using json_schema raises a 400 Bad Request #29604

Aaryia opened this issue Feb 5, 2025 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Aaryia commented Feb 5, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from pydantic import BaseModel
from langchain_mistralai.chat_models import ChatMistralAI
from datetime import date

class DummyClass(BaseModel):
    date: date


llm = ChatMistralAI(model='mistral-small-latest', temperature=0).with_structured_output(DummyClass, method='json_schema')

result: DummyClass = llm.invoke('Answer me with a date. When was the first man on the moon ?')

Error Message and Stack Trace (if applicable)


HTTPStatusError Traceback (most recent call last)
Cell In[124], line 12
7 date: date
10 llm = ChatMistralAI(api_key=api_key, model='mistral-small-latest',temperature=0).with_structured_output(DummyClass, method='json_schema')
---> 12 result: DummyClass = llm.invoke('Answer me with a date. When was the first man on the moon ?')

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/runnables/base.py:3014, in RunnableSequence.invoke(self, input, config, **kwargs)
3012 context.run(_set_config_context, config)
3013 if i == 0:
-> 3014 input = context.run(step.invoke, input, config, **kwargs)
3015 else:
3016 input = context.run(step.invoke, input, config)

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/runnables/base.py:5352, in RunnableBindingBase.invoke(self, input, config, **kwargs)
5346 def invoke(
5347 self,
5348 input: Input,
5349 config: Optional[RunnableConfig] = None,
5350 **kwargs: Optional[Any],
5351 ) -> Output:
-> 5352 return self.bound.invoke(
5353 input,
5354 self._merge_configs(config),
5355 **{**self.kwargs, **kwargs},
5356 )

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:284, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
273 def invoke(
274 self,
275 input: LanguageModelInput,
(...)
279 **kwargs: Any,
280 ) -> BaseMessage:
281 config = ensure_config(config)
282 return cast(
283 ChatGeneration,
--> 284 self.generate_prompt(
285 [self._convert_input(input)],
286 stop=stop,
287 callbacks=config.get("callbacks"),
288 tags=config.get("tags"),
289 metadata=config.get("metadata"),
290 run_name=config.get("run_name"),
291 run_id=config.pop("run_id", None),
292 **kwargs,
293 ).generations[0][0],
294 ).message

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:860, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
852 def generate_prompt(
853 self,
854 prompts: list[PromptValue],
(...)
857 **kwargs: Any,
858 ) -> LLMResult:
859 prompt_messages = [p.to_messages() for p in prompts]
--> 860 return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:690, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
687 for i, m in enumerate(messages):
688 try:
689 results.append(
--> 690 self._generate_with_cache(
691 m,
692 stop=stop,
693 run_manager=run_managers[i] if run_managers else None,
694 **kwargs,
695 )
696 )
697 except BaseException as e:
698 if run_managers:

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:925, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
923 else:
924 if inspect.signature(self._generate).parameters.get("run_manager"):
--> 925 result = self._generate(
926 messages, stop=stop, run_manager=run_manager, **kwargs
927 )
928 else:
929 result = self._generate(messages, stop=stop, **kwargs)

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_mistralai/chat_models.py:547, in ChatMistralAI._generate(self, messages, stop, run_manager, stream, **kwargs)
545 message_dicts, params = self._create_message_dicts(messages, stop)
546 params = {**params, **kwargs}
--> 547 response = self.completion_with_retry(
548 messages=message_dicts, run_manager=run_manager, **params
549 )
550 return self._create_chat_result(response)

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_mistralai/chat_models.py:466, in ChatMistralAI.completion_with_retry(self, run_manager, **kwargs)
463 _raise_on_error(response)
464 return response.json()
--> 466 rtn = _completion_with_retry(**kwargs)
467 return rtn

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_mistralai/chat_models.py:463, in ChatMistralAI.completion_with_retry.<locals>._completion_with_retry(**kwargs)
461 else:
462 response = self.client.post(url="/chat/completions", json=kwargs)
--> 463 _raise_on_error(response)
464 return response.json()

File ~/rag-project/rag-sandbox/.venv/lib/python3.12/site-packages/langchain_mistralai/chat_models.py:170, in _raise_on_error(response)
168 if httpx.codes.is_error(response.status_code):
169 error_message = response.read().decode("utf-8")
--> 170 raise httpx.HTTPStatusError(
171 f"Error response {response.status_code} "
172 f"while fetching {response.url}: {error_message}",
173 request=response.request,
174 response=response,
175 )

HTTPStatusError: Error response 400 while fetching https://api.mistral.ai/v1/chat/completions: {"object":"error","message":"Received unsupported keyword format in schema.","type":"invalid_request_error","param":null,"code":null}

Description

I am trying to use LangChain to identify dates for downstream filtering. I used with_structured_output and it seemed to work out of the box, but I ran into issues with the method='function_calling' approach (sometimes the model did not follow the pydantic schema), so I tried method='json_schema' to constrain the output another way.

I expected json_schema to work as well or better, but it did not: I got the stack trace above.
I traced the problem down to the _convert_pydantic_to_openai_function method.
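
The conversion step can be reproduced in isolation through the public wrapper convert_to_openai_function, which calls the same internal helper. A quick check (output shown as I understand it for langchain_core 0.3.x):

from datetime import date

from langchain_core.utils.function_calling import convert_to_openai_function
from pydantic import BaseModel


class DummyClass(BaseModel):
    date: date


converted = convert_to_openai_function(DummyClass)
# The "format": "date" keyword survives the conversion and is sent to the API:
print(converted["parameters"]["properties"]["date"])
# expected: {'format': 'date', 'type': 'string'} (title keys are stripped by _rm_titles)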

Using pydantic's model_json_schema returns the following:

{
  "properties": {
    "date": { "format": "date", "title": "Date", "type": "string" }
  },
  "required": ["date"],
  "title": "DummyClass",
  "type": "object"
}

The problem lies in the format key. This key is supported neither by Mistral nor by OpenAI, as documented here.

Deleting this format key and providing the following description instead:

schema['properties']['date']['description'] = 'format:date'

makes the call succeed.
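
For anyone hitting the same error in the meantime, here is a rough standalone sketch of that workaround. The helper name strip_format_keys is mine, not a LangChain API, and it assumes with_structured_output also accepts a plain dict schema (in which case the result comes back as a dict rather than a DummyClass instance):

from datetime import date
from typing import Any

from langchain_mistralai.chat_models import ChatMistralAI
from pydantic import BaseModel


class DummyClass(BaseModel):
    date: date


def strip_format_keys(schema: dict[str, Any]) -> dict[str, Any]:
    # Recursively drop 'format' keywords, folding each into 'description'
    # so the model still sees the intended format as a prose hint.
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if isinstance(value, dict):
            cleaned[key] = strip_format_keys(value)
        elif isinstance(value, list):
            cleaned[key] = [
                strip_format_keys(item) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            cleaned[key] = value
    fmt = cleaned.pop("format", None)
    if isinstance(fmt, str):
        cleaned["description"] = (cleaned.get("description", "") + f" format:{fmt}").strip()
    return cleaned


schema = strip_format_keys(DummyClass.model_json_schema())
llm = ChatMistralAI(model="mistral-small-latest", temperature=0).with_structured_output(
    schema, method="json_schema"
)
result = llm.invoke("Answer me with a date. When was the first man on the moon?")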

I believe the _rm_titles function should be extended to remove all keys that are unsupported per the OpenAI documentation.
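
A minimal sketch of what that extension could look like. The keyword list here is illustrative, based on keywords OpenAI's documentation lists as unsupported; the real change would live next to _rm_titles in langchain_core.utils.function_calling:

from typing import Any

# Illustrative subset of the JSON Schema keywords documented as unsupported.
UNSUPPORTED_KEYWORDS = {"format", "pattern", "minLength", "maxLength", "minimum", "maximum"}


def _rm_unsupported(schema: Any) -> Any:
    # Walk the schema recursively, dropping unsupported keywords the same
    # way _rm_titles drops 'title' keys.
    if isinstance(schema, dict):
        return {
            key: _rm_unsupported(value)
            for key, value in schema.items()
            if key not in UNSUPPORTED_KEYWORDS
        }
    if isinstance(schema, list):
        return [_rm_unsupported(item) for item in schema]
    return schema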

System Info

System Information

OS: Linux
OS Version: #15-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 10 23:48:25 UTC 2025
Python Version: 3.12.7 (main, Jan 17 2025, 16:55:27) [GCC 14.2.0]

Package Information

langchain_core: 0.3.33
langchain: 0.3.2
langchain_community: 0.3.1
langsmith: 0.1.147
langchain_huggingface: 0.1.2
langchain_mistralai: 0.2.6
langchain_text_splitters: 0.3.5

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.11
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.28.1
httpx-sse: 0.4.0
huggingface-hub: 0.27.1
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 1.26.4
orjson: 3.10.15
packaging: 24.2
pydantic: 2.10.5
pydantic-settings: 2.7.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.3.1
SQLAlchemy: 2.0.37
tenacity: 8.5.0
tokenizers: 0.21.0
transformers: 4.48.1
typing-extensions: 4.12.2

dosubot bot added the 🤖:bug label on Feb 5, 2025
Aaryia commented Feb 5, 2025

I made a short PR which fixes the issue. It removes unsupported keywords from the schema using the _rm_titles function. I did not venture so far as to modify function names or parameter names, not knowing what impact it could have.

The main problem I have with this fix is that it does not constrain the generation; it simply falls back to the basic JSON schema without the formatting information. A band-aid approach might be to append a string representation of the unsupported keywords to the description value. This still would not be a hard constraint, but it would help align the LLM's answer with the expected output.
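
As a caller-side stopgap, the lost constraint can at least be checked after the fact. A sketch, assuming the dict-schema workaround above (so the raw result is a plain dict):

from pydantic import ValidationError

raw = llm.invoke("Answer me with a date. When was the first man on the moon?")
try:
    # Re-validate with the original model so the 'format: date' rule is
    # enforced client-side even though the server no longer constrains it.
    parsed = DummyClass.model_validate(raw)
except ValidationError:
    # The model ignored the hint; retry, reprompt, or fall back here.
    parsed = None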
