If an LLM generates two parallel tool calls for a final result tool, but the first one fails validation, there are two problems:

1. The second "good" call is ignored (this is arguably fine from a design standpoint).
2. More critically, because the good call is ignored, no tool result is generated for it, so the message history is incomplete. This is problematic.
I have a unit test that demonstrates the issue, but I'm not sure how the maintainers would prefer to solve it. Currently, the design implicitly assumes that the LLM never generates two final tool calls: the final tool call is located via `result_schema.find_tool(tool_calls)`, which returns the first match to the tool call payload.
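For reference, the first-match behavior looks roughly like this (a simplified sketch only; the real `ResultSchema.find_tool` signature and internals may differ):

```python
from pydantic_ai.messages import ToolCallPart

# Simplified sketch of the first-match assumption -- illustrative only, not
# the actual pydantic-ai implementation. The first call whose name matches a
# registered result tool wins, so a second final_result call is never
# considered and no ToolReturnPart is produced for it.
class ResultSchemaSketch:
    def __init__(self, tools: dict[str, object]):
        self.tools = tools  # assumed: maps tool name -> result tool definition

    def find_tool(self, tool_calls: list[ToolCallPart]) -> ToolCallPart | None:
        for call in tool_calls:
            if call.tool_name in self.tools:
                return call  # first match wins; later matches are ignored
        return None
```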
```python
# tests/test_agent.py
# (imports assumed from the surrounding test module)
from datetime import timezone

from dirty_equals import IsNow
from inline_snapshot import snapshot

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, ToolCallPart, ToolReturnPart
from pydantic_ai.models.function import AgentInfo, FunctionModel


# inside class TestMultipleToolCalls
def test_multiple_final_result_are_validated_correctly(self):
    """Tests that if multiple final results are returned, but one fails validation, the other is used."""

    def return_model(_: list[ModelMessage], info: AgentInfo) -> ModelResponse:
        assert info.result_tools is not None
        return ModelResponse(
            parts=[
                ToolCallPart.from_raw_args('final_result', {'bad_value': 'first'}),
                ToolCallPart.from_raw_args('final_result', {'value': 'second'}),
            ]
        )

    agent = Agent(FunctionModel(return_model), result_type=self.ResultType, end_strategy='early')
    result = agent.run_sync('test multiple final results')

    # Verify the result came from the second final tool
    assert result.data.value == 'second'
    # Verify we got appropriate tool returns
    assert result.new_messages()[-1].parts == snapshot(
        [
            ToolReturnPart(
                tool_name='final_result', content='Final result processed.', timestamp=IsNow(tz=timezone.utc)
            ),
            ToolReturnPart(
                tool_name='final_result',
                content='Result tool not used - a final result was already processed.',
                timestamp=IsNow(tz=timezone.utc),
            ),
        ]
    )
```
I think what we should do in the short term is ensure that all result tools still get called, but use the first one as the final result.

I'm open to bigger changes to the behavior long term, especially if there's an obviously better way to handle it, but just calling the provided tools (and adding the results to the message history) seems like an obvious/straightforward improvement. I'll try to do that today.
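A minimal sketch of that short-term behavior, assuming a hypothetical `validate_call` hook (names and structure here are illustrative, not the actual pydantic-ai run-loop internals):

```python
from typing import Any, Callable

from pydantic_ai.messages import ToolCallPart, ToolReturnPart


# Sketch of the proposed short-term fix -- not actual pydantic-ai code.
# Every final-result call gets an entry in the message history, but only the
# first call that validates becomes the run result.
def process_final_result_calls(
    calls: list[ToolCallPart],
    validate_call: Callable[[ToolCallPart], Any],  # hypothetical: raises ValueError on bad args
) -> tuple[Any, list[ToolReturnPart]]:
    final_result: Any = None
    parts: list[ToolReturnPart] = []
    for call in calls:
        if final_result is not None:
            # A final result was already accepted; still record a return part
            # so the message history stays complete.
            parts.append(ToolReturnPart(
                tool_name=call.tool_name,
                content='Result tool not used - a final result was already processed.',
            ))
            continue
        try:
            final_result = validate_call(call)
            parts.append(ToolReturnPart(tool_name=call.tool_name, content='Final result processed.'))
        except ValueError:
            # In the real agent a retry/error part would likely be recorded here;
            # a plain return part keeps this sketch self-contained.
            parts.append(ToolReturnPart(tool_name=call.tool_name, content='Invalid final result arguments.'))
    return final_result, parts
```

With this shape, the failing first call still surfaces in the history, and the second call either becomes the final result or gets the "already processed" return part, so nothing is silently dropped.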