
If the LLM generates multiple final tool calls, but one fails validation, the wrong messages are generated #672

Open
jlowin opened this issue Jan 13, 2025 · 1 comment · May be fixed by #926
Labels
bug Something isn't working

jlowin (Collaborator) commented Jan 13, 2025

If an LLM generates two parallel tool calls for a final result tool, and the first one fails validation, there are two problems:

  1. The second "good" call is ignored (arguably fine from a design standpoint).
  2. More critically, because the good call is ignored, no tool result is generated for it, so the message history ends up incomplete. This is problematic.

I have a unit test that demonstrates the issue, but I'm not sure how the maintainers would prefer to solve it. Currently, the design implicitly assumes that the LLM never generates two final tool calls: `result_schema.find_tool(tool_calls)` returns the first tool call that matches a result tool, and any later matches are never examined.
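For illustration, the first-match behavior described above might look like the following minimal sketch. The `find_tool` name comes from the issue; the signature and the `ToolCallPart` stand-in here are simplified assumptions, not the real pydantic-ai types:

```python
from typing import NamedTuple, Optional

class ToolCallPart(NamedTuple):
    """Simplified stand-in for a model tool call (assumed shape)."""
    tool_name: str
    args: dict

def find_tool(
    tool_calls: list[ToolCallPart], result_tool_names: set[str]
) -> Optional[ToolCallPart]:
    """Hypothetical simplification of result_schema.find_tool: returns the
    first call whose name matches a result tool, so a second "good"
    final_result call is never consulted."""
    for call in tool_calls:
        if call.tool_name in result_tool_names:
            return call
    return None
```

Under this first-match rule, a failing first `final_result` call shadows a valid second one, which is exactly the situation the test below constructs.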

```python
# tests/test_agent.py

# inside class TestMultipleToolCalls

    def test_multiple_final_result_are_validated_correctly(self):
        """Tests that if multiple final results are returned, but one fails validation, the other is used."""

        def return_model(_: list[ModelMessage], info: AgentInfo) -> ModelResponse:
            assert info.result_tools is not None
            return ModelResponse(
                parts=[
                    ToolCallPart.from_raw_args('final_result', {'bad_value': 'first'}),
                    ToolCallPart.from_raw_args('final_result', {'value': 'second'}),
                ]
            )

        agent = Agent(FunctionModel(return_model), result_type=self.ResultType, end_strategy='early')
        result = agent.run_sync('test multiple final results')

        # Verify the result came from the second final tool
        assert result.data.value == 'second'

        # Verify we got appropriate tool returns
        assert result.new_messages()[-1].parts == snapshot(
            [
                ToolReturnPart(
                    tool_name='final_result', content='Final result processed.', timestamp=IsNow(tz=timezone.utc)
                ),
                ToolReturnPart(
                    tool_name='final_result',
                    content='Result tool not used - a final result was already processed.',
                    timestamp=IsNow(tz=timezone.utc),
                ),
            ]
        )
```
@samuelcolvin added the bug (Something isn't working) label on Jan 16, 2025
dmontagu (Contributor) commented:
Sorry for the delay in resolving this.

I think what we should do in the short term is ensure that all result tools still get called, but use the first one as the final result.

I'm open to bigger changes to the behavior in the long term, especially if there's an obviously better way to handle it, but calling all of the provided tools (and adding the results to the message history) seems like a straightforward improvement. I'll try to do that today.
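One possible shape for that short-term behavior is sketched below. This is not the actual pydantic-ai implementation: `ToolCall`, `ToolReturn`, and `process_final_calls` are hypothetical stand-ins, assuming a validator that raises `ValueError` on bad arguments:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolCall:
    """Hypothetical stand-in for a final-result tool call."""
    tool_name: str
    args: dict

@dataclass
class ToolReturn:
    """Hypothetical stand-in for a tool-return message part."""
    tool_name: str
    content: str

def process_final_calls(
    calls: list[ToolCall], validate: Callable[[dict], object]
) -> tuple[Optional[object], list[ToolReturn]]:
    """Use the first call that validates as the final result, but emit a
    ToolReturn for every call so the message history stays complete."""
    final_result: Optional[object] = None
    returns: list[ToolReturn] = []
    for call in calls:
        if final_result is not None:
            # A result was already accepted; still acknowledge this call
            # so every tool call gets a matching tool return.
            returns.append(ToolReturn(
                call.tool_name,
                'Result tool not used - a final result was already processed.',
            ))
            continue
        try:
            final_result = validate(call.args)
            returns.append(ToolReturn(call.tool_name, 'Final result processed.'))
        except ValueError:
            returns.append(ToolReturn(call.tool_name, 'Validation failed.'))
    return final_result, returns
```

The key property is that the number of tool returns always equals the number of tool calls, which keeps the message history well-formed regardless of which call supplies the final result.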

@dmontagu dmontagu linked a pull request Feb 13, 2025 that will close this issue