
Support both tool call and message text handling in response #675

Open
Finndersen opened this issue Jan 13, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

@Finndersen

Finndersen commented Jan 13, 2025

At the moment it appears that only tool calls OR message text processing is supported, not both at the same time.

I'm not sure if all LLMs support providing both, but it appears some do:
- OpenAI example
- Anthropic documentation appears to indicate it is possible

There are cases where an LLM may perform a tool call as a final response (it doesn't need to see the results), along with an associated relevant message. Enabling handling of both would avoid an additional request/response cycle to the LLM.
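To make the scenario concrete, here is a minimal sketch (plain Python, with illustrative part types that are not any specific SDK's classes) of a single model response carrying both a text part and a tool call part, and a handler that keeps both instead of picking one:

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    content: str


@dataclass
class ToolCallPart:
    tool_name: str
    args: dict


def handle_response(parts):
    """Collect the text AND the tool calls from one model response."""
    text = "".join(p.content for p in parts if isinstance(p, TextPart))
    calls = [p for p in parts if isinstance(p, ToolCallPart)]
    return text, calls


# A response where the tool call is final and the text is the user-facing answer.
response = [
    TextPart(content="I've scheduled the meeting for you."),
    ToolCallPart(tool_name="create_event", args={"title": "Sync", "time": "10:00"}),
]
text, calls = handle_response(response)
```

If both were surfaced like this, the run could end here without another round trip to the model.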

I think this is somewhat related to issue #127 and PR #142, although slightly different: I believe those two are about whether a tool call can end a "conversation", which would also be a necessary capability to resolve this issue, but resolving it would additionally require handling both the tool call and then returning the message text as the "final message" content.

@samuelcolvin
Member

This should have been fixed by #468, if not please let us know what's missing.

@samuelcolvin samuelcolvin added the more info More information required label Jan 16, 2025
@Finndersen
Author

Finndersen commented Jan 16, 2025

@samuelcolvin this issue is referring to the behaviour of run_sync() (_handle_model_response()), which it looks like isn't changed by that PR (hard to tell).
Are you referring to this issue? #678

@samuelcolvin
Member

samuelcolvin commented Jan 16, 2025

Oh, sorry, I was going too quickly.

We support a mixture of both messages and tool calls, but it's not clear what we should do with the messages if there are tool calls.

What would you like to happen?

@Finndersen
Author

Finndersen commented Jan 16, 2025

I think that in the case of a tool call ending a run (#142), the behaviour is quite straightforward: just return the text message as the result of the run (assuming the tool is side-effect based and doesn't return anything useful itself).

What to do with the text content accompanying tool calls mid-run is less clear, but I think it would be useful to at least have a mechanism to store it and make it accessible, so the developer can decide what to do with it.
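A minimal sketch of that suggestion, assuming a hypothetical run loop (all names here are illustrative, not pydantic-ai's actual internals): tool calls are still executed mid-run, while any accompanying text is accumulated for the developer to inspect afterwards:

```python
def run_agent(model_responses, execute_tool):
    """Execute tool calls as usual, but keep any text the model sent with them."""
    collected_text = []  # text emitted alongside tool calls, kept for later
    for parts in model_responses:
        for part in parts:
            if part["kind"] == "text":
                collected_text.append(part["content"])
            elif part["kind"] == "tool_call":
                execute_tool(part["tool_name"], part["args"])
    return collected_text


executed = []
responses = [
    [
        {"kind": "text", "content": "Looking that up..."},
        {"kind": "tool_call", "tool_name": "search", "args": {"q": "weather"}},
    ],
]
texts = run_agent(responses, lambda name, args: executed.append(name))
```

Nothing is lost this way: the tool still runs, and the interleaved text is available instead of being silently dropped.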


This issue is stale, and will be closed in 3 days if no reply is received.

@github-actions github-actions bot added the Stale label Jan 23, 2025
@Finndersen
Author

Not stale

@dmontagu
Contributor

dmontagu commented Jan 24, 2025

So, currently, if you allow text as a final response, and get a response with both tool calls and text, we don't treat the text as the final output (we assume the results of the tool calls should be provided back to the model before it produces a final response).

While this would be easy to change, I think in many important cases the current behavior is desirable. In particular, I believe @sydney-runkle has run into cases recently (I believe while using anthropic?) where, when making tool calls, the model would generally include some text in the response describing the tool calls being made.

That said, I totally understand why you might want to treat the text part of the response as the final result even when tool calls are present, so I think the main question here is — how should we make it possible to control this behavior?

I think it might be reasonable to add a setting for this to Agent; if we wanted to do that the PR to do so would be straightforward, but I'd be interested to get @samuelcolvin and @sydney-runkle's thoughts before charging ahead. In particular, adding settings increases maintenance burden, so if we can find any way to accomplish the goal without introducing new configuration options (which frequently need testing in combination with all the other configuration options), I think that would be preferable.

@github-actions github-actions bot removed the Stale label Jan 24, 2025
@Finndersen
Author

Finndersen commented Jan 24, 2025

Just to clarify: if tool run-ending like #142 were implemented, would it still be possible to inspect the message history of the run and find the text part of the ModelResponse associated with the tool call?

So the aim would be to make that content more readily accessible in this scenario? Like maybe assigning it to RunResult.data if structured output is not being used?

What would RunResult.data be set to otherwise in this case?
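For illustration, recovering that text from the message history could look something like this sketch. The message and part shapes are hypothetical stand-ins, not pydantic-ai's actual types:

```python
def final_text(messages):
    """Walk the history backwards and return the text from the last model response."""
    for message in reversed(messages):
        if message["role"] == "model":
            return " ".join(
                p["content"] for p in message["parts"] if p["kind"] == "text"
            )
    return None


history = [
    {"role": "user", "parts": [{"kind": "text", "content": "Book it."}]},
    {"role": "model", "parts": [
        {"kind": "tool_call", "tool_name": "book", "args": {}},
        {"kind": "text", "content": "Done, booked for Friday."},
    ]},
]
```

So even without a dedicated setting, the text is recoverable after the run; the open question is whether the framework should surface it directly.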

@sydney-runkle sydney-runkle added bug Something isn't working enhancement New feature or request and removed more info More information required bug Something isn't working labels Jan 24, 2025
@sydney-runkle
Member

Side note, this looks similar to #149

@Finndersen
Author

Finndersen commented Jan 27, 2025

Could this, #142 and #677 be addressed by breaking up the Agent black box into more granular components that can be used with the new graph framework to have complete control over tool calling/message handling behaviour?

Is that what's happening here: #725?

I guess it's a trade-off of simplicity for common use cases vs flexibility for edge cases.

@neon-john

neon-john commented Feb 4, 2025

So, currently, if you allow text as a final response, and get a response with both tool calls and text, we don't treat the text as the final output (we assume the results of the tool calls should be provided back to the model before it produces a final response).

There is a big difference between how non-streaming and streaming currently handle this.

What you described is the behavior for non-streaming responses -- tool calls in the presence of text are still executed.

For streaming responses, when not using a result tool, it is the opposite -- any text causes an End (_MarkFinalResult) regardless of tool calls, which causes issue #149.

I would suggest the client be able to supply a callback that determines when a response is the end, based on received_text and list[tool_name] (with a default behavior that could be similar to today's, although reconciling the streaming/non-streaming behavior would be good).
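The suggested callback could be sketched like this. The signature follows the proposal above; it is not an existing API, and default_end_decider only roughly mirrors today's non-streaming behavior:

```python
from typing import Callable, List

# Callback type: (received_text, tool_names) -> should the run end here?
EndDecider = Callable[[str, List[str]], bool]


def default_end_decider(received_text: str, tool_names: List[str]) -> bool:
    # Roughly today's non-streaming behavior: tool calls take precedence,
    # so the run ends only when there is text and no tool calls.
    return bool(received_text) and not tool_names


def process_response(
    received_text: str,
    tool_names: List[str],
    should_end: EndDecider = default_end_decider,
) -> str:
    return "end" if should_end(received_text, tool_names) else "continue"
```

A client wanting the behavior asked for in this issue could pass `lambda text, tools: bool(text)` to end the run on any text, even alongside tool calls.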
