
[PY] fix: #1705 - Output of multiple calls to same tool are overwritten #2228

Open
wants to merge 1 commit into main
Conversation

@andres-swax andres-swax commented Dec 10, 2024

The current implementation of the planner did not account for multiple calls to the same tool: it did not keep track of the call id (`call.id`), which is unique to each call regardless of which tool is invoked.

As a result, the map of outputs was indexed only by `tool_id`.

Subsequent calls to the same tool within the same run would therefore overwrite the result of the previous call, and at the end of the run results would be missing, triggering an exception.

The issue is fixed by adding a second level to the map of tools called as part of a run (stored in `assistants_planner.py` as `tool_map: dict`), keyed by the unique `call_id`, so that result values are not overwritten by subsequent calls to the same tool within the same run.
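
For illustration, a minimal sketch (simplified names, not the library's actual code) of why a map keyed only by the tool name loses outputs, and how a second level keyed by the call id preserves them:

```python
# Minimal sketch of the problem and the fix; the dictionary names are
# simplified stand-ins for the real tool_map in assistants_planner.py.

# Before: keyed by tool name only -- the second call overwrites the first.
flat_map = {}
flat_map["get-weather"] = "LA: 24C"
flat_map["get-weather"] = "Chicago: 3C"   # previous output is lost
assert flat_map["get-weather"] == "Chicago: 3C"

# After: a second level keyed by the unique call id keeps both outputs.
nested_map: dict = {}
nested_map.setdefault("get-weather", {})["call_abc"] = "LA: 24C"
nested_map.setdefault("get-weather", {})["call_def"] = "Chicago: 3C"
assert len(nested_map["get-weather"]) == 2
```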

Linked issues

closes: #1705

Details

To trigger this issue, submit a prompt that causes multiple calls to the same tool within the same run (e.g. "which city is warmer today, LA or Chicago?"). The same tool (e.g. get-weather) is called once for each city, but the data returned only contains results for one city. This results in an exception being raised. Repro steps can be found in #1705.
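
For reference, a hedged sketch of the shape of the required tool calls such a prompt produces in a single run; the field layout follows the OpenAI Assistants function-call format, and the call ids are made up:

```python
# Illustrative only: the tool calls an Assistants run produces for a prompt
# like "which city is warmer today, LA or Chicago?". Two calls target the
# same tool but carry different call ids.
required_tool_calls = [
    {"id": "call_abc", "function": {"name": "get-weather", "arguments": '{"city": "LA"}'}},
    {"id": "call_def", "function": {"name": "get-weather", "arguments": '{"city": "Chicago"}'}},
]

# Indexing outputs by function name alone keeps only the last city,
# which is what made the run fail with missing results.
by_name = {c["function"]["name"]: c["id"] for c in required_tool_calls}
assert len(by_name) == 1   # "get-weather" appears once; one call id was dropped
```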

Change details

The fix consists of adding a second level to the dictionary of tool calls (which holds the return values), so instead of (pseudocode) `tools[tool_id]` it becomes `tools[tool_id][call_id]`. This way the return value of every call is retained.

The key change is that `state.temp.action_outputs[command.action_id]` becomes `state.temp.action_outputs[command.action][command.action_id]` in `teams/ai/ai.py`.
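
A sketch of the resulting shape and of how the nested map can be flattened back into per-call outputs when they are submitted to the assistant; variable names are illustrative, not the exact code in this PR:

```python
# Illustrative only: flattening the nested map back into the per-call
# outputs the Assistants API expects when submitting tool outputs.
action_outputs = {
    "get-weather": {
        "call_abc": "LA: 24C",
        "call_def": "Chicago: 3C",
    },
}

tool_outputs = [
    {"tool_call_id": call_id, "output": output}
    for calls in action_outputs.values()
    for call_id, output in calls.items()
]
assert len(tool_outputs) == 2   # one entry per call id; nothing is missing
```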

Attestation Checklist

  • My code follows the style guidelines of this project

  • I have checked for/fixed spelling, linting, and other errors

  • I have commented my code for clarity

  • I have made corresponding changes to the documentation (updating the doc strings in the code is sufficient)

  • My changes generate no new warnings

  • I have added tests that validate my changes and provide sufficient test coverage. I have tested with:

    • Local testing
    • E2E testing in Teams
  • New and existing unit tests pass locally with my changes

Add a second level to the map of tools called as part of a run (stored in `assistants_planner.py` as `tool_map: dict`), so that result values are not overwritten by subsequent calls to the same tool within the same run.

The current implementation did not account for multiple calls to the same tool, as it did not keep track of the call id (`call.id`), which is unique to each call regardless of which tool. The map was indexed only by `tool_id`, so subsequent calls to the same tool would overwrite the result of the previous call.

The fix adds a second level to the dictionary of tool calls (which holds the return values), so instead of (pseudocode) `tools[tool_id]` it becomes `tools[tool_id][call_id]`. This way the return value of every call is retained.
@andres-swax (Author)

@microsoft-github-policy-service agree

@corinagum (Collaborator)

@andres-swax thanks for the contribution! Could you please fix the failing tests and add a new one for the multiple tool calls scenario?

@andres-swax (Author)

> @andres-swax thanks for the contribution! Could you please fix the failing tests and add a new one for the multiple tool calls scenario?

Hi, I'll try my best. I saw those failures yesterday and, honestly, I could not understand them or figure out what I could do.

I felt the failures were in the Python libraries and was hoping the messages were just noise.

My dev-ops software and workflow knowledge is still at about the Visual SourceSafe level. If I cannot make it work, I'll ask for the PR to be closed. In the meantime I'll give it a try over the next couple of days.

@corinagum (Collaborator)

@andres-swax No rush, and we really appreciate your contribution! We may be able to keep the PR open and have someone help out with the tests, but I can't guarantee how soon that would be. Our bandwidth is pretty tight right now. Please add the tests if you can, and feel free to ask questions!
