
Refactor to a helper class for the agent's history (ConversationMemory) #7008

Merged

enyst merged 16 commits into main from backport-conversation-memory on Feb 28, 2025

Conversation

enyst (Collaborator) commented Feb 27, 2025

This PR proposes a ConversationMemory class, a partial refactoring of message_utils.py. The class:

  • is an agent helper (it has access to state / state.history and can keep an instance of the agent's PromptManager)
  • doesn't have access to the event stream
  • is referenced by the agent, just like its PromptManager
  • manages the agent's history, e.g.
    • makes sure no "unpaired" events are sent to the LLM (a tool call without its result, or the other way around)
    • applies any other considerations that determine the actual events sent to the LLM
    • reads the agent's history without modifying it
    • can do the conversion from events to messages

In the PR "Add RecallObservations for retrieval of prompt extensions", I hit the fact that the event-to-message conversion needs PromptManager templates for RecallObservations (example). The options there seem to be either sending a PromptManager into the utils functions, or this PR: for now, a simple refactoring into an agent helper that collaborates with the other agent helper, PromptManager.

The code isn't doing much: it just refactors to encapsulate the message_utils functions that work on agent history.


OpenHands description:

This PR backports the conversation_memory.py module and related changes from PR #6909.

Changes included:

  1. Added ConversationMemory class that processes event history into a coherent conversation for the agent
  2. Modified CodeActAgent to use ConversationMemory instead of directly calling message_utils functions
  3. Added unit tests for ConversationMemory

This backport does not include RecallObservations or any changes to prompt_manager.py as requested.


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:647aaa7-nikolaik \
  --name openhands-app-647aaa7 \
  docker.all-hands.dev/all-hands-ai/openhands:647aaa7

@enyst enyst marked this pull request as ready for review February 27, 2025 21:10
@enyst enyst requested a review from csmith49 February 27, 2025 21:10
@enyst enyst changed the title Backport conversation_memory.py from PR #6909 Add a helper class for the agent's history (ConversationMemory) Feb 27, 2025
@enyst enyst changed the title Add a helper class for the agent's history (ConversationMemory) Refactor to a helper class for the agent's history (ConversationMemory) Feb 27, 2025
events = condensed_history

logger.debug(
    f'Processing {len(events)} events from a total of {len(state.history)} events'
)
Collaborator:

This looks like the only place where the state is used. Do we expect more need for state access in this function or can we simplify the arguments?

Collaborator (Author):

I did wonder that, and I think we may want State here, but my thoughts are not fully baked, because it also depends on whether we will want to keep condensers separate or trigger them from here:

  • maybe truncation_id (for the truncation that is now in the controller, is saved in state)
  • maybe start_id / end_id if we will manage delegates histories here (the "inner" delegate agent shares the same stream with the parent, but its history starts with a start_id, not with event 0)

Idk, there are plenty of other things saved in state, and since this class is explicitly an agent helper, it has the "right" to them, so to speak. But it's entirely reasonable to remove it now and bring it back when we need it!
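As a hypothetical sketch of the delegate case above: if ConversationMemory ever tracks start_id / end_id, selecting the delegate's view of the shared stream could look like this. Events are plain dicts with an "id" key here, not the real event type.

```python
# Hypothetical sketch: a delegate agent shares the parent's event stream, but
# its history begins at start_id (and possibly ends at end_id), not at event 0.
def delegate_history(events, start_id, end_id=None):
    """Return events with start_id <= id (and id <= end_id, if given)."""
    return [
        e for e in events
        if e["id"] >= start_id and (end_id is None or e["id"] <= end_id)
    ]
```

So a delegate whose history starts at start_id=5 would simply never see events 0..4, even though they live in the same stream as the parent's.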

Collaborator:

No big deal either way, it's a simple change and I'm just trying to sort out the conceptual boundaries for this thing.

if we will want to keep condensers separate or trigger them from here

I expect it might be the latter, but I need to ponder for a bit. Might try a refactor if there's time this weekend.

Collaborator (Author):

Fixed!

csmith49 (Collaborator) left a comment:

I like the refactor away from message_utils.py, and it'll be good to have a conceptual place for managing the conversation.

I'm curious about the "memory" part of the name -- is this a holdover from the original PR or is there some state we plan on storing in the ConversationMemory? I can imagine an API whereby the ConversationMemory object takes in a state and parses all the necessary deltas to build an up-to-date conversation, but outside of the context window truncation I think that's already the State.history.

enyst (Collaborator, Author) commented Feb 27, 2025

I like the refactor away from message_utils.py, and it'll be good to have a conceptual place for managing the conversation.

It's minor, and yet encapsulation helps my slow brain 😊, and it feels like the right place: linked to the agent, with everything the agent has available, but cut off from the stream, and dealing with our event -> message issues.

I'm curious about the "memory" part of the name -- is this a holdover from the original PR or is there some state we plan on storing in the ConversationMemory? I can imagine an API whereby the ConversationMemory object takes in a state and parses all the necessary deltas to build an up-to-date conversation, but outside of the context window truncation I think that's already the State.history.

You're right, it's a holdover, but I kinda like it... I do feel we could look at it differently from the Memory component proposed there, though:

  • Memory is a component of the system (it has access to the stream) and its responsibility is information retrieval etc
  • ConversationMemory is an agent helper (it works with state or state.history or condensed_history; can request formatting appropriate for the context window) and it figures out the contents of the context window.

Please tell me if this naive story makes sense 🤔:
The system components work with a stream of events. The agent controller insulates the agent from that and gives it a filtered, static list of events in state.history. Yet that's not what the LLM works with either: at the other end, the LLM end, its memory is the context window; what it knows at every step is what the context window gets. ConversationMemory takes the history (maybe all of it, maybe truncated or condensed in the meantime) and makes the final pass [1][2][3], deciding what exactly the LLM remembers.

Or to put it differently, the user and the agent/llm have a conversation:

  • from the user end, the user sees/remembers the agent's state.history (named Conversation in the UI);
  • from the agent/llm end, the LLM remembers conversation_memory.

[1] it can include templating stuff.
[2] it can exclude some stuff still, even some events ("unpaired" ones).
[3] (just a hypothesis) if or when the user gets the ability to edit e.g. a microagent .md file during a conversation, there is no concept of updating the stream; instead we would likely create a new Recall event, and it would be the job of this class to sort it out and send over only the last version... I think.
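A minimal sketch of hypothesis [3], keeping only the latest Recall event per recalled source. The "type", "source", and "id" fields are invented for the example, not the real event schema.

```python
# Hypothetical sketch: each edit of a microagent file appends a NEW Recall
# event to the stream; only the newest one per source should reach the LLM.
def latest_recalls(events):
    """Drop every Recall event superseded by a newer one for the same source."""
    newest = {}
    for e in events:
        if e.get("type") == "recall":
            newest[e["source"]] = e["id"]  # later events overwrite earlier ones
    kept = []
    for e in events:
        if e.get("type") == "recall" and newest[e["source"]] != e["id"]:
            continue  # an older version of an edited microagent file
        kept.append(e)
    return kept
```

This keeps the stream itself append-only; the deduplication happens only in the final pass over the history.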

@enyst enyst added the integration-test Runs integration tests on the PR label Feb 28, 2025
Contributor:

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Contributor:

Triggered by: Pull Request (integration-test label on PR #7008)
Commit: 25ce029
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (7/7)

Total cost: USD 0.11

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t03_jupyter_write_file | True | | 0.012 | |
| t07_interactive_commands | True | | 0.013 | |
| t05_simple_browsing | True | | 0.005 | |
| t06_github_pr_browsing | True | | 0.031 | |
| t02_add_bash_hello | True | | 0.015 | |
| t01_fix_simple_typo | True | | 0.019 | |
| t04_git_staging | True | | 0.015 | RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 10, max iteration: 10 |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 85.71% (6/7)

Total cost: USD 0.01

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t03_jupyter_write_file | True | | 0.001 | nan |
| t02_add_bash_hello | True | | 0.001 | nan |
| t01_fix_simple_typo | True | | 0.002 | nan |
| t04_git_staging | True | | 0.001 | nan |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 2. | 0.002 | nan |
| t06_github_pr_browsing | True | | 0.004 | nan |
| t07_interactive_commands | True | | 0.003 | nan |

Integration Tests Report Delegator (Haiku)
Success rate: 50.00% (1/2)

Total cost: USD 0.06

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0.019 | |
| t01_fix_simple_typo | False | File not fixed: This is a silly typo. Really? No more typos! Enjoy! | 0.042 | RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 30, max iteration: 30 |

Integration Tests Report Delegator (DeepSeek)
Success rate: 100.00% (2/2)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t01_fix_simple_typo | True | | 0.002 | nan |
| t02_add_bash_hello | True | | 0.002 | nan |

Integration Tests Report VisualBrowsing (DeepSeek)
Success rate: 100.00% (1/1)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t05_simple_browsing | True | | 0.001 | nan |

Testing outputs (including both Haiku and DeepSeek results) are available for download.

@enyst enyst added the run-eval-s Runs evaluation with 5 instances label Feb 28, 2025
Contributor:

Running evaluation on the PR. Once eval is done, the results will be posted.

mamoodi (Collaborator) commented Feb 28, 2025

Evaluation results (Auto Reply): ## Summary

enyst (Collaborator, Author) commented Feb 28, 2025

These are different instance IDs than in a PR earlier today. I don't see suspicious misses in the logs. Local eval works and has the prompts.

@enyst enyst merged commit 0f07805 into main Feb 28, 2025
25 checks passed
@enyst enyst deleted the backport-conversation-memory branch February 28, 2025 02:22
Labels: integration-test (Runs integration tests on the PR), run-eval-s (Runs evaluation with 5 instances)
4 participants