
Refactor to a helper class for the agent's history (ConversationMemory) #7008

Merged

enyst merged 16 commits into main from backport-conversation-memory on Feb 28, 2025

Conversation

enyst (Collaborator) commented Feb 27, 2025

This PR proposes a ConversationMemory class, a partial refactoring of message_utils.py. The class:

  • is an agent helper (it has access to state / state.history and can keep an instance of the agent's PromptManager)
  • doesn't have access to the event stream
  • is referenced by the agent, just like its PromptManager
  • manages the agent's history, e.g.
    • makes sure no "unpaired" events are sent to the LLM (a tool call without its result, or the other way around)
    • applies any other considerations that determine the actual events sent to the LLM
    • reads the agent's history without modifying it
    • can do the conversion from events to messages

In the PR "Add RecallObservations for retrieval of prompt extensions", I hit the fact that the event-to-message conversion needs PromptManager templates for RecallObservations (example). The options there seem to be either sending a PromptManager into the utils functions, or this PR: for now, a simple refactoring into an agent helper that collaborates with the other agent helper, PromptManager.

The code isn't doing much: it just refactors to encapsulate the message_utils functions that work on agent history.


OpenHands description:

This PR backports the conversation_memory.py module and related changes from PR #6909.

Changes included:

  1. Added ConversationMemory class that processes event history into a coherent conversation for the agent
  2. Modified CodeActAgent to use ConversationMemory instead of directly calling message_utils functions
  3. Added unit tests for ConversationMemory

This backport does not include RecallObservations or any changes to prompt_manager.py as requested.


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:647aaa7-nikolaik \
  --name openhands-app-647aaa7 \
  docker.all-hands.dev/all-hands-ai/openhands:647aaa7

@enyst enyst marked this pull request as ready for review February 27, 2025 21:10
@enyst enyst requested a review from csmith49 February 27, 2025 21:10
@enyst enyst changed the title Backport conversation_memory.py from PR #6909 Add a helper class for the agent's history (ConversationMemory) Feb 27, 2025
@enyst enyst changed the title Add a helper class for the agent's history (ConversationMemory) Refactor to a helper class for the agent's history (ConversationMemory) Feb 27, 2025
events = condensed_history

logger.debug(
    f'Processing {len(events)} events from a total of {len(state.history)} events'
)
Collaborator:

This looks like the only place where the state is used. Do we expect more need for state access in this function or can we simplify the arguments?

Collaborator (Author):

I did wonder that, and I think we may want State here, but my thoughts are not fully baked, because it also depends on whether we will want to keep condensers separate or trigger them from here:

  • maybe truncation_id (for the truncation that is now in the controller, is saved in state)
  • maybe start_id / end_id if we will manage delegates histories here (the "inner" delegate agent shares the same stream with the parent, but its history starts with a start_id, not with event 0)

Idk, there are plenty of other things saved in state, and since this class is explicitly an agent helper, it has the "right" to them, so to speak. But it's entirely reasonable to remove it now and bring it back when we need it!
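As a hypothetical sketch of the delegate case above: if ConversationMemory ever tracks start_id / end_id, selecting the delegate's view of the shared stream could look like this. Events are plain dicts with an "id" key here, not the real event type.

```python
# Hypothetical sketch: a delegate agent shares the parent's event stream, but
# its history begins at start_id (and possibly ends at end_id), not at event 0.
def delegate_history(events, start_id, end_id=None):
    """Return events with start_id <= id (and id <= end_id, if given)."""
    return [
        e for e in events
        if e["id"] >= start_id and (end_id is None or e["id"] <= end_id)
    ]
```

So a delegate whose history starts at start_id=5 would simply never see events 0..4, even though they live in the same stream as the parent's.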

Collaborator:

No big deal either way, it's a simple change and I'm just trying to sort out the conceptual boundaries for this thing.

if we will want to keep condensers separate or trigger them from here

I expect it might be the latter, but I need to ponder for a bit. Might try a refactor if there's time this weekend.

Collaborator (Author):

Fixed!

csmith49 (Collaborator) left a comment:

I like the refactor away from message_utils.py, and it'll be good to have a conceptual place for managing the conversation.

I'm curious about the "memory" part of the name -- is this a holdover from the original PR or is there some state we plan on storing in the ConversationMemory? I can imagine an API whereby the ConversationMemory object takes in a state and parses all the necessary deltas to build an up-to-date conversation, but outside of the context window truncation I think that's already the State.history.

enyst (Collaborator, Author) commented Feb 27, 2025

I like the refactor away from message_utils.py, and it'll be good to have a conceptual place for managing the conversation.

It's minor, and yet encapsulation helps my slow brain 😊, and it feels like the right place: linked to the agent, with everything the agent has available, but cut off from the stream, and dealing with our event -> message issues.

I'm curious about the "memory" part of the name -- is this a holdover from the original PR or is there some state we plan on storing in the ConversationMemory? I can imagine an API whereby the ConversationMemory object takes in a state and parses all the necessary deltas to build an up-to-date conversation, but outside of the context window truncation I think that's already the State.history.

You're right, it's a holdover, but I kinda like it... I do feel we could look at it differently from the Memory component proposed there, though:

  • Memory is a component of the system (it has access to the stream) and its responsibility is information retrieval etc
  • ConversationMemory is an agent helper (it works with state or state.history or condensed_history; can request formatting appropriate for the context window) and it figures out the contents of the context window.

Please tell me if this naive story makes sense 🤔:
The system components work with a stream of events. The agent controller insulates the agent from that and gives it a filtered, static list of events in state.history. Yet that's not what the LLM works with either: at the other end, the LLM end, its memory is the context window; what it knows at every step is what the context window gets. ConversationMemory takes the history (maybe all of it, maybe truncated or condensed in the meantime) and makes the final pass [1][2][3], deciding what exactly the LLM remembers.

Or to put it differently, the user and the agent/llm have a conversation:

  • from the user end, the user sees/remembers the agent's state.history (named Conversation in the UI);
  • from the agent/llm end, the LLM remembers conversation_memory.

[1] it can include templating stuff.
[2] it can exclude some stuff still, even some events ("unpaired" ones).
[3] (just a hypothesis) if or when the user gets the ability to edit e.g. a microagent .md file during a conversation, there is no concept of updating the stream; instead we would likely create a new Recall event, and it would be the job of this class to sort it out and send over only the last version... I think.
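A minimal sketch of hypothesis [3], keeping only the latest Recall event per recalled source. The "type", "source", and "id" fields are invented for the example, not the real event schema.

```python
# Hypothetical sketch: each edit of a microagent file appends a NEW Recall
# event to the stream; only the newest one per source should reach the LLM.
def latest_recalls(events):
    """Drop every Recall event superseded by a newer one for the same source."""
    newest = {}
    for e in events:
        if e.get("type") == "recall":
            newest[e["source"]] = e["id"]  # later events overwrite earlier ones
    kept = []
    for e in events:
        if e.get("type") == "recall" and newest[e["source"]] != e["id"]:
            continue  # an older version of an edited microagent file
        kept.append(e)
    return kept
```

This keeps the stream itself append-only; the deduplication happens only in the final pass over the history.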

@enyst enyst added the integration-test Runs integration tests on the PR label Feb 28, 2025
Contributor:

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Contributor:

Triggered by: Pull Request (integration-test label on PR #7008)
Commit: 25ce029
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (7/7)

Total cost: USD 0.11

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t03_jupyter_write_file | True | | 0.012 | |
| t07_interactive_commands | True | | 0.013 | |
| t05_simple_browsing | True | | 0.005 | |
| t06_github_pr_browsing | True | | 0.031 | |
| t02_add_bash_hello | True | | 0.015 | |
| t01_fix_simple_typo | True | | 0.019 | |
| t04_git_staging | True | | 0.015 | RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 10, max iteration: 10 |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 85.71% (6/7)

Total cost: USD 0.01

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t03_jupyter_write_file | True | | 0.001 | nan |
| t02_add_bash_hello | True | | 0.001 | nan |
| t01_fix_simple_typo | True | | 0.002 | nan |
| t04_git_staging | True | | 0.001 | nan |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 2. | 0.002 | nan |
| t06_github_pr_browsing | True | | 0.004 | nan |
| t07_interactive_commands | True | | 0.003 | nan |

Integration Tests Report Delegator (Haiku)
Success rate: 50.00% (1/2)

Total cost: USD 0.06

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0.019 | |
| t01_fix_simple_typo | False | File not fixed: This is a silly typo. Really? No more typos! Enjoy! | 0.042 | RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 30, max iteration: 30 |

Integration Tests Report Delegator (DeepSeek)
Success rate: 100.00% (2/2)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t01_fix_simple_typo | True | | 0.002 | nan |
| t02_add_bash_hello | True | | 0.002 | nan |

Integration Tests Report VisualBrowsing (DeepSeek)
Success rate: 100.00% (1/1)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t05_simple_browsing | True | | 0.001 | nan |

Testing outputs (including both Haiku and DeepSeek results) are available for download.

@enyst enyst added the run-eval-s Runs evaluation with 5 instances label Feb 28, 2025
Contributor:

Running evaluation on the PR. Once eval is done, the results will be posted.

mamoodi (Collaborator) commented Feb 28, 2025

Evaluation results (Auto Reply): ## Summary

enyst (Collaborator, Author) commented Feb 28, 2025

These are different instance IDs than in a PR earlier today. I don't see suspicious misses in the logs. Local eval works and has the prompts.

@enyst enyst merged commit 0f07805 into main Feb 28, 2025
25 checks passed
@enyst enyst deleted the backport-conversation-memory branch February 28, 2025 02:22
Labels: integration-test (Runs integration tests on the PR), run-eval-s (Runs evaluation with 5 instances)
4 participants