
[Experimental] Memory-enabled agent #4510

Closed
wants to merge 101 commits
3775ce2
add memory-enabled agent
enyst Oct 13, 2024
744be40
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into memory-…
enyst Oct 13, 2024
6f004c8
register agent
enyst Oct 13, 2024
97ef06f
try to use a list of events as history (ATTN will require tricks with…
enyst Oct 4, 2024
21f82e1
reset branch, tweak stream.py
enyst Oct 15, 2024
abda3f4
wip refactor methods
enyst Oct 15, 2024
0a7fb43
move compatibility method to evals
enyst Oct 16, 2024
981335c
retrieve history in the controller
enyst Oct 16, 2024
1ee26d7
adapt code to list
enyst Oct 16, 2024
267f3be
add filter by hidden
enyst Oct 16, 2024
9e5659c
remove history.py
enyst Oct 16, 2024
59c16d4
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/e…
enyst Oct 16, 2024
6fc615f
fix types
enyst Oct 16, 2024
3a81363
refactoring in evals
enyst Oct 16, 2024
66f78d5
more adaptations in evals
enyst Oct 16, 2024
1de7b2b
rewrite history
enyst Oct 16, 2024
adc960f
actually remove history
enyst Oct 16, 2024
fac01d1
adapt stuck
enyst Oct 16, 2024
5eb3322
more adaptations
enyst Oct 16, 2024
21ede6d
fix delegate exclusion
enyst Oct 17, 2024
5f19a7c
create a delegate obs when the delegate ends with an error
enyst Oct 17, 2024
526190c
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/e…
enyst Oct 17, 2024
696f5d1
fix merge
enyst Oct 17, 2024
fccc9f7
Merge branch 'enyst/eventstream-state' into memory-agent
enyst Oct 17, 2024
611d0e4
use event.id in memory, fix merge
enyst Oct 17, 2024
df3f0b6
wip add memory modules
enyst Oct 18, 2024
7b0a835
add get_last_user_message
enyst Oct 18, 2024
9e1cdcf
Merge branch 'enyst/eventstream-state' into memory-agent
enyst Oct 19, 2024
34a7b70
set user message
enyst Oct 19, 2024
cb60751
fix imports
enyst Oct 19, 2024
c235d61
fix objects
enyst Oct 19, 2024
7affbfd
add prompts
enyst Oct 19, 2024
386b835
rename, delete module we won't use
enyst Oct 19, 2024
bf8412a
fix prompting
enyst Oct 19, 2024
5e572db
tweaks to types
enyst Oct 19, 2024
225d330
added summarizer
khushvind Oct 20, 2024
4aedbc2
clean up duplicate
enyst Oct 20, 2024
2353c30
tweak prompts
enyst Oct 20, 2024
e04f77a
add action parser
enyst Oct 20, 2024
b77961b
added summary response
khushvind Oct 20, 2024
20c9fa8
added summary prompt
khushvind Oct 20, 2024
d36917b
tweak prompt
enyst Oct 20, 2024
143f16d
add strings
enyst Oct 20, 2024
fb90459
summarize and recall
enyst Oct 20, 2024
aad59fc
fix update
enyst Oct 20, 2024
10293e6
add these actions to history; in-context example
enyst Oct 20, 2024
1bf2d08
fix llm_config fallback
enyst Oct 15, 2024
16da4e2
unit tests
enyst Oct 16, 2024
53f7a78
fix schemas, utils
enyst Oct 20, 2024
f4ecd3a
add litellm embeddings for testing
enyst Oct 20, 2024
11b3242
fix var, run all stream embeddings on llama-index
enyst Oct 20, 2024
fcdfb19
add voyage ai embeddings
enyst Oct 21, 2024
8442841
fix template include
enyst Oct 21, 2024
67693a5
core memory split
enyst Oct 21, 2024
9ac47bf
tweak prompts
enyst Oct 21, 2024
5ad9ef4
fix leftover calls
enyst Oct 21, 2024
3742431
fix parser (o1 !!)
enyst Oct 21, 2024
083edd4
configurations wip
enyst Oct 21, 2024
b631e53
fixes; debugging test
enyst Oct 21, 2024
a060cbb
fix condensation; add debugging
enyst Oct 22, 2024
a25a867
add tokenizer from HF
enyst Oct 22, 2024
2a448f2
adapt action, prompt, some clean up logic
enyst Oct 22, 2024
6f9c922
remove eval script
enyst Oct 22, 2024
81b19c2
add script for testing, clean up obsolete content
enyst Oct 22, 2024
a858083
break down prompts; tweak core memory; rewrite algo
enyst Oct 23, 2024
1d582ac
fix tokenizer
enyst Oct 23, 2024
106bbb5
ruff
enyst Oct 24, 2024
b93c818
tweak template
enyst Oct 24, 2024
e75a489
add agent skills and yaml
enyst Oct 24, 2024
5df104d
break down agent skills
enyst Oct 24, 2024
7930457
create examples template
enyst Oct 24, 2024
9629a73
fix template loading
enyst Oct 24, 2024
bbd5211
remove obsolete md
enyst Oct 24, 2024
e2c343a
fix useless vars
enyst Oct 24, 2024
bf9b8ac
kill some whitespace
enyst Oct 24, 2024
6732359
strange leftover from another branch
enyst Oct 24, 2024
ada2ebd
tweak agent skill display
enyst Oct 24, 2024
1df7aaa
add user-defined template directory
enyst Oct 24, 2024
6141d0b
Merge branch 'enyst/refactor_template' into enyst/memory-agent
enyst Oct 24, 2024
4efcc02
ruff
enyst Oct 24, 2024
6f282b9
fix user prompt; bad coverage
enyst Oct 24, 2024
274ad61
Merge branch 'enyst/refactor_template' into enyst/memory-agent
enyst Oct 24, 2024
ad0b9b2
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/e…
enyst Oct 24, 2024
11d82f2
save events as they happen
enyst Oct 25, 2024
54f60ac
clean up obsolete config var - sessions are always saved if filestore…
enyst Oct 25, 2024
d4d3aa0
Merge branch 'enyst/eventstream-state' into enyst/memory-agent
enyst Oct 25, 2024
99a257c
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/e…
enyst Oct 25, 2024
c6a9028
init history for restored state
enyst Oct 26, 2024
9af6e5e
not worth caching delegates if only used once or twice per session
enyst Oct 26, 2024
04b6d70
init history from the event stream
enyst Oct 26, 2024
34e0f8a
remove script that got here by accident
enyst Oct 27, 2024
94c68be
save/restore state automatically
enyst Oct 27, 2024
93cfd32
tweak init/restore
enyst Oct 27, 2024
cfc158d
Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/e…
enyst Oct 27, 2024
f53e1cf
set delegates start explicitly; minor tweaks
enyst Oct 27, 2024
f42cbed
fix tests
enyst Oct 27, 2024
ebeab75
clean up verbose log
enyst Oct 27, 2024
a213c65
make extra sure we have a valid start
enyst Oct 27, 2024
41c03ad
Merge branch 'enyst/eventstream-state' into enyst/memory-agent
enyst Oct 27, 2024
63284d3
poetry lock
enyst Oct 27, 2024
4d05ab1
update summarize prompt
enyst Oct 31, 2024
18 changes: 18 additions & 0 deletions config.template.toml
@@ -171,6 +171,24 @@ model = "gpt-4o"
# If model is vision capable, this option allows to disable image processing (useful for cost reduction).
#disable_vision = true

# maximum number of messages in a conversation, after which they are truncated or summarized
# max_conversation_window = 10

# number of results when recalling message history
# conversation_top_k = 5

# fraction of the conversation window to summarize
# message_summary_trunc_tokens_fraction = 0.75

# summary LLM
[llm.summary]
model = "deepseek"

# default LLM
[llm.default]
model = "claude"


[llm.gpt4o-mini]
api_key = "your-api-key"
model = "gpt-4o"
7 changes: 4 additions & 3 deletions evaluation/EDA/run_infer.py
@@ -8,6 +8,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -34,7 +35,7 @@ def codeact_user_response_eda(state: State) -> str:

# retrieve the latest model message from history
if state.history:
model_guess = state.history.get_last_agent_message()
model_guess = state.get_last_agent_message()

assert game is not None, 'Game is not initialized.'
msg = game.generate_user_response(model_guess)
@@ -139,7 +140,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')

final_message = state.history.get_last_agent_message()
final_message = state.get_last_agent_message()

logger.info(f'Final message: {final_message} | Ground truth: {instance["text"]}')
test_result = game.reward()
@@ -148,7 +149,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
5 changes: 3 additions & 2 deletions evaluation/agent_bench/run_infer.py
@@ -16,6 +16,7 @@
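The recurring diff comment explains that history is now a flat stream of events, and `compatibility_for_eval_history_pairs` re-creates the legacy list of (Action, Observation) pairs for the eval output format. A rough sketch of what such a helper could do, using stand-in event classes rather than the real OpenHands types:

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str
    content: str


class Action(Event): ...
class Observation(Event): ...


def compatibility_for_eval_history_pairs(history: list[Event]) -> list[tuple[dict, dict]]:
    """Re-pair a flat event stream into legacy (action, observation) tuples."""
    pairs: list[tuple[dict, dict]] = []
    pending_action = None
    for event in history:
        if isinstance(event, Action):
            # A new action closes any previous action that got no observation.
            if pending_action is not None:
                pairs.append((vars(pending_action), {}))
            pending_action = event
        elif isinstance(event, Observation) and pending_action is not None:
            pairs.append((vars(pending_action), vars(event)))
            pending_action = None
    if pending_action is not None:
        pairs.append((vars(pending_action), {}))
    return pairs
```

The real helper lives in `evaluation/utils/shared.py` per the imports above; the pairing rule here (an action with no observation pairs with an empty dict) is an assumption.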
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -242,7 +243,7 @@ def process_instance(
raw_ans = ''

# retrieve the last agent message or thought
for event in state.history.get_events(reverse=True):
for event in reversed(state.history):
if event.source == 'agent':
if isinstance(event, AgentFinishAction):
raw_ans = event.thought
@@ -271,7 +272,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

metrics = state.metrics.get() if state.metrics else None

3 changes: 2 additions & 1 deletion evaluation/aider_bench/run_infer.py
@@ -15,6 +15,7 @@
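Several hunks replace `state.history.get_events(reverse=True)` with a plain `reversed(state.history)` scan for the agent's final answer. The pattern can be sketched as follows, with simplified stand-in event types (the real code prefers `AgentFinishAction.thought` over a plain message, as the agent_bench hunk shows):

```python
from dataclasses import dataclass


@dataclass
class MessageAction:
    source: str
    content: str


@dataclass
class AgentFinishAction:
    source: str
    thought: str


def last_agent_answer(history: list) -> str:
    """Walk the event list backwards and return the agent's final output."""
    for event in reversed(history):
        if getattr(event, "source", None) != "agent":
            continue
        if isinstance(event, AgentFinishAction):
            return event.thought
        if isinstance(event, MessageAction):
            return event.content
    return ""
```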
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -250,7 +251,7 @@ def process_instance(
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)
metrics = state.metrics.get() if state.metrics else None

# Save the output
3 changes: 2 additions & 1 deletion evaluation/biocoder/run_infer.py
@@ -13,6 +13,7 @@
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -299,7 +300,7 @@ def process_instance(
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

test_result['generated'] = test_result['metadata']['1_copy_change_code']

5 changes: 3 additions & 2 deletions evaluation/bird/run_infer.py
@@ -16,6 +16,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -46,7 +47,7 @@ def codeact_user_response(state: State) -> str:
# check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up
user_msgs = [
event
for event in state.history.get_events()
for event in state.history
if isinstance(event, MessageAction) and event.source == 'user'
]
if len(user_msgs) > 2:
@@ -431,7 +432,7 @@ def execute_sql(db_path, sql):
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
3 changes: 2 additions & 1 deletion evaluation/browsing_delegation/run_infer.py
@@ -9,6 +9,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -89,7 +90,7 @@ def process_instance(
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# find the last delegate action
last_delegate_action = None
5 changes: 3 additions & 2 deletions evaluation/gaia/run_infer.py
@@ -12,6 +12,7 @@
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -166,7 +167,7 @@ def process_instance(

model_answer_raw = ''
# get the last message or thought from the agent
for event in state.history.get_events(reverse=True):
for event in reversed(state.history):
if event.source == 'agent':
if isinstance(event, AgentFinishAction):
model_answer_raw = event.thought
@@ -203,7 +204,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
5 changes: 3 additions & 2 deletions evaluation/gorilla/run_infer.py
@@ -10,6 +10,7 @@
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -101,7 +102,7 @@ def process_instance(
raise ValueError('State should not be None.')

# retrieve the last message from the agent
model_answer_raw = state.history.get_last_agent_message()
model_answer_raw = state.get_last_agent_message()

# attempt to parse model_answer
ast_eval_fn = instance['ast_eval']
@@ -114,7 +115,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

output = EvalOutput(
instance_id=instance_id,
5 changes: 3 additions & 2 deletions evaluation/gpqa/run_infer.py
@@ -28,6 +28,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -244,7 +245,7 @@ def process_instance(
'C': False,
'D': False,
}
for event in state.history.get_events(reverse=True):
for event in reversed(state.history):
if (
isinstance(event, AgentFinishAction)
and event.source != 'user'
@@ -300,7 +301,7 @@
instance_id=str(instance.instance_id),
instruction=instruction,
metadata=metadata,
history=state.history.compatibility_for_eval_history_pairs(),
history=compatibility_for_eval_history_pairs(state.history),
metrics=metrics,
error=state.last_error if state and state.last_error else None,
test_result={
3 changes: 2 additions & 1 deletion evaluation/humanevalfix/run_infer.py
@@ -21,6 +21,7 @@
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -255,7 +256,7 @@ def process_instance(
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
2 changes: 1 addition & 1 deletion evaluation/integration_tests/run_infer.py
@@ -122,7 +122,7 @@ def process_instance(
# # result evaluation
# # =============================================

histories = state.history.get_events()
histories = state.history
test_result: TestResult = test_class.verify_result(runtime, histories)
metrics = state.metrics.get() if state.metrics else None

5 changes: 3 additions & 2 deletions evaluation/logic_reasoning/run_infer.py
@@ -8,6 +8,7 @@
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -225,7 +226,7 @@ def process_instance(
raise ValueError('State should not be None.')

final_message = ''
for event in state.history.get_events(reverse=True):
for event in reversed(state.history):
if isinstance(event, AgentFinishAction):
final_message = event.thought
break
@@ -247,7 +248,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
5 changes: 3 additions & 2 deletions evaluation/miniwob/run_infer.py
@@ -10,6 +10,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -152,7 +153,7 @@ def process_instance(

# Instruction is the first message from the USER
instruction = ''
for event in state.history.get_events():
for event in state.history:
if isinstance(event, MessageAction):
instruction = event.content
break
@@ -164,7 +165,7 @@
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
9 changes: 7 additions & 2 deletions evaluation/mint/run_infer.py
@@ -13,6 +13,7 @@
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -28,6 +29,7 @@
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.events.action import (
Action,
CmdRunAction,
MessageAction,
)
@@ -45,7 +47,10 @@ def codeact_user_response_mint(state: State, task: Task, task_config: dict[str,
task=task,
task_config=task_config,
)
last_action = state.history.get_last_action()
last_action = next(
(event for event in reversed(state.history) if isinstance(event, Action)),
None,
)
result_state: TaskState = env.step(last_action.message or '')

state.extra_data['task_state'] = result_state
@@ -202,7 +207,7 @@ def process_instance(
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
3 changes: 2 additions & 1 deletion evaluation/ml_bench/run_infer.py
@@ -24,6 +24,7 @@
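The mint hunk replaces `state.history.get_last_action()` with a `next()` over the reversed event list. The same pattern generalizes to a small helper (hypothetical, for illustration; not part of the PR):

```python
def last_event_of_type(history: list, event_type: type):
    """Return the most recent event of the given type, or None if absent."""
    return next(
        (event for event in reversed(history) if isinstance(event, event_type)),
        None,
    )

# e.g. last_action = last_event_of_type(state.history, Action)
```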
EvalMetadata,
EvalOutput,
codeact_user_response,
compatibility_for_eval_history_pairs,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -256,7 +257,7 @@ def process_instance(instance: Any, metadata: EvalMetadata, reset_logger: bool =
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = state.history.compatibility_for_eval_history_pairs()
histories = compatibility_for_eval_history_pairs(state.history)

# Save the output
output = EvalOutput(
3 changes: 2 additions & 1 deletion evaluation/swe_bench/run_infer.py
@@ -430,7 +430,8 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')

histories = [event_to_dict(event) for event in state.history.get_events()]
# NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
histories = [event_to_dict(event) for event in state.history]
metrics = state.metrics.get() if state.metrics else None

# Save the output
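The swe_bench hunk serializes each history event with `event_to_dict` before saving the output. A minimal sketch of such a serializer over dataclass-style events (assumed shape, not the real `openhands.events` implementation):

```python
import dataclasses


@dataclasses.dataclass
class MessageAction:
    source: str
    content: str


def event_to_dict(event) -> dict:
    """Serialize an event to a plain dict, tagging it with its class name."""
    data = (
        dataclasses.asdict(event)
        if dataclasses.is_dataclass(event)
        else dict(vars(event))
    )
    data["kind"] = type(event).__name__
    return data

# event_to_dict(MessageAction("agent", "done"))
# -> {"source": "agent", "content": "done", "kind": "MessageAction"}
```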