Trajectory replay: Fix a few corner cases #6380

li-boxuan · 2025-01-21T06:34:01Z

End-user friendly description of the problem this fixes or functionality that this introduces

Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fix the handling of two corner cases in the trajectory replay feature.

Give a summary of what the PR does, explaining any non-trivial design decisions

Two corner cases were not handled in the previous PR #6215:

When the trajectory contains a message with wait_for_response set to true, replay gets stuck waiting for the user's response, which doesn't make sense in the middle of a replay. This is demonstrated in demo2.json and demo3.json.
A trajectory exported from the GUI contains environmental actions, which should be skipped during replay. This is demonstrated in demo1.json. (Note: trajectory export from the GUI is not available yet; demo1.json was downloaded using PR #6378, "(feat) Add button to export trajectory on chat panel".)
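A minimal sketch of how the two corner cases above could be handled when preparing a trajectory for replay. It assumes a trajectory is a list of JSON event dicts with `source`, `action`/`observation`, and `args` keys (as in the demo2.json event quoted in the review) and that "environmental actions" are those whose `source` is `"environment"`. The helper name `prepare_replay_events` is hypothetical; this is not the code in openhands/controller/replay.py.

```python
import copy
from typing import Any, Dict, List

Event = Dict[str, Any]  # one JSON object from an exported trajectory


def prepare_replay_events(trajectory: List[Event]) -> List[Event]:
    """Hypothetical helper: pick the events to replay from a trajectory.

    Observations are dropped (the runtime produces fresh ones during replay),
    environment-sourced actions are skipped, and message actions are copied
    with wait_for_response forced to False so replay never waits for input.
    """
    replay_events: List[Event] = []
    for event in trajectory:
        # Only actions are replayed; observation dicts carry no "action" key.
        if "action" not in event:
            continue
        # Corner case 2: GUI exports include environmental actions; skip them.
        if event.get("source") == "environment":
            continue
        # Copy so the caller's trajectory is left untouched.
        event = copy.deepcopy(event)
        # Corner case 1: a message that waits for a response would stall replay.
        if event["action"] == "message":
            event.setdefault("args", {})["wait_for_response"] = False
        replay_events.append(event)
    return replay_events
```

Copying each event is a choice of this sketch. The review discussion of demo2.json indicates that the PR clears the flag on the replayed event itself, which is why the re-dumped trajectory records wait_for_response = false for the agent's question.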

demo1.json - GUI mode: downloaded from the web GUI
demo2.json - Headless mode: replay demo1, add a user message, and finish
demo3.json - Headless mode: a replay of demo2. Note: demo2.json and demo3.json only differ in step id, timestamp, hostname, and wait_for_response attribute.
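The claim that demo2.json and demo3.json differ only in step id, timestamp, hostname, and wait_for_response can be checked mechanically. A sketch, assuming each file is a JSON array of events and that those fields appear under the key names listed below; `cause` is an added assumption, since observations reference their triggering action's step id. A hostname embedded inside free text (for example a shell prompt in command output) would not be caught by this key-based approach. All function names are hypothetical.

```python
import json
from typing import Any, FrozenSet, List

# Assumed key names for the fields that may legitimately differ between runs.
# "cause" is included because it refers back to a step id.
VOLATILE_KEYS: FrozenSet[str] = frozenset(
    {"id", "cause", "timestamp", "hostname", "wait_for_response"}
)


def strip_keys(value: Any, keys: FrozenSet[str]) -> Any:
    """Recursively drop the given keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_keys(v, keys) for k, v in value.items() if k not in keys}
    if isinstance(value, list):
        return [strip_keys(v, keys) for v in value]
    return value


def load_trajectory(path: str) -> List[Any]:
    """Load a trajectory file, assumed to be a JSON array of events."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def same_modulo_volatile(
    traj_a: List[Any], traj_b: List[Any], keys: FrozenSet[str] = VOLATILE_KEYS
) -> bool:
    """True if the trajectories match once the volatile keys are removed."""
    return strip_keys(traj_a, keys) == strip_keys(traj_b, keys)
```

Usage: `same_modulo_volatile(load_trajectory("demo2.json"), load_trajectory("demo3.json"))` should return True if the stated difference holds and the key-name assumptions are right.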

Link of any specific issues this addresses

Part of #6049

To run this PR locally, use the following command:

docker run -it --rm \
    -p 3000:3000 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --add-host host.docker.internal:host-gateway \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:aff6136-nikolaik \
    --name openhands-app-aff6136 \
    docker.all-hands.dev/all-hands-ai/openhands:aff6136

openhands/controller/replay.py

enyst · 2025-01-21T07:01:23Z

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image_urls, and wait_for_response = true?

Aaahh, I think I see how that happened. You literally said it: you added something. The previous event is a MessageAction with content in which the agent asks the user a question, but its wait_for_response is false... because this PR sets it to false, right?

li-boxuan · 2025-01-21T08:14:30Z


That's correct!

li-boxuan · 2025-01-21T08:25:27Z

As an aside, I do realize this could become a bug farm... so I'll make sure to add some E2E tests before checking in the user-facing replay functionality in #6348.


enyst · 2025-02-02T05:28:16Z

openhands/controller/replay.py

@@ -27,7 +51,6 @@ def _replayable(self) -> bool:
            self.replay_events is not None
            and self.replay_index < len(self.replay_events)
            and isinstance(self.replay_events[self.replay_index], Action)
-            and self.replay_events[self.replay_index].source != EventSource.USER
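A self-contained sketch of the `_replayable` predicate once the removed line is gone. Only the three remaining conditions come from the diff; `EventSource`, `Event`, `Action`, `Observation`, and the `ReplayManager` constructor below are simplified, hypothetical stand-ins rather than the OpenHands classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EventSource(str, Enum):
    AGENT = "agent"
    USER = "user"
    ENVIRONMENT = "environment"


@dataclass
class Event:
    source: EventSource


class Action(Event):
    """Stand-in for an action (message, command run, ...)."""


class Observation(Event):
    """Stand-in for an observation produced by the runtime."""


class ReplayManager:
    def __init__(self, replay_events: Optional[List[Event]]) -> None:
        self.replay_events = replay_events
        self.replay_index = 0

    def _replayable(self) -> bool:
        # The three conditions that remain after the diff. With the source
        # check removed, USER actions (e.g. terminal commands) replay too.
        return (
            self.replay_events is not None
            and self.replay_index < len(self.replay_events)
            and isinstance(self.replay_events[self.replay_index], Action)
        )
```

For example, `ReplayManager([Action(EventSource.USER)])._replayable()` is True here, whereas the old predicate with the `source != EventSource.USER` check would have returned False for the same event.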


(not a 'review' comment, and this is cool anyway)
Just the other day, I was playing with Gemini-2.0-thinking, and it's been a lot of fun for coding-adjacent tasks! Among other things, it explored a lot of the openhands repo, tracked down every occurrence of oh_action, and followed the execution flow upstream into the frontend and downstream into the backend until it had figured out everything about them. It makes mini-plans for itself on the fly and follows up on them. Very cool!

Anyway, in the server, all of those are set with source USER, but they're quite different, e.g. agent change actions, prompt confirmations, CmdRunActions (run by the user in the terminal), and MessageActions. I think none of them should be a problem, and CmdRunActions are good to replay! We do want to replay those if we want to reach a similar state (hopefully), and of course, they'd be in the context.

Yeah, I think I added this check as a hack at the beginning, probably just to work around the wait_for_confirmation thing. It has become increasingly clear that events with source USER should be replayed too.

enyst

Thank you, this feature is a thing of beauty!

Trajectory replay: Fix a few corner cases

7bbad2d

li-boxuan requested review from xingyaoww and enyst January 21, 2025 06:34

enyst reviewed Jan 21, 2025


openhands/controller/replay.py

Fix a typo in comment

e1a7c46

li-boxuan mentioned this pull request Jan 29, 2025

Add tests for trajectory replay #6513

Merged


Merge remote-tracking branch 'upstream/main' into boxuanli/improve-traj-replay

b862c5a

li-boxuan marked this pull request as draft January 31, 2025 07:41

li-boxuan added 3 commits January 31, 2025 23:44

Fix, and add basic GUI replay test

5868759

Add test to ensure user interactions are included in the history

553cecb

Fix

feccd5e

li-boxuan marked this pull request as ready for review February 2, 2025 05:03

li-boxuan marked this pull request as draft February 2, 2025 05:24

enyst reviewed Feb 2, 2025


enyst approved these changes Feb 2, 2025


li-boxuan added 2 commits February 1, 2025 21:34

Bug fix and stricter tests

9b29063

Clean up

aff6136

li-boxuan marked this pull request as ready for review February 2, 2025 05:39

li-boxuan merged commit e487008 into main Feb 2, 2025
17 checks passed

li-boxuan deleted the boxuanli/improve-traj-replay branch February 2, 2025 08:27

zchn pushed a commit to zchn/OpenHands that referenced this pull request Feb 4, 2025

Trajectory replay: Fix a few corner cases (All-Hands-AI#6380)

592cf4c

adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Feb 7, 2025

Trajectory replay: Fix a few corner cases (All-Hands-AI#6380)

d4721ff


li-boxuan commented Jan 21, 2025 • edited by github-actions bot
