Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trajectory replay: Fix a few corner cases #6380

Merged
merged 8 commits into from
Feb 2, 2025

Conversation

li-boxuan
Copy link
Collaborator

@li-boxuan li-boxuan commented Jan 21, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fix two corner cases handling in trajectory replay feature.


Give a summary of what the PR does, explaining any non-trivial design decisions

Two corner cases were missing in the previous PR #6215:

  1. When there's a wait_for_response message, replay gets stuck, waiting for user's response, which doesn't make sense when in the middle of a replay. This is demonstrated in demo2.json and demo3.json.
  2. The trajectory dumped from the GUI would contain environmental actions, which shall be skipped during replay. This is demonstrated in demo1.json (Note: trajectory export from GUI is not available yet; demo1.json is downloaded using the PR (feat) Add button to export trajectory on chat panel #6378).

demo1.json - GUI mode: downloaded from web GUI
demo2.json - Headless mode: after demo1 replay, add a user message, and finish
demo3.json - Headless mode: a replay of demo2. Note: demo2.json and demo3.json only differ in step id, timestamp, hostname, and wait_for_response attribute.


Link of any specific issues this addresses

Part of #6049


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:aff6136-nikolaik   --name openhands-app-aff6136   docker.all-hands.dev/all-hands-ai/openhands:aff6136

@li-boxuan li-boxuan requested review from xingyaoww and enyst January 21, 2025 06:34
@enyst
Copy link
Collaborator

enyst commented Jan 21, 2025

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image, and wait_for_response = true ?

Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?

@li-boxuan
Copy link
Collaborator Author

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image, and wait_for_response = true ?

Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?

That's correct!

@li-boxuan
Copy link
Collaborator Author

Aside, I do realize this would become a bug farm... and I'll make sure to add some E2E tests before checking in the user-facing replay functionality in #6348

@li-boxuan li-boxuan mentioned this pull request Jan 29, 2025
1 task
@li-boxuan li-boxuan marked this pull request as draft January 31, 2025 07:41
@li-boxuan li-boxuan marked this pull request as ready for review February 2, 2025 05:03
@li-boxuan li-boxuan marked this pull request as draft February 2, 2025 05:24
@@ -27,7 +51,6 @@ def _replayable(self) -> bool:
self.replay_events is not None
and self.replay_index < len(self.replay_events)
and isinstance(self.replay_events[self.replay_index], Action)
and self.replay_events[self.replay_index].source != EventSource.USER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not a 'review' comment, and this is cool anyway)
Just the other day, I was playing with Gemini-2.0-thinking, and it's been a lot of fun for coding-adjacent tasks! Among others, it explored a lot of openhands repo, tracked down every occurrence of oh_action and followed the execution flow up in frontend, downstream in backend, until it figured out everything about them. It makes itself mini-plans on the fly and does follow up, very cool!

Anyway, so in the server, all those are set with source USER, but they're quite different, e.g. agent change actions, prompt confirmations, CmdRunActions (ran by user in terminal), MessageActions. I think none should be a problem, and cmd run actions are good for replay! We do want to replay those, if we want to achieve a similar state (hopefully), and of course, they'd be in context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I did this check as a hack at the beginning - probably just to work around the wait_for_confirmation thing. It's been more and more clear that source USER events should be replayed too.

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this feature is a thing of beauty!

@li-boxuan li-boxuan marked this pull request as ready for review February 2, 2025 05:39
@li-boxuan li-boxuan merged commit e487008 into main Feb 2, 2025
17 checks passed
@li-boxuan li-boxuan deleted the boxuanli/improve-traj-replay branch February 2, 2025 08:27
zchn pushed a commit to zchn/OpenHands that referenced this pull request Feb 4, 2025
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants