fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command #6280

xingyaoww · 2025-01-15T00:18:20Z

Give a summary of what the PR does, explaining any non-trivial design decisions

Previously, we set .blocking = True on every .timeout = assignment. This effectively gave EVERY command a hard timeout of 120 sec (the default value), and the soft timeout was never correctly enabled.

In this PR, we:

Add two methods, add_default_timeout and add_hard_timeout, to set timeouts more explicitly.
Replace the existing .timeout assignments accordingly.
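
As a rough illustration of the split between a default timeout and an explicit hard timeout, here is a minimal sketch. The Action class, the DEFAULT_TIMEOUT constant and the apply_default_timeout helper are hypothetical stand-ins, not the real OpenHands code; the set_hard_timeout name and its blocking flag are borrowed from a commit message in this PR ("replace set_default_timeout with set_hard_timeout blocking=false").

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TIMEOUT = 120.0  # seconds; the default value mentioned in the PR description


@dataclass
class Action:
    """Hypothetical stand-in for an agent action (not the real OpenHands class)."""

    command: str
    timeout: Optional[float] = None  # hard limit in seconds; None means "not set yet"
    blocking: bool = False  # True: wait for the command until the hard limit is hit

    def set_hard_timeout(self, value: float, blocking: bool = True) -> None:
        """Set the hard limit and state explicitly whether the call should block."""
        self.timeout = value
        self.blocking = blocking


def apply_default_timeout(action: Action) -> None:
    """Fill in the default limit without silently switching the action to blocking mode."""
    if action.timeout is None:
        action.set_hard_timeout(DEFAULT_TIMEOUT, blocking=False)
```

The point of the sketch is that a plain default no longer implies blocking = True, so only callers that explicitly ask for a blocking hard timeout get one.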

Another big issue before this PR showed up when the agent:

Runs a long command
That long command somehow gets stuck (and exceeds the 120 sec timeout)
The agent tries to run the next (unrelated) command (e.g., ls). Because the previous command was NOT killed, the follow-up command gets stuck in the shell and is never executed.

To fix this in the PR, we add an error message to remind the agent to kill the previous command properly before continuing.

metadata.suffix = (
    f'\n[Your command "{command}" is NOT executed. '
    f'The previous command was timed out but still running. Above is the output of the previous command. '
    "You may wait longer to see additional output of the previous command by sending empty command '', "
    'send other commands to interact with the current process, '
    'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
)
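
The guard behind this message can be sketched as follows. This is a simplified in-memory model, not the PR's implementation: FakeShell, mark_hard_timeout and the bracketed return strings are hypothetical, the refusal text is shortened, and the "send other commands to interact with the current process" path is left out.

```python
from dataclasses import dataclass, field
from typing import List

INTERRUPT_KEYS = ("C-c", "C-z", "C-d")  # the keys named in the PR's message


@dataclass
class FakeShell:
    """Hypothetical in-memory model of a shell session; it never starts a real process."""

    previous_still_running: bool = False
    executed: List[str] = field(default_factory=list)

    def mark_hard_timeout(self) -> None:
        """Simulate a command that exceeded the hard timeout but was not killed."""
        self.previous_still_running = True

    def send(self, command: str) -> str:
        if command in INTERRUPT_KEYS:
            # Simplification: every listed key is treated as ending the old command.
            self.previous_still_running = False
            return "[previous command interrupted]"
        if self.previous_still_running:
            if command == "":
                # An empty command means "keep waiting for more output".
                return "[no new command; still waiting on the previous one]"
            # Refuse the new command instead of letting it hang in the shell.
            return (
                f'[Your command "{command}" is NOT executed. '
                "The previous command was timed out but still running.]"
            )
        self.executed.append(command)
        return f"[executed: {command}]"
```

In this model a follow-up ls is rejected with an explicit message until the agent sends one of the interrupt keys, which mirrors the behavior the error message is meant to steer the agent toward.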

We also add a new test that stress-tests the bash terminal in a loop with:

Long command output
Command that triggers soft timeout
Command that triggers hard timeout
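
The three cases can be illustrated with a small standalone harness. This is only a sketch under stated assumptions: it treats "soft timeout" as "no new output for soft_timeout seconds" and "hard timeout" as a total wall-clock limit, it drives a child Python process rather than the bash terminal that the PR's test exercises, and run_with_timeouts is a hypothetical helper.

```python
import queue
import subprocess
import sys
import threading
import time
from typing import Optional, Tuple


def run_with_timeouts(code: str, soft_timeout: float, hard_timeout: float) -> Tuple[str, int]:
    """Run a Python snippet in a child process.

    Returns (reason, line_count) where reason is "completed",
    "soft_timeout" (no new output for soft_timeout seconds) or
    "hard_timeout" (total runtime exceeded hard_timeout seconds).
    """
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", code],  # -u: unbuffered child output
        stdout=subprocess.PIPE,
        text=True,
    )
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def pump() -> None:
        # Forward each output line; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    start = time.monotonic()
    count = 0
    reason = "completed"
    while True:
        remaining = hard_timeout - (time.monotonic() - start)
        if remaining <= 0:
            reason = "hard_timeout"
            break
        try:
            line = lines.get(timeout=min(soft_timeout, remaining))
        except queue.Empty:
            # We waited the full soft window only if it fit inside the hard budget.
            reason = "soft_timeout" if soft_timeout <= remaining else "hard_timeout"
            break
        if line is None:
            break
        count += 1

    if reason != "completed":
        proc.kill()  # do not leave the child running after a timeout
    proc.wait()
    reader.join()
    proc.stdout.close()
    return reason, count
```

A loop over a long-output snippet, a snippet that prints once and then sleeps, and a snippet that prints forever covers the three cases listed above.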

Link of any specific issues this addresses

#6259

#6218

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:8ca0021-nikolaik   --name openhands-app-8ca0021   docker.all-hands.dev/all-hands-ai/openhands:8ca0021

enyst · 2025-01-15T23:10:35Z

openhands/runtime/utils/bash.py

+                    "You may wait longer to see additional output of the previous command by sending empty command '', "
+                    'send other commands to interact with the current process, '
+                    'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
+                )


In my understanding CmdOutputMetadata is a fairly complex BaseModel object that maps the output of ps1, but here we alter its structure and give it a different content, a rather large message for the LLM from us? (a prompt tweak)

Could we think about structuring this situation in some other way? Like, maybe don't save it in the action, and add an attribute to the CmdOutputObservation... 🤔 "instruction", or "error_detail" or "timeout_detail". Idk, but this is an Obs to the new action, and yet it contains deep buried info about the old action? If so, maybe we can surface it, make it super-clear in the obs

Yeah, I think this is really info that we should show the user. @rbren had concerns earlier about displaying it directly in the terminal, so it should not go into .content, but maybe it makes sense to move the suffix and prefix up to the CmdOutputObservation level.

Good point! I think maybe a slightly different perspective is from a client developer / agent developer point of view. How do we define metadata and how easy is it for people to work with it for their purposes?
(I'm not sure why we call it metadata, if it's terminal output, maybe it would be easier to understand if it was, dunno, terminal_output. 😅)

I'm inclined to agree with enyst that adding prompt-y instructions here is a bit out of place. Ideally everything that's speaking to the agent would go inside prompt_manager or CodeAct itself, rather than being hard-coded into the CmdOutputObservation

I'll reluctantly approve for now though, since I can't really think of a better way to fix this off the top of my head, and this is currently very broken

prompt-y instructions

To me, this sounds like "a type of error message the agent receives when using the tool". It is no different from the agent running npm install and hitting an error message from npm (which is a kind of "prompt-y instruction" written for humans). For this type of use case, I think it would overcomplicate things a lot if we tried to move it out of the bash implementation.

But I also agree with engel that we should at least bring it out of .metadata and keep it as additional attributes under CmdOutputObservation
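
One way to picture the proposal from this thread is the sketch below. Everything in it is hypothetical: the field set of CmdOutputMetadata is reduced to two illustrative fields, timeout_detail is just one of the attribute names floated above, and to_agent_message is an invented helper, not an OpenHands API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CmdOutputMetadata:
    """Hypothetical, reduced metadata: only facts parsed from the shell prompt."""

    exit_code: int = -1
    working_dir: Optional[str] = None


@dataclass
class CmdOutputObservation:
    """Hypothetical observation that carries the timeout note as its own attribute."""

    content: str  # raw terminal output only
    metadata: CmdOutputMetadata = field(default_factory=CmdOutputMetadata)
    timeout_detail: Optional[str] = None  # note for the agent, kept out of metadata

    def to_agent_message(self) -> str:
        """Compose what the agent sees; the terminal content itself stays untouched."""
        parts = [self.content]
        if self.timeout_detail:
            parts.append(f"[{self.timeout_detail}]")
        return "\n".join(parts)
```

With this shape, a client that only wants terminal output reads .content, while the agent-facing layer decides whether and how to append timeout_detail.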

add bash stress test to debug for #6259

2dd420e

xingyaoww changed the title ~~add bash stress test to debug for #6259~~ [WIP] fix: bash performance issue Jan 15, 2025

xingyaoww added 6 commits January 14, 2025 19:21

fix test

7681a53

add timer for iteration

e63d68f

update

7be2991

increase char per line

56770be

fix soft timeout and cleanup all the timeout set method in the repo

df5cad3

handle case for hard-timeout + unfinished process

81930f0

xingyaoww changed the title ~~[WIP] fix: bash performance issue~~ fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command Jan 15, 2025

xingyaoww marked this pull request as ready for review January 15, 2025 20:54

Merge branch 'main' into xw/stress-bash

e4e992c

rbren reviewed Jan 15, 2025

openhands/runtime/impl/action_execution/action_execution_client.py (outdated, resolved)

rbren reviewed Jan 15, 2025

openhands/runtime/utils/bash.py (resolved)

xingyaoww added 2 commits January 15, 2025 16:10

replace set_default_timeout with set_hard_timeout blocking=false

26b7ff1

show metadata table

5f30388

enyst reviewed Jan 15, 2025

All-Hands-AI deleted a comment Jan 16, 2025

xingyaoww force-pushed the xw/stress-bash branch from 818bfde to 5f30388 Compare January 16, 2025 05:03

xingyaoww added 2 commits January 16, 2025 00:06

show prefix and suffix in frontend

76ec28d

Revert "Fix closing sessions (#6114)"

43ddad4

This reverts commit 8795ee6.

rbren reviewed Jan 16, 2025

frontend/src/state/chat-slice.ts (outdated, resolved)

xingyaoww and others added 6 commits January 16, 2025 11:27

Merge branch 'main' into xw/stress-bash

390d007

Revert "show prefix and suffix in frontend"

7ad3ff6

This reverts commit 76ec28d.

Revert "show metadata table"

0979e8b

This reverts commit 5f30388.

update prefix

af1934f

fix continue on hard-timeout bug

e02802c

fix interactive test

8ca0021

rbren approved these changes Jan 16, 2025

xingyaoww merged commit 0bed177 into main Jan 16, 2025
16 checks passed

xingyaoww deleted the xw/stress-bash branch January 16, 2025 17:27

csmith49 pushed a commit to csmith49/OpenHands that referenced this pull request Jan 19, 2025

fix: incorrect soft-timeout implementation & fix hard-timeout follow-…

308b360

…up command (All-Hands-AI#6280)
