Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command #6280

Merged
merged 18 commits into from
Jan 16, 2025

Conversation

xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Jan 15, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

Previously, we set .blocking = True for all .timeout = assignment -- this basically turn EVERY command with hard timeout = 120 sec (default value) -- and the soft timeout were not correctly enabled.

In this PR, we:

  • add two methods add_default_timeout and add_hard_timeout to set timeout better.
  • replace existing implementation of .timeout accordingly

Another big issue before is that when agent:

  1. Runs a long command
  2. That long command somehow get stuck (and exceed 120 sec timeout)
  3. The agent tries to run the next (unrelated) command (e.g., ls) -- Because the previous command is NOT killed, the follow-up command will be stuck in the shell and not get executed anymore.

To fix this in the PR, we add an error message to remind the agent to kill the previous command properly before continuing.

metadata.suffix = (
    f'\n[Your command "{command}" is NOT executed. '
    f'The previous command was timed out but still running. Above is the output of the previous command. '
    "You may wait longer to see additional output of the previous command by sending empty command '', "
    'send other commands to interact with the current process, '
    'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
)

We also add a new test to stress test the bash terminal in loop for:

  1. Long command output
  2. Command that triggers soft timeout
  3. Command that triggers long timeout

Link of any specific issues this addresses

#6259

#6218


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:8ca0021-nikolaik   --name openhands-app-8ca0021   docker.all-hands.dev/all-hands-ai/openhands:8ca0021

@xingyaoww xingyaoww changed the title add bash stress test to debug for #6259 [WIP] fix: bash performance issue Jan 15, 2025
@xingyaoww xingyaoww changed the title [WIP] fix: bash performance issue fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command Jan 15, 2025
@xingyaoww xingyaoww marked this pull request as ready for review January 15, 2025 20:54
"You may wait longer to see additional output of the previous command by sending empty command '', "
'send other commands to interact with the current process, '
'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding CmdOutputMetadata is a fairly complex BaseModel object that maps the output of ps1, but here we alter its structure and give it a different content, a rather large message for the LLM from us? (a prompt tweak)

Could we think about structuring this situation in some other way? Like, maybe don't save it in the action, and add an attribute to the CmdOutputObservation... 🤔 "instruction", or "error_detail" or "timeout_detail". Idk, but this is an Obs to the new action, and yet it contains deep buried info about the old action? If so, maybe we can surface it, make it super-clear in the obs

Copy link
Collaborator Author

@xingyaoww xingyaoww Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think these are really the info that we should show the user. @rbren had concerns early about directly displaying these in terminal so they should not go into .content, but maybe it make sense to move these suffix/and prefix to the CmdOutputObservation level of info

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I think maybe a slightly different perspective is from a client developer / agent developer point of view. How do we define metadata and how easy is it for people to work with it for their purposes?
(I'm not sure why we call it metadata, if it's terminal output, maybe it would be easier to understand if it was, dunno, terminal_output. 😅)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree with enyst that adding prompt-y instructions here is a bit out of place. Ideally everything that's speaking to the agent would go inside prompt_manager or CodeAct itself, rather than being hard-coded into the CommandOutputObservation

I'll reluctantly approve for now though since I can't really think of a better way to fix off the top of my head, and this is currently very broken

Copy link
Collaborator Author

@xingyaoww xingyaoww Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prompt-y instructions

This sounds to me is actually "a type of error message the agent receives when using the tool" - this is no different than the agent runs npm install and then runs into any error message from npm (which is a type of "prompt-y instruction" written for human). For this type of usecase, I think it will over complicate things a lot if we try to move it out of bash implementation.

But I also agree with engel that we should at least bring it out of .metadata and keep it as additional attributes under CmdOutputObservation

@xingyaoww xingyaoww merged commit 0bed177 into main Jan 16, 2025
16 checks passed
@xingyaoww xingyaoww deleted the xw/stress-bash branch January 16, 2025 17:27
csmith49 pushed a commit to csmith49/OpenHands that referenced this pull request Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants