Condenser for Browser Output Observations #6578
base: main
Conversation
Modulo the discussion on combining observation condensers (which we can take care of in a future PR when another bespoke observation masking strategy is needed), this looks good. Can you extend the unit tests in …?
Done!
Interesting to see condensers extended for particular use cases. @adityasoni9998 could you perhaps share a bit about how this PR was tested?
I'd love to know if @csmith49 is also okay with it.
I'm good to approve this pretty quickly, but it would be good to hear any data (anecdotal or otherwise) that this condenser is helpful in the stated context. @adityasoni9998 Have you managed to run the agent with this condenser and see "better" browsing behavior?
Hi @csmith49 and @enyst. I have not yet evaluated this condenser on downstream benchmarks, so I don't have quantitative metrics comparing runs with and without it. However, we have prior results showing that browsing agents work reasonably well without the accessibility trees and screenshots from previous steps (for example, the performance of VisualBrowsingAgent on VisualWebArena). browser-use takes a similar approach in its browsing agent, providing only the observations from the most recent action. Also, while evaluating the default CodeAct agent with full history on The Agent Company and GAIA, the agent struggles with large context sizes on longer trajectories that involve browser interactions, which leads to hallucinations/forgetting. In any case, using this condenser is an optional choice made by the user, and the default behaviour of CodeAct's browsing remains unchanged. Alternatively, once I have evaluated this condenser on more recent benchmarks, I can comment on this PR if that helps.
End-user friendly description of the problem this fixes or functionality that this introduces

Developed a condenser that allows the user to keep only the most recent attention_window browser outputs in the LLM's context.

Give a summary of what the PR does, explaining any non-trivial design decisions
Designed the BrowserOutputCondenser class for this functionality. This is helpful for long trajectories involving (possibly screenshot-based) web navigation, to avoid context-window-exceeded errors and to control inference cost. Previously implemented condensers do not allow masking a specific type of observation; since browser observations are generally very large, this might be helpful.

Link of any specific issues this addresses
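
As a rough illustration of the masking behaviour described above, here is a minimal, self-contained sketch. It is not the PR's actual code: the event types, the placeholder text, and the exact handling of attention_window are simplifying assumptions made for illustration.

```python
# Illustrative sketch only -- simplified stand-ins, not OpenHands' real event
# or condenser classes.
from dataclasses import dataclass


@dataclass
class Event:
    content: str


@dataclass
class BrowserOutput(Event):
    """Stand-in for a browser observation (accessibility tree, screenshot, ...)."""


class BrowserOutputCondenserSketch:
    """Keep only the most recent `attention_window` browser outputs.

    Older browser outputs are replaced by a short placeholder so the order
    and content of all other events are preserved.
    """

    def __init__(self, attention_window: int = 1) -> None:
        self.attention_window = attention_window

    def condense(self, events: list[Event]) -> list[Event]:
        total = sum(isinstance(e, BrowserOutput) for e in events)
        condensed: list[Event] = []
        seen = 0
        for event in events:
            if isinstance(event, BrowserOutput):
                seen += 1
                # Mask everything except the last `attention_window` browser outputs.
                if seen <= total - self.attention_window:
                    condensed.append(Event(content='<browser output masked>'))
                    continue
            condensed.append(event)
        return condensed


# Example: with attention_window=1 only the newest browser output is kept.
history = [
    Event('user asks a question'),
    BrowserOutput('page 1 accessibility tree'),
    Event('agent clicks a link'),
    BrowserOutput('page 2 accessibility tree'),
]
condensed = BrowserOutputCondenserSketch(attention_window=1).condense(history)
# -> the first BrowserOutput is replaced by '<browser output masked>',
#    everything else is passed through unchanged.
```

With attention_window=1 this mirrors the behaviour discussed in the review thread: only the newest browser output stays verbatim, which keeps the context small on long, browser-heavy trajectories.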