
Condenser for Browser Output Observations #6578

Open · wants to merge 75 commits into main

Conversation

adityasoni9998
Contributor

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
    Developed a condenser that keeps only the most recent attention_window browser outputs in the LLM's context.

Give a summary of what the PR does, explaining any non-trivial design decisions
Designed the BrowserOutputCondenser class for this functionality. It is helpful for long trajectories involving (possibly screenshot-based) web navigation, where it avoids context-window-exceeded errors and keeps inference cost under control. Previously implemented condensers do not allow masking a specific type of observation; since browser observations are generally very large, masking them specifically can be helpful.
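The masking strategy described above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: the names BrowserOutputCondenser and attention_window come from the PR description, but the Event and BrowserOutputObservation classes here are minimal stand-ins for the real OpenHands event types.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for a generic agent history event."""
    content: str


@dataclass
class BrowserOutputObservation(Event):
    """Stand-in for a (possibly screenshot-based) browser observation."""


class BrowserOutputCondenser:
    """Keep the full content of only the most recent `attention_window`
    browser outputs; mask older ones so they stop consuming context."""

    def __init__(self, attention_window: int = 1):
        self.attention_window = attention_window

    def condense(self, events: list[Event]) -> list[Event]:
        # Walk the history from newest to oldest, counting browser
        # observations; anything past the attention window is masked.
        results: list[Event] = []
        seen = 0
        for event in reversed(events):
            if isinstance(event, BrowserOutputObservation):
                if seen >= self.attention_window:
                    event = BrowserOutputObservation(
                        content='<masked: older browser output>'
                    )
                seen += 1
            results.append(event)
        return list(reversed(results))
```

For example, with attention_window=1, only the latest browser observation in the history keeps its content; non-browser events are passed through untouched.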


Link to any specific issues this addresses

@adityasoni9998 adityasoni9998 marked this pull request as ready for review February 2, 2025 02:51
@xingyaoww xingyaoww requested a review from csmith49 February 2, 2025 05:24
PeterDaveHello and others added 18 commits February 6, 2025 20:43
…dates (All-Hands-AI#6617)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: amanape <[email protected]>
@csmith49
Collaborator

Modulo the discussion on combining observation condensers (which we can take care of in a future PR when another bespoke observation masking strategy is needed), this looks good.

Can you extend the unit tests in tests/unit/test_condenser.py to handle this class as well?

@adityasoni9998
Contributor Author

> Modulo the discussion on combining observation condensers (which we can take care of in a future PR when another bespoke observation masking strategy is needed), this looks good.
>
> Can you extend the unit tests in tests/unit/test_condenser.py to handle this class as well?

Done!

Collaborator

@enyst enyst left a comment


Interesting to see condensers extended for particular use cases. @adityasoni9998 could you perhaps share a bit about how this PR was tested?

I'd love to know if @csmith49 is also okay with it.

@csmith49
Copy link
Collaborator

I'm good to approve this pretty quickly, but it would be good to hear any data (anecdotal or otherwise) showing that this condenser is helpful in the stated context. @adityasoni9998 Have you managed to run the agent with this condenser and see "better" browsing behavior?

@adityasoni9998
Contributor Author

Hi @csmith49 and @enyst. I have not yet evaluated this condenser on downstream benchmarks, and I do not have quantitative metrics comparing runs with and without it. However, we have prior results showing that browsing agents work reasonably well without accessibility trees and screenshots from previous steps (e.g., the performance of VisualBrowsingAgent on VisualWebArena). browser-use takes a similar approach in its browsing agent, providing only observations from the most recent action.

Also, while evaluating the default CodeAct agent with full history on The Agent Company and GAIA, the agent struggles with large context sizes on longer trajectories involving browser interactions, which leads to hallucination and forgetting. In any case, using this condenser is an optional choice made by the user, and CodeAct's default browsing behaviour remains unchanged. Once I am done evaluating this condenser on more recent benchmarks, I can comment on this PR with the results if that helps.
