[Experimental] Screenshot-based browsing #5324

Closed

ryanhoangt
Contributor

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR is to:

  • Add a screenshot-based browsing mechanism built on Anthropic's computer use feature.

Link of any specific issues this addresses

Fix #4570

@ryanhoangt
Contributor Author

I asked the agent to draw a square, and the result currently looks pretty weird...

Screenshot 2024-12-09 at 21 17 34

@mamoodi
Collaborator

mamoodi commented Dec 23, 2024

@ryanhoangt just a gentle ping to check if this is still a work in progress

@ryanhoangt
Contributor Author

Yes, I'll try to get back to this after finishing the other PR.

Contributor

github-actions bot commented Feb 3, 2025

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the "Stale" (Inactive for 30 days) label on Feb 3, 2025
@mamoodi
Collaborator

mamoodi commented Feb 3, 2025

@ryanhoangt this was just tagged as stale. The related PR seems to be merged. Will you be continuing this now, when time permits?

@ryanhoangt
Contributor Author

Yeah, I discussed this with the team and we should close this PR for now. Browsergym has some limitations for implementing computer use, and Aditya has added visual browsing capability to CodeAct in #6464, so we can probably try that first before moving on to computer use, which will require some changes to the runtime to use Playwright directly instead of via Browsergym.

@ryanhoangt ryanhoangt closed this Feb 3, 2025
Labels
Stale Inactive for 30 days
Development

Successfully merging this pull request may close these issues.

[Agent] Support browser control via screenshots
3 participants