[Agent] Support browser control via screenshots #4570
Comments
here is what you need #4581 |
@ryanhoangt Can you also self-assign this one? |
Why not just use computer use directly? |
I think we may want to first try to use |
Hey @ryx2 - I think that's a good idea, and I've been discussing with @ryanhoangt making computer control the next low-hanging fruit we could pursue to improve the browsing experience (at least when using Claude) |
Computer use can become extremely expensive when screenshots (images) are used, compared to a text-only approach. |
I looked into OpenRouter pricing, and it seems like it's $4.8 / 1k images for Sonnet-3.5 🤔 |
Hmm, do you have a link to where it says per "1K"? |
Ohh, you're right, I missed that notation. |
Qwen2.5-VL is good if you are concerned about the price. It beats the older versions of 4o and Sonnet-3.5 (see https://qwenlm.github.io/blog/qwen2-vl/), and it can be self-hosted. |
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
Reference implementation: https://github.com/invariantlabs-ai/playwright-computer-use |
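For a rough sense of the cost concern raised in the comments above, here is a back-of-the-envelope sketch. It assumes the $4.8 per 1k images figure quoted in the thread (not verified here) and counts only image cost, ignoring text tokens. The function name and default are hypothetical.

```python
# Figure quoted in the thread for Sonnet-3.5 via OpenRouter; treat as an assumption.
PRICE_PER_1K_IMAGES_USD = 4.8


def screenshot_cost_usd(num_screenshots: int,
                        price_per_1k: float = PRICE_PER_1K_IMAGES_USD) -> float:
    """Image-only cost for a number of screenshots, ignoring text tokens."""
    if num_screenshots < 0:
        raise ValueError("num_screenshots must be non-negative")
    return num_screenshots * price_per_1k / 1000


if __name__ == "__main__":
    # Print the estimated image cost for a few session lengths.
    for steps in (10, 100, 1000):
        print(steps, round(screenshot_cost_usd(steps), 4))
```

If each agent step sends one screenshot, the per-step image cost under this assumption is the per-1k price divided by 1000, so the total scales linearly with the number of steps.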
What problem or use case are you trying to solve?
Implement a tool similar to the computer tool & allow it to control the browser directly.
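One way such a tool could be shaped is sketched below. This is only an illustration, not the project's design: the `Browser` interface, the `FakeBrowser` stand-in, and the action fields (`action`, `coordinate`, `text`) are all hypothetical names chosen for the example. A real version would back the interface with an actual browser driver.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Browser(Protocol):
    """Hypothetical minimal browser interface (not a real library API)."""

    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass
class FakeBrowser:
    """In-memory stand-in so the sketch runs without a real browser."""

    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        self.log.append(("screenshot",))
        return b"fake-png-bytes"

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))


def run_action(browser: Browser, action: dict) -> bytes:
    """Apply one model-issued action, then return a fresh screenshot."""
    kind = action["action"]
    if kind == "click":
        x, y = action["coordinate"]
        browser.click(x, y)
    elif kind == "type":
        browser.type_text(action["text"])
    elif kind != "screenshot":
        raise ValueError(f"unsupported action: {kind}")
    # Every action ends with a screenshot so the model can observe the result.
    return browser.screenshot()
```

The loop this implies is: the model sees a screenshot, emits one action, the tool applies it and returns the next screenshot. Keeping the browser behind a small interface makes it possible to test the dispatch logic without launching a browser.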
Describe the UX of the solution you'd like
Do you have thoughts on the technical implementation?
Describe alternatives you've considered
Additional context