CLI support for AI agents in browser-use/web-ui so Cursor Agent can use it as a tool

drumnation/browser-use-cli

Fork Purpose

This fork of browser-use/web-ui adds CLI support specifically designed for AI agents like Cursor Agent. It enables direct command-line interaction with browser automation tasks, making it ideal for integration with AI development environments and automated workflows.

CLI Documentation

See the CLI Guide for comprehensive documentation on:

  • Available LLM providers and models
  • Detailed command reference
  • Environment configuration
  • Example usage patterns

Quick Start

# Run a task (browser will auto-start if needed)
browser-use run "go to example.com and create a report about the page structure"

# Run with specific provider and vision capabilities
browser-use run "analyze the layout and visual elements" --provider Google --vision

# Run with specific model selection
browser-use run "analyze the page" --provider Anthropic --model-index 1

# Explicitly start browser with custom options (optional)
browser-use start --headless --window-size 1920x1080

# Close browser when done
browser-use close
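Because the CLI is meant to be called as a tool by an agent, a thin wrapper that builds the argument list can help avoid shell-quoting problems with free-form task text. The sketch below is a hypothetical helper, not part of this repository: `build_run_command` and `run_task` are illustrative names, it assumes the `browser-use` entry point is on `PATH`, and it uses only the `run` flags shown in Quick Start.

```python
import shutil
import subprocess
from typing import List, Optional


def build_run_command(
    task: str,
    provider: Optional[str] = None,
    model_index: Optional[int] = None,
    vision: bool = False,
) -> List[str]:
    """Assemble argv for `browser-use run` (hypothetical helper, Quick Start flags only)."""
    cmd = ["browser-use", "run", task]
    if provider is not None:
        cmd += ["--provider", provider]
    if model_index is not None:
        cmd += ["--model-index", str(model_index)]
    if vision:
        cmd.append("--vision")
    return cmd


def run_task(task: str, **kwargs) -> str:
    """Run one task and return the CLI's stdout; raises CalledProcessError on failure."""
    # Assumes the CLI is installed as a `browser-use` executable on PATH.
    if shutil.which("browser-use") is None:
        raise FileNotFoundError("browser-use is not on PATH")
    result = subprocess.run(
        build_run_command(task, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_task("analyze the page", provider="Anthropic", model_index=1)` issues the same command as the third Quick Start example. Passing a list to `subprocess.run` instead of a shell string means quotes or `&` in the task text are never interpreted by a shell.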

Supported LLM Providers

  • OpenAI (gpt-4o) - Vision-capable model for advanced analysis
  • Anthropic (claude-3-5-sonnet-latest, claude-3-5-sonnet-20241022) - Advanced language understanding
  • Google (gemini-1.5-pro, gemini-2.0-flash) - Fast and efficient processing
  • DeepSeek (deepseek-chat) - Cost-effective default option

See the CLI Guide for detailed provider configuration and usage examples.
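The provider list above can be read as a lookup table. The sketch below is illustrative only: `PROVIDER_MODELS` and `resolve_model` are hypothetical names, and it assumes that `--model-index` is a 0-based position in each provider's list, in the order shown here. Check the CLI Guide for the actual indexing before relying on it.

```python
from typing import Dict, List

# Models per provider, copied from the "Supported LLM Providers" list above.
PROVIDER_MODELS: Dict[str, List[str]] = {
    "OpenAI": ["gpt-4o"],
    "Anthropic": ["claude-3-5-sonnet-latest", "claude-3-5-sonnet-20241022"],
    "Google": ["gemini-1.5-pro", "gemini-2.0-flash"],
    "DeepSeek": ["deepseek-chat"],
}


def resolve_model(provider: str, model_index: int = 0) -> str:
    """Map a provider and model index to a model name.

    ASSUMPTION: indices are 0-based and follow the order listed above;
    the CLI's real --model-index semantics may differ.
    """
    models = PROVIDER_MODELS.get(provider)
    if models is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if not 0 <= model_index < len(models):
        raise IndexError(
            f"{provider} has {len(models)} model(s); index {model_index} is out of range"
        )
    return models[model_index]
```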

CLI Commands

  • start - (Optional) Initialize browser session with custom options:

    • --headless - Run in headless mode
    • --window-size - Set window dimensions (e.g., "1920x1080")
    • --disable-security - Disable security features
    • --user-data-dir - Use custom Chrome profile
    • --proxy - Set proxy server
  • run - Execute tasks (auto-starts browser if needed):

    • --model - Choose LLM (deepseek-chat, gemini, gpt-4, claude-3)
    • --vision - Enable visual analysis
    • --record - Record browser session
    • --trace-path - Save debugging traces
    • --max-steps - Limit task steps
    • --add-info - Provide additional context
  • close - Clean up browser session
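The `start` options can be assembled the same way. This is a sketch, not the CLI's own argument parser: `parse_window_size` and `build_start_command` are hypothetical helpers that check the `WIDTHxHEIGHT` format shown for `--window-size` (e.g. "1920x1080") before the value is passed along, and they cover only the `start` flags listed above.

```python
import re
from typing import List, Optional, Tuple

# Accepts "1920x1080" (either "x" or "X", optional surrounding spaces).
_WINDOW_SIZE = re.compile(r"^\s*(\d+)\s*[xX]\s*(\d+)\s*$")


def parse_window_size(value: str) -> Tuple[int, int]:
    """Split a WIDTHxHEIGHT string into two positive integers (hypothetical helper)."""
    match = _WINDOW_SIZE.match(value)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        raise ValueError("window dimensions must be positive")
    return width, height


def build_start_command(
    headless: bool = False,
    window_size: Optional[str] = None,
    disable_security: bool = False,
    user_data_dir: Optional[str] = None,
    proxy: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `browser-use start` from the options listed above."""
    cmd = ["browser-use", "start"]
    if headless:
        cmd.append("--headless")
    if window_size is not None:
        width, height = parse_window_size(window_size)
        cmd += ["--window-size", f"{width}x{height}"]  # normalized form
    if disable_security:
        cmd.append("--disable-security")
    if user_data_dir is not None:
        cmd += ["--user-data-dir", user_data_dir]
    if proxy is not None:
        cmd += ["--proxy", proxy]
    return cmd
```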

Example Tasks

The browser-tasks-example.ts file provides ready-to-use task sequences for:

  • Product research automation
  • Documentation analysis
  • Page structure analysis
  • Debug sessions with tracing
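A debug session with tracing amounts to running several tasks with `--trace-path` and `--max-steps` and always closing the browser afterwards. The Python sketch below is not a port of browser-tasks-example.ts: `debug_session_commands` and `run_sequence` are hypothetical, and the per-task trace path layout (`<trace_dir>/task-<n>`) and the default step limit of 20 are assumptions chosen for illustration.

```python
import subprocess
from typing import List, Sequence


def debug_session_commands(
    tasks: Sequence[str], trace_dir: str, max_steps: int = 20
) -> List[List[str]]:
    """Build argv lists for a traced run of several tasks, ending with `close`."""
    commands: List[List[str]] = []
    for i, task in enumerate(tasks):
        commands.append([
            "browser-use", "run", task,
            # Hypothetical layout: one trace path per task under trace_dir.
            "--trace-path", f"{trace_dir}/task-{i}",
            "--max-steps", str(max_steps),
        ])
    commands.append(["browser-use", "close"])
    return commands


def run_sequence(commands: List[List[str]]) -> None:
    """Run the task commands in order; the final close command runs even if one fails."""
    *tasks, close = commands
    try:
        for cmd in tasks:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(close, check=False)
```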

Configuration

See .env.example for all available configuration options, including:

  • API keys for different LLM providers
  • Browser settings
  • Session persistence options
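A .env file is a list of plain `KEY=VALUE` lines. The minimal parser below is a sketch for checking that the keys you need are set before starting a run. It does not reproduce the project's own configuration loading, and any key names you check (for example `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`) are assumptions, so copy the real names from .env.example. A library such as python-dotenv handles more edge cases, such as `export` prefixes and multi-line values.

```python
import os
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and copy its values into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def missing_keys(env: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required keys that are absent or empty.

    Key names are examples only; take the real ones from .env.example.
    """
    return [key for key in required if not env.get(key)]
```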

Browser Use Web UI

This project builds on browser-use, which is designed to make websites accessible to AI agents.

We would like to officially thank WarmShao for his contribution to this project.

WebUI: Built on Gradio, the WebUI supports most browser-use functionality and is designed to be user-friendly, making it easy to interact with the browser agent.

Expanded LLM Support: We've integrated support for various Large Language Models (LLMs), including Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, and Ollama, and we plan to add support for more models in the future.

Custom Browser Support: You can use your own browser with this tool, so you don't have to log in to sites again or deal with other authentication challenges. This feature also supports high-definition screen recording.

Persistent Browser Sessions: You can choose to keep the browser window open between AI tasks, allowing you to see the complete history and state of AI interactions.

Installation Options

Option 1: Local Installation

Read the quickstart guide or follow the steps below to get started.

Python 3.11 or higher is required.

We recommend using uv to set up the Python environment. First, create a virtual environment:

uv venv --python 3.11

Then activate it:

source .venv/bin/activate

Install the dependencies:

uv pip install -r requirements.txt

Then install the Playwright browser binaries:

playwright install
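After these steps, a quick sanity check can confirm that the environment matches the requirements above. The sketch below is a hypothetical helper, not a script shipped with the project: `python_ok` and `check_environment` are illustrative names. It checks the Python 3.11+ requirement, whether the `playwright` Python package is importable, and whether a virtual environment is active; it does not verify that `playwright install` has downloaded the browser binaries.

```python
import importlib.util
import sys
from typing import List, Optional, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 11)


def python_ok(version: Optional[Tuple[int, ...]] = None) -> bool:
    """True if the given (or running) interpreter version is at least MIN_PYTHON."""
    if version is None:
        version = tuple(sys.version_info[:2])
    return tuple(version[:2]) >= MIN_PYTHON


def check_environment() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if not python_ok():
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}"
        )
    # Only checks that the Python package is importable, not the browser binaries.
    if importlib.util.find_spec("playwright") is None:
        problems.append("Python package 'playwright' not found in this environment")
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix does not.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment is active (source .venv/bin/activate)")
    return problems
```

Calling `check_environment()` from the activated environment should return an empty list once the installation steps have succeeded.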
