A Model Context Protocol (MCP) server that provides AI-powered web automation capabilities using Stagehand. This server enables LLMs to interact with web pages, perform actions, extract data, and observe possible actions in a real browser environment.
- Run `npm install` to install the necessary dependencies, then run `npm run build` to get `dist/index.js`.
- Set up your Claude Desktop configuration to use the server.
{
  "mcpServers": {
    "stagehand": {
      "command": "node",
      "args": ["path/to/mcp-server-browserbase/stagehand/dist/index.js"],
      "env": {
        "BROWSERBASE_API_KEY": "<YOUR_BROWSERBASE_API_KEY>",
        "BROWSERBASE_PROJECT_ID": "<YOUR_BROWSERBASE_PROJECT_ID>",
        "OPENAI_API_KEY": "<YOUR_OPENAI_API_KEY>"
      }
    }
  }
}
- Restart your Claude Desktop app. The tools should then be available when you click the 🔨 icon.
- Start using the tools! Below is a demo video of Claude running a Google search for OpenAI using the Stagehand MCP server, with Browserbase providing a remote headless browser.
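
The configuration from the second step can also be generated programmatically, which avoids hand-editing mistakes such as trailing commas or unfilled placeholders. The following is a minimal TypeScript sketch under stated assumptions: `buildStagehandConfig` and the example key values are hypothetical and not part of the server, and the path is the same elided path used above.

```typescript
// Environment variables named in the configuration above.
type StagehandEnv = {
  BROWSERBASE_API_KEY: string;
  BROWSERBASE_PROJECT_ID: string;
  OPENAI_API_KEY: string;
};

// Hypothetical helper (not part of the server): builds the "mcpServers" entry.
function buildStagehandConfig(serverEntryPath: string, env: StagehandEnv) {
  // Reject empty values and values still left as <PLACEHOLDER> text.
  for (const key of Object.keys(env) as (keyof StagehandEnv)[]) {
    const value = env[key];
    if (value.length === 0 || /^<.*>$/.test(value)) {
      throw new Error(`Missing value for ${key}`);
    }
  }
  return {
    mcpServers: {
      stagehand: {
        command: "node",
        args: [serverEntryPath],
        env,
      },
    },
  };
}

// Example values are illustrative only; substitute your own keys and path.
const config = buildStagehandConfig(
  "path/to/mcp-server-browserbase/stagehand/dist/index.js",
  {
    BROWSERBASE_API_KEY: "example-api-key",
    BROWSERBASE_PROJECT_ID: "example-project-id",
    OPENAI_API_KEY: "example-openai-key",
  },
);

// JSON.stringify never emits trailing commas, so the output is valid JSON.
console.log(JSON.stringify(config, null, 2));
```

The printed object can be merged into the Claude Desktop configuration file by hand.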
- `stagehand_navigate`
  - Navigate to any URL in the browser
  - Input:
    - `url` (string): The URL to navigate to
- `stagehand_act`
  - Perform an action on the web page
  - Inputs:
    - `action` (string): The action to perform (e.g., "click the login button")
    - `variables` (object, optional): Variables used in the action template
- `stagehand_extract`
  - Extract data from the web page based on an instruction and schema
  - Inputs:
    - `instruction` (string): Instruction for extraction (e.g., "extract the price of the item")
    - `schema` (object): JSON schema for the extracted data
- `stagehand_observe`
  - Observe actions that can be performed on the web page
  - Input:
    - `instruction` (string, optional): Instruction for observation
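
When an MCP client invokes one of these tools, it sends a JSON-RPC `tools/call` request whose `arguments` carry the inputs listed above. The sketch below builds such requests in TypeScript. The helper name `toolCall`, the example URL, the example schema, and the `%query%` placeholder syntax used with `variables` are assumptions for illustration; check the server source for the exact template syntax it accepts.

```typescript
// Argument shapes taken from the tool list above.
type ToolArguments = {
  stagehand_navigate: { url: string };
  stagehand_act: { action: string; variables?: Record<string, unknown> };
  stagehand_extract: { instruction: string; schema: Record<string, unknown> };
  stagehand_observe: { instruction?: string };
};

// Hypothetical helper: wraps tool arguments in an MCP JSON-RPC "tools/call" request.
function toolCall<K extends keyof ToolArguments>(
  id: number,
  name: K,
  args: ToolArguments[K],
) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const requests = [
  toolCall(1, "stagehand_navigate", { url: "https://example.com" }),
  toolCall(2, "stagehand_act", {
    // The %query% placeholder syntax is an assumption, not confirmed by this README.
    action: "type %query% into the search box",
    variables: { query: "OpenAI" },
  }),
  toolCall(3, "stagehand_extract", {
    instruction: "extract the price of the item",
    // Example JSON schema describing the expected result.
    schema: {
      type: "object",
      properties: { price: { type: "string" } },
      required: ["price"],
    },
  }),
  // The instruction is optional, so an empty arguments object is allowed.
  toolCall(4, "stagehand_observe", {}),
];

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

Typing the arguments per tool name lets the compiler catch a missing `url` or `schema` before a request is ever sent.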
The server provides access to two types of resources:
- Console Logs (`console://logs`)
  - Browser console output in text format
  - Includes all console messages from the browser
- Screenshots (`screenshot://<name>`)
  - PNG images of captured screenshots
  - Accessible via the screenshot name specified during capture
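
Clients read these resources with the MCP `resources/read` method, passing the resource URI. A minimal TypeScript sketch follows; the helper names and the screenshot name `homepage` are hypothetical and only illustrate the two URI forms listed above.

```typescript
// Hypothetical helper: builds the URI for a named screenshot resource.
function screenshotUri(name: string): string {
  return `screenshot://${name}`;
}

// Hypothetical helper: wraps a resource URI in an MCP JSON-RPC "resources/read" request.
function readResource(id: number, uri: string) {
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}

// Request the console log resource and one named screenshot.
const readLogs = readResource(1, "console://logs");
const readShot = readResource(2, screenshotUri("homepage"));

console.log(JSON.stringify(readLogs));
console.log(JSON.stringify(readShot));
```
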
- AI-powered web automation
- Perform actions on web pages
- Extract structured data from web pages
- Observe possible actions on web pages
- Simple and extensible API
- Model-agnostic support for various LLM providers
Licensed under the MIT License.
Copyright 2024 Browserbase, Inc.