Browserbase Haystack Fetcher

Browserbase is a developer platform to reliably run, manage, and monitor headless browsers.

Power your AI data retrievals with:

Installation and setup

  • Get an API key and Project ID from browserbase.com and set them as environment variables (BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID).
  • Install the required dependencies:
pip install browserbase-haystack
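
Before constructing the fetcher, it can help to confirm that both variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_browserbase_env` is hypothetical and is not part of browserbase-haystack.

```python
import os

# Environment variable names taken from the setup step above.
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_browserbase_env(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of browserbase-haystack.
    Pass a dict as `environ` to check something other than os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_browserbase_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Browserbase credentials found.")
```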

Usage

You can load webpages into Haystack using BrowserbaseFetcher. Optionally, set the text_content parameter to convert the pages to a text-only representation.

Standalone

from browserbase_haystack import BrowserbaseFetcher

browserbase_fetcher = BrowserbaseFetcher()
browserbase_fetcher.run(urls=["https://example.com"], text_content=False)
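
The pipeline example in this README connects `fetcher.documents`, so the call above presumably returns a dictionary with a `documents` list. The sketch below shows one way to inspect such a result. It assumes each item exposes `content` and `meta` attributes, as Haystack `Document` objects do, and that the page URL is stored under a `url` key in `meta`, which is a guess. `summarize_documents` is a hypothetical helper, and a stand-in object replaces a real fetch so the snippet runs without credentials.

```python
from types import SimpleNamespace


def summarize_documents(result, preview_chars=80):
    """Return one (url, preview) pair per fetched document.

    Assumes `result` looks like {"documents": [...]} and that each item
    has `content` and `meta` attributes. The "url" meta key is an
    assumption; adjust it to whatever the fetcher actually records.
    """
    summaries = []
    for doc in result.get("documents", []):
        meta = getattr(doc, "meta", None) or {}
        content = getattr(doc, "content", None) or ""
        summaries.append((meta.get("url"), content[:preview_chars]))
    return summaries


# Stand-in for the output of browserbase_fetcher.run(...).
fake_result = {
    "documents": [
        SimpleNamespace(
            content="<html>Example Domain</html>",
            meta={"url": "https://example.com"},
        )
    ]
}

for url, preview in summarize_documents(fake_result):
    print(url, preview)
```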

In a pipeline

from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from browserbase_haystack import BrowserbaseFetcher

prompt_template = (
    "Tell me the titles of the given pages. Pages: {{ documents }}"
)
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator()

browserbase_fetcher = BrowserbaseFetcher()

pipe = Pipeline()
pipe.add_component("fetcher", browserbase_fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("fetcher.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")
result = pipe.run(data={"fetcher": {"urls": ["https://example.com"]}})
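
To read the model's answer, look under the name the generator was registered with. In Haystack 2.x, `OpenAIGenerator` returns its text under a `replies` list, so the pipeline result is expected to look like `{"llm": {"replies": [...], ...}}`. The sketch below extracts the first reply defensively; `first_reply` is a hypothetical helper, and the sample dictionary stands in for a real pipeline run.

```python
def first_reply(result, component="llm"):
    """Return the first reply from a generator component, or None.

    `component` is the name passed to pipe.add_component for the generator.
    Hypothetical helper for illustration only.
    """
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Stand-in for the dictionary returned by pipe.run(...).
fake_pipeline_result = {"llm": {"replies": ["Example Domain"], "meta": [{}]}}
print(first_reply(fake_pipeline_result))
```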

Parameters

  • urls: Required. A list of URLs to fetch.
  • text_content: Optional. If True, retrieve only the text content of each page. Default is False.
  • session_id: Optional. The ID of an existing session.
  • proxy: Optional. Enable or disable proxies.
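
The parameters above can be assembled into a single inputs dictionary that leaves out anything not set. This is a sketch under assumptions: `build_fetcher_inputs` is a hypothetical helper, `session_id` is assumed to be a string and `proxy` a boolean, and both are assumed to be passed to `run` in the same way as `urls` and `text_content`.

```python
def build_fetcher_inputs(urls, text_content=False, session_id=None, proxy=None):
    """Build keyword arguments for the fetcher, omitting unset options.

    Hypothetical helper. The types of `session_id` (str) and `proxy` (bool)
    are assumptions, not confirmed by the package documentation.
    """
    if not urls:
        raise ValueError("urls must contain at least one URL")
    inputs = {"urls": list(urls), "text_content": text_content}
    if session_id is not None:
        inputs["session_id"] = session_id
    if proxy is not None:
        inputs["proxy"] = proxy
    return inputs


inputs = build_fetcher_inputs(["https://example.com"], text_content=True)
print(inputs)
```

The resulting dictionary could then be passed as `browserbase_fetcher.run(**inputs)` or as `pipe.run(data={"fetcher": inputs})`.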