diff --git a/.DS_Store b/.DS_Store
index 2a0f023..15ae7a9 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..2348049
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+.env
+# Ignore .DS_Store files anywhere in the repository
+.DS_Store
+**/.DS_Store
+*.csv
diff --git a/cookbook/company-info/scrapegraph_llama_index.ipynb b/cookbook/company-info/scrapegraph_llama_index.ipynb
new file mode 100644
index 0000000..f198c71
--- /dev/null
+++ b/cookbook/company-info/scrapegraph_llama_index.ipynb
@@ -0,0 +1,1807 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ReBHQ5_834pZ"
+ },
+ "source": [
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "source": [
+ "## 🕷️ Extract Company Info with `llama-index-tools-scrapegraphai`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ ""
+ ],
+ "metadata": {
+ "id": "RaHnNF9bYX8N"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "source": [
+ "### 🔧 Install `dependencies`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!pip install llama-index-tools-scrapegraphai"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "source": [
+ "### 🔑 Import `ScrapeGraph` API key"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "source": [
+ "You can get a ScrapeGraphAI API key from the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "sffqFG2EJ8bI"
+ },
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "import os\n",
+ "\n",
+ "if not os.environ.get(\"SGAI_API_KEY\"):\n",
+ " os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "source": [
+ "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "source": [
+ "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "\n",
+ "\n",
+ "**Pydantic Schema Quick Guide**\n",
+ "\n",
+ "Types of Schemas \n",
+ "\n",
+ "1. Simple Schema \n",
+ "Use this when you want to extract straightforward information, such as a single piece of content. \n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "# Simple schema for a single webpage\n",
+ "class PageInfoSchema(BaseModel):\n",
+ " title: str = Field(description=\"The title of the webpage\")\n",
+ " description: str = Field(description=\"The description of the webpage\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
+ " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "2. Complex Schema (Nested) \n",
+ "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Define a schema for a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Define a schema for a list of repositories\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8036,\n",
+ " \"forks\": 1001,\n",
+ " \"today_stars\": 649,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
+ " \"stars\": 3224,\n",
+ " \"forks\": 311,\n",
+ " \"today_stars\": 361,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "Key Takeaways \n",
+ "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
+ "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
+ "\n",
+ "Both approaches give the AI a clear structure to follow, which makes the extracted content far more likely to match the shape you need.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Schema for founder information\n",
+ "class FounderSchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the founder\")\n",
+ " role: str = Field(description=\"Role of the founder in the company\")\n",
+ " linkedin: str = Field(description=\"LinkedIn profile of the founder\")\n",
+ "\n",
+ "# Schema for pricing plans\n",
+ "class PricingPlanSchema(BaseModel):\n",
+ " tier: str = Field(description=\"Name of the pricing tier\")\n",
+ " price: str = Field(description=\"Price of the plan\")\n",
+ " credits: int = Field(description=\"Number of credits included in the plan\")\n",
+ "\n",
+ "# Schema for social links\n",
+ "class SocialLinksSchema(BaseModel):\n",
+ " linkedin: str = Field(description=\"LinkedIn page of the company\")\n",
+ " twitter: str = Field(description=\"Twitter page of the company\")\n",
+ " github: str = Field(description=\"GitHub page of the company\")\n",
+ "\n",
+ "# Schema for company information\n",
+ "class CompanyInfoSchema(BaseModel):\n",
+ " company_name: str = Field(description=\"Name of the company\")\n",
+ " description: str = Field(description=\"Brief description of the company\")\n",
+ " founders: List[FounderSchema] = Field(description=\"List of company founders\")\n",
+ " logo: str = Field(description=\"Logo URL of the company\")\n",
+ " partners: List[str] = Field(description=\"List of company partners\")\n",
+ " pricing_plans: List[PricingPlanSchema] = Field(description=\"Details of pricing plans\")\n",
+ " contact_emails: List[str] = Field(description=\"Contact emails of the company\")\n",
+ " social_links: SocialLinksSchema = Field(description=\"Social links of the company\")\n",
+ " privacy_policy: str = Field(description=\"URL to the privacy policy\")\n",
+ " terms_of_service: str = Field(description=\"URL to the terms of service\")\n",
+ " api_status: str = Field(description=\"API status page URL\")"
+ ]
+ },
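+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, you can sanity-check the schema locally before spending API credits by validating a hand-written payload against it. This is a minimal sketch assuming Pydantic v2 (`model_validate`). The sample values are invented for illustration and are not real extraction output. If a required field is missing or has the wrong type, Pydantic raises a `ValidationError` that lists every offending field."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pydantic import ValidationError\n",
+ "\n",
+ "# Hypothetical sample payload, used only to exercise the schema locally\n",
+ "sample = {\n",
+ "    \"company_name\": \"Example Inc.\",\n",
+ "    \"description\": \"A fictional company used to test the schema.\",\n",
+ "    \"founders\": [{\"name\": \"Jane Doe\", \"role\": \"Founder\", \"linkedin\": \"NA\"}],\n",
+ "    \"logo\": \"https://example.com/logo.svg\",\n",
+ "    \"partners\": [],\n",
+ "    \"pricing_plans\": [{\"tier\": \"Free\", \"price\": \"$0\", \"credits\": 100}],\n",
+ "    \"contact_emails\": [\"hello@example.com\"],\n",
+ "    \"social_links\": {\"linkedin\": \"NA\", \"twitter\": \"NA\", \"github\": \"NA\"},\n",
+ "    \"privacy_policy\": \"https://example.com/privacy\",\n",
+ "    \"terms_of_service\": \"https://example.com/terms\",\n",
+ "    \"api_status\": \"NA\",\n",
+ "}\n",
+ "\n",
+ "# Nested dicts are validated into FounderSchema / PricingPlanSchema / SocialLinksSchema objects\n",
+ "company = CompanyInfoSchema.model_validate(sample)\n",
+ "print(company.pricing_plans[0].credits)  # 100\n",
+ "\n",
+ "# A payload with only one field fails validation for every missing required field\n",
+ "try:\n",
+ "    CompanyInfoSchema.model_validate({\"company_name\": \"Incomplete Inc.\"})\n",
+ "except ValidationError as err:\n",
+ "    print(f\"{err.error_count()} validation errors\")"
+ ]
+ },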
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "source": [
+ "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "source": [
+ "Here we use `scrapegraph_smartscraper` to extract structured data from a webpage with AI.\n",
+ "\n",
+ "\n",
+ "> If you already have an HTML file, you can upload it and use `scrapegraph_local_scrape` instead.\n",
+ "\n",
+ "More details are available in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/).\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "aiWfiMa9g8dB"
+ },
+ "outputs": [],
+ "source": [
+ "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n",
+ "\n",
+ "scrapegraph_tool = ScrapegraphToolSpec()"
+ ]
+ },
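+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Besides calling its methods directly, a tool spec can be converted into a list of LlamaIndex tools, for example to hand to an agent. This is a minimal sketch that assumes the `to_tool_list()` method LlamaIndex tool specs inherit from `BaseToolSpec`. The tool names printed depend on the installed package version."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Wrap each spec method (e.g. scrapegraph_smartscraper) as a FunctionTool\n",
+ "tools = scrapegraph_tool.to_tool_list()\n",
+ "\n",
+ "# Print the name each tool would expose to an agent\n",
+ "for tool in tools:\n",
+ "    print(tool.metadata.name)"
+ ]
+ },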
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "outputs": [],
+ "source": [
+ "# Call SmartScraper to extract company info matching CompanyInfoSchema\n",
+ "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
+ " prompt=\"Extract info about the company\",\n",
+ " url=\"https://scrapegraphai.com/\",\n",
+ " api_key=os.environ.get(\"SGAI_API_KEY\"),\n",
+ " schema=CompanyInfoSchema,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "source": [
+ "Print the response"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "F1VfD8B4LPc8",
+ "outputId": "6dc8012c-d80e-4db3-915c-3276c5f1d8fc"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "print(\"Company Info:\")\n",
+ "print(json.dumps(response, indent=2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "source": [
+ "### 💾 Save the output to `CSV` files"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "source": [
+ "Flatten the nested result into pandas DataFrames, one per table, so the extracted content can be displayed and exported."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "1lS9O1KOI51y"
+ },
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "result = response[\"result\"]\n",
+ "\n",
+ "# Flatten the main company information (including the nested social links) into one row\n",
+ "company_info = {\n",
+ "    \"company_name\": result[\"company_name\"],\n",
+ "    \"description\": result[\"description\"],\n",
+ "    \"logo\": result[\"logo\"],\n",
+ "    \"contact_emails\": \", \".join(result[\"contact_emails\"]),\n",
+ "    \"privacy_policy\": result[\"privacy_policy\"],\n",
+ "    \"terms_of_service\": result[\"terms_of_service\"],\n",
+ "    \"api_status\": result[\"api_status\"],\n",
+ "    \"linkedin\": result[\"social_links\"][\"linkedin\"],\n",
+ "    \"twitter\": result[\"social_links\"][\"twitter\"],\n",
+ "    \"github\": result[\"social_links\"].get(\"github\", None)\n",
+ "}\n",
+ "\n",
+ "# Create one DataFrame per table\n",
+ "df_company = pd.DataFrame([company_info])\n",
+ "df_founders = pd.DataFrame(result[\"founders\"])\n",
+ "df_pricing = pd.DataFrame(result[\"pricing_plans\"])\n",
+ "df_partners = pd.DataFrame({\"partner\": result[\"partners\"]})"
+ ]
+ },
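+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The flattening above indexes `response[\"result\"]` directly, so it raises a `KeyError` if a field is missing. Because the extraction is LLM-based, fields can also come back as placeholder strings such as `\"NA\"`. The cell below is a more defensive sketch and not part of the original example: list fields default to empty lists, explicit `columns` keep empty tables well-formed, and `\"NA\"` is replaced with `pd.NA`. Running it overwrites `df_founders`, `df_pricing` and `df_partners` from the previous cell."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Fall back to an empty dict if the response has no result\n",
+ "result = response.get(\"result\") or {}\n",
+ "\n",
+ "# Default missing or null list fields to [] and fix the columns so empty tables keep their headers\n",
+ "df_founders = pd.DataFrame(result.get(\"founders\") or [], columns=[\"name\", \"role\", \"linkedin\"])\n",
+ "df_pricing = pd.DataFrame(result.get(\"pricing_plans\") or [], columns=[\"tier\", \"price\", \"credits\"])\n",
+ "df_partners = pd.DataFrame({\"partner\": result.get(\"partners\") or []})\n",
+ "\n",
+ "# Treat the \"NA\" placeholder as a real missing value\n",
+ "df_founders = df_founders.replace(\"NA\", pd.NA)"
+ ]
+ },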
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JJI9huPkOY9t"
+ },
+ "source": [
+ "Show flattened tables"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 153
+ },
+ "id": "vZs8ZutKOT63",
+ "outputId": "63920d65-5dcb-4e9c-db9b-89786115a5e8"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " company_name description \\\n",
+ "0 ScrapeGraphAI ScrapeGraphAI is a powerful AI scraping API de... \n",
+ "\n",
+ " logo \\\n",
+ "0 https://scrapegraphai.com/images/scrapegraphai... \n",
+ "\n",
+ " contact_emails privacy_policy \\\n",
+ "0 contact@scrapegraphai.com https://scrapegraphai.com/privacy \n",
+ "\n",
+ " terms_of_service api_status \\\n",
+ "0 https://scrapegraphai.com/terms https://scrapegraphapi.openstatus.dev \n",
+ "\n",
+ " linkedin twitter \\\n",
+ "0 https://www.linkedin.com/company/101881123 https://x.com/scrapegraphai \n",
+ "\n",
+ " github \n",
+ "0 https://github.com/ScrapeGraphAI/Scrapegraph-ai "
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_company"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "id": "QR-fyx5cOetl",
+ "outputId": "6c46d45c-974c-417f-fd8f-957b0246c978"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " name role \\\n",
+ "0 Marco Perini Founder & Technical Lead \n",
+ "1 Marco Vinciguerra Founder & Software Engineer \n",
+ "2 Lorenzo Padoan Founder & Product Engineer \n",
+ "\n",
+ " linkedin \n",
+ "0 https://www.linkedin.com/in/perinim/ \n",
+ "1 https://www.linkedin.com/in/marco-vinciguerra-... \n",
+ "2 https://www.linkedin.com/in/lorenzo-padoan-452... "
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_founders"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 175
+ },
+ "id": "SWpCvl53OgyQ",
+ "outputId": "751d2e9d-3c98-4c66-d213-181a5d856a56"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " tier price credits\n",
+ "0 Free $0 100\n",
+ "1 Starter $20/month 5000\n",
+ "2 Growth $100/month 40000\n",
+ "3 Pro $500/month 250000"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_pricing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 363
+ },
+ "id": "jNLaHXlEOisi",
+ "outputId": "0dd7f4dc-9fee-444c-ac1c-a6f829063f76"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " partner\n",
+ "0 PostHog\n",
+ "1 AWS\n",
+ "2 NVIDIA\n",
+ "3 JinaAI\n",
+ "4 DagWorks\n",
+ "5 Browserbase\n",
+ "6 ScrapeDo\n",
+ "7 HackerNews\n",
+ "8 Medium\n",
+ "9 HackADay"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_partners"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "source": [
+ "Save each DataFrame to its own CSV file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "BtEbB9pmQGhO",
+ "outputId": "c25648bc-ba04-4e32-e15f-2a650d3ac3ba"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Data saved to CSV files\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Save each DataFrame to its own CSV file\n",
+ "df_company.to_csv(\"company_info.csv\", index=False)\n",
+ "df_founders.to_csv(\"founders.csv\", index=False)\n",
+ "df_pricing.to_csv(\"pricing_plans.csv\", index=False)\n",
+ "df_partners.to_csv(\"partners.csv\", index=False)\n",
+ "# Print confirmation\n",
+ "print(\"Data saved to CSV files\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "source": [
+ "## 🔗 Resources"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "source": [
+ "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
+ "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
+ "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
+ "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
+ "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
+ "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
+ "\n",
+ "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/company-info/scrapegraph_sdk.ipynb b/cookbook/company-info/scrapegraph_sdk.ipynb
deleted file mode 100644
index 09f7f1b..0000000
--- a/cookbook/company-info/scrapegraph_sdk.ipynb
+++ /dev/null
@@ -1 +0,0 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["IzsyDXEWwPVt"],"authorship_tag":"ABX9TyO57uo4LpNqAm10rmE0B6Q5"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["\n","
\n",""],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️ Extract Company Info with Official Scrapegraph SDK"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"3Q5VM3SsRlxO"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734532300517,"user_tz":-60,"elapsed":6877,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"f6b837cd-0f00-49cc-cb6f-f2bca57544f5"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List, Dict, Optional\n","\n","# Schema for founder information\n","class FounderSchema(BaseModel):\n"," name: str = Field(description=\"Name of the founder\")\n"," role: str = Field(description=\"Role of the founder in the company\")\n"," linkedin: str = Field(description=\"LinkedIn profile of the founder\")\n","\n","# Schema for pricing plans\n","class PricingPlanSchema(BaseModel):\n"," tier: str = Field(description=\"Name of the pricing tier\")\n"," price: str = Field(description=\"Price of the plan\")\n"," credits: int = Field(description=\"Number of credits included in the plan\")\n","\n","# Schema for social links\n","class SocialLinksSchema(BaseModel):\n"," linkedin: str = Field(description=\"LinkedIn page of the company\")\n"," twitter: str = Field(description=\"Twitter page of the company\")\n"," github: str = Field(description=\"GitHub page of the company\")\n","\n","# Schema for company information\n","class CompanyInfoSchema(BaseModel):\n"," company_name: str = Field(description=\"Name of the company\")\n"," description: str = Field(description=\"Brief description of the company\")\n"," founders: List[FounderSchema] = 
Field(description=\"List of company founders\")\n"," logo: str = Field(description=\"Logo URL of the company\")\n"," partners: List[str] = Field(description=\"List of company partners\")\n"," pricing_plans: List[PricingPlanSchema] = Field(description=\"Details of pricing plans\")\n"," contact_emails: List[str] = Field(description=\"Contact emails of the company\")\n"," social_links: SocialLinksSchema = Field(description=\"Social links of the company\")\n"," privacy_policy: str = Field(description=\"URL to the privacy policy\")\n"," terms_of_service: str = Field(description=\"URL to the terms of service\")\n"," api_status: str = Field(description=\"API status page URL\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `SGAI Client` and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Initialize the client for scraping (there's also an async version [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"],"metadata":{"id":"4SLJgXgcob6L"}},{"cell_type":"code","source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key)"],"metadata":{"id":"PQI25GZvoCSk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://scrapegraphai.com/\",\n"," user_prompt=\"Extract info about the company\",\n"," 
output_schema=CompanyInfoSchema,\n",")"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(\"Company Info:\")\n","print(json.dumps(result, indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734532533318,"user_tz":-60,"elapsed":339,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"8d7b2955-1569-4b3a-8ffe-014a8442dd12"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Request ID: 87a7ea1a-9dd4-4d1d-ae76-b419ead57c11\n","Company Info:\n","{\n"," \"company_name\": \"ScrapeGraphAI\",\n"," \"description\": \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. 
It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\",\n"," \"founders\": [\n"," {\n"," \"name\": \"Marco Perini\",\n"," \"role\": \"Founder & Technical Lead\",\n"," \"linkedin\": \"https://www.linkedin.com/in/perinim/\"\n"," },\n"," {\n"," \"name\": \"Marco Vinciguerra\",\n"," \"role\": \"Founder & Software Engineer\",\n"," \"linkedin\": \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\"\n"," },\n"," {\n"," \"name\": \"Lorenzo Padoan\",\n"," \"role\": \"Founder & Product Engineer\",\n"," \"linkedin\": \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n"," }\n"," ],\n"," \"logo\": \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\",\n"," \"partners\": [\n"," \"PostHog\",\n"," \"AWS\",\n"," \"NVIDIA\",\n"," \"JinaAI\",\n"," \"DagWorks\",\n"," \"Browserbase\",\n"," \"ScrapeDo\",\n"," \"HackerNews\",\n"," \"Medium\",\n"," \"HackADay\"\n"," ],\n"," \"pricing_plans\": [\n"," {\n"," \"tier\": \"Free\",\n"," \"price\": \"$0\",\n"," \"credits\": 100\n"," },\n"," {\n"," \"tier\": \"Starter\",\n"," \"price\": \"$20/month\",\n"," \"credits\": 5000\n"," },\n"," {\n"," \"tier\": \"Growth\",\n"," \"price\": \"$100/month\",\n"," \"credits\": 40000\n"," },\n"," {\n"," \"tier\": \"Pro\",\n"," \"price\": \"$500/month\",\n"," \"credits\": 250000\n"," }\n"," ],\n"," \"contact_emails\": [\n"," \"contact@scrapegraphai.com\"\n"," ],\n"," \"social_links\": {\n"," \"linkedin\": \"https://www.linkedin.com/company/101881123\",\n"," \"twitter\": \"https://x.com/scrapegraphai\",\n"," \"github\": \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n"," },\n"," \"privacy_policy\": \"https://scrapegraphai.com/privacy\",\n"," \"terms_of_service\": \"https://scrapegraphai.com/terms\",\n"," \"api_status\": \"https://scrapegraphapi.openstatus.dev\"\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` 
file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the tables with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Flatten and save main company information\n","company_info = {\n"," \"company_name\": result[\"company_name\"],\n"," \"description\": result[\"description\"],\n"," \"logo\": result[\"logo\"],\n"," \"contact_emails\": \", \".join(result[\"contact_emails\"]),\n"," \"privacy_policy\": result[\"privacy_policy\"],\n"," \"terms_of_service\": result[\"terms_of_service\"],\n"," \"api_status\": result[\"api_status\"],\n"," \"linkedin\": result[\"social_links\"][\"linkedin\"],\n"," \"twitter\": result[\"social_links\"][\"twitter\"],\n"," \"github\": result[\"social_links\"].get(\"github\", None)\n","}\n","\n","# Creating dataframes\n","df_company = pd.DataFrame([company_info])\n","df_founders = pd.DataFrame(result[\"founders\"])\n","df_pricing = pd.DataFrame(result[\"pricing_plans\"])\n","df_partners = pd.DataFrame({\"partner\": result[\"partners\"]})"],"metadata":{"id":"1lS9O1KOI51y"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Show flattened tables"],"metadata":{"id":"JJI9huPkOY9t"}},{"cell_type":"code","source":["df_company"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":153},"id":"vZs8ZutKOT63","executionInfo":{"status":"ok","timestamp":1734533012061,"user_tz":-60,"elapsed":199,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"1278a9b9-2ab8-4150-8d37-328d4eb27e49"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" company_name description \\\n","0 ScrapeGraphAI ScrapeGraphAI is a powerful AI scraping API de... \n","\n"," logo \\\n","0 https://scrapegraphai.com/images/scrapegraphai... 
\n","\n"," contact_emails privacy_policy \\\n","0 contact@scrapegraphai.com https://scrapegraphai.com/privacy \n","\n"," terms_of_service api_status \\\n","0 https://scrapegraphai.com/terms https://scrapegraphapi.openstatus.dev \n","\n"," linkedin twitter \\\n","0 https://www.linkedin.com/company/101881123 https://x.com/scrapegraphai \n","\n"," github \n","0 https://github.com/ScrapeGraphAI/Scrapegraph-ai "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," company_name | \n"," description | \n"," logo | \n"," contact_emails | \n"," privacy_policy | \n"," terms_of_service | \n"," api_status | \n"," linkedin | \n"," twitter | \n"," github | \n","
\n"," \n"," \n"," \n"," 0 | \n"," ScrapeGraphAI | \n"," ScrapeGraphAI is a powerful AI scraping API de... | \n"," https://scrapegraphai.com/images/scrapegraphai... | \n"," contact@scrapegraphai.com | \n"," https://scrapegraphai.com/privacy | \n"," https://scrapegraphai.com/terms | \n"," https://scrapegraphapi.openstatus.dev | \n"," https://www.linkedin.com/company/101881123 | \n"," https://x.com/scrapegraphai | \n"," https://github.com/ScrapeGraphAI/Scrapegraph-ai | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_company","summary":"{\n \"name\": \"df_company\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"company_name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"logo\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"contact_emails\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"contact@scrapegraphai.com\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"privacy_policy\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/privacy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"terms_of_service\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/terms\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"api_status\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphapi.openstatus.dev\"\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://www.linkedin.com/company/101881123\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"twitter\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://x.com/scrapegraphai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"github\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":10}]},{"cell_type":"code","source":["df_founders"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":143},"id":"QR-fyx5cOetl","executionInfo":{"status":"ok","timestamp":1734533051319,"user_tz":-60,"elapsed":304,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"4b7d55ed-9ef4-44f9-9008-688d734ca820"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" name role \\\n","0 Marco Perini Founder & Technical Lead \n","1 Marco Vinciguerra Founder & Software Engineer \n","2 Lorenzo Padoan Founder & Product Engineer \n","\n"," linkedin \n","0 https://www.linkedin.com/in/perinim/ \n","1 https://www.linkedin.com/in/marco-vinciguerra-... \n","2 https://www.linkedin.com/in/lorenzo-padoan-452... "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," name | \n"," role | \n"," linkedin | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Marco Perini | \n"," Founder & Technical Lead | \n"," https://www.linkedin.com/in/perinim/ | \n","
\n"," \n"," 1 | \n"," Marco Vinciguerra | \n"," Founder & Software Engineer | \n"," https://www.linkedin.com/in/marco-vinciguerra-... | \n","
\n"," \n"," 2 | \n"," Lorenzo Padoan | \n"," Founder & Product Engineer | \n"," https://www.linkedin.com/in/lorenzo-padoan-452... | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_founders","summary":"{\n \"name\": \"df_founders\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Marco Perini\",\n \"Marco Vinciguerra\",\n \"Lorenzo Padoan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"role\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Founder & Technical Lead\",\n \"Founder & Software Engineer\",\n \"Founder & Product Engineer\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"https://www.linkedin.com/in/perinim/\",\n \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\",\n \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":11}]},{"cell_type":"code","source":["df_pricing"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":175},"id":"SWpCvl53OgyQ","executionInfo":{"status":"ok","timestamp":1734533059550,"user_tz":-60,"elapsed":312,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"c256f5e5-227a-4df4-da16-d0021aaf03a1"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" tier price credits\n","0 Free $0 100\n","1 Starter $20/month 5000\n","2 Growth $100/month 40000\n","3 Pro $500/month 250000"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," tier | \n"," price | \n"," credits | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Free | \n"," $0 | \n"," 100 | \n","
\n"," \n"," 1 | \n"," Starter | \n"," $20/month | \n"," 5000 | \n","
\n"," \n"," 2 | \n"," Growth | \n"," $100/month | \n"," 40000 | \n","
\n"," \n"," 3 | \n"," Pro | \n"," $500/month | \n"," 250000 | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_pricing","summary":"{\n \"name\": \"df_pricing\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"tier\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Starter\",\n \"Pro\",\n \"Free\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"price\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"$20/month\",\n \"$500/month\",\n \"$0\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"credits\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 118819,\n \"min\": 100,\n \"max\": 250000,\n \"num_unique_values\": 4,\n \"samples\": [\n 5000,\n 250000,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":12}]},{"cell_type":"code","source":["df_partners"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"jNLaHXlEOisi","executionInfo":{"status":"ok","timestamp":1734533067079,"user_tz":-60,"elapsed":216,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"6f075db5-fc3f-437d-9aaa-d6f8e3085c49"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" partner\n","0 PostHog\n","1 AWS\n","2 NVIDIA\n","3 JinaAI\n","4 DagWorks\n","5 Browserbase\n","6 ScrapeDo\n","7 HackerNews\n","8 Medium\n","9 HackADay"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," partner | \n","
\n"," \n"," \n"," \n"," 0 | \n"," PostHog | \n","
\n"," \n"," 1 | \n"," AWS | \n","
\n"," \n"," 2 | \n"," NVIDIA | \n","
\n"," \n"," 3 | \n"," JinaAI | \n","
\n"," \n"," 4 | \n"," DagWorks | \n","
\n"," \n"," 5 | \n"," Browserbase | \n","
\n"," \n"," 6 | \n"," ScrapeDo | \n","
\n"," \n"," 7 | \n"," HackerNews | \n","
\n"," \n"," 8 | \n"," Medium | \n","
\n"," \n"," 9 | \n"," HackADay | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_partners","summary":"{\n \"name\": \"df_partners\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"partner\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Medium\",\n \"AWS\",\n \"Browserbase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":13}]},{"cell_type":"markdown","source":["Save the results to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrames to a CSV file\n","df_company.to_csv(\"company_info.csv\", index=False)\n","df_founders.to_csv(\"founders.csv\", index=False)\n","df_pricing.to_csv(\"pricing_plans.csv\", index=False)\n","df_partners.to_csv(\"partners.csv\", index=False)\n","# Print confirmation\n","print(\"Data saved to CSV files\")\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734533092882,"user_tz":-60,"elapsed":213,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"3f05c8ba-7b34-4b53-ab20-bfcc78060557"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to CSV files\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
\ No newline at end of file
diff --git a/cookbook/github-trending/scrapegraph_llama_index.ipynb b/cookbook/github-trending/scrapegraph_llama_index.ipynb
new file mode 100644
index 0000000..25fe793
--- /dev/null
+++ b/cookbook/github-trending/scrapegraph_llama_index.ipynb
@@ -0,0 +1,999 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ReBHQ5_834pZ"
+ },
+ "source": [
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "source": [
+ "## 🕷️ Extract GitHub Trending Repositories with LlamaIndex and ScrapeGraphAI's APIs\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IhozYNwsgJzt"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "source": [
+ "### 🔧 Install `dependencies`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!pip install llama-index\n",
+ "!pip install llama-index-tools-scrapegraphai\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "source": [
+ "### 🔑 Import `ScrapeGraph` API key"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "source": [
+ "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "sffqFG2EJ8bI",
+ "outputId": "07af4bbe-c226-4fb3-8f68-7ccd429f59cc"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SGAI_API_KEY not found in environment.\n",
+ "SGAI_API_KEY has been set in the environment.\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "from getpass import getpass\n",
+ "\n",
+ "# Check if the API key is already set in the environment\n",
+ "sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n",
+ "\n",
+ "if sgai_api_key:\n",
+ " print(\"SGAI_API_KEY found in environment.\")\n",
+ "else:\n",
+ " print(\"SGAI_API_KEY not found in environment.\")\n",
+ " # Prompt the user to input the API key securely (hidden input)\n",
+ " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
+ " if sgai_api_key:\n",
+ " # Set the API key in the environment\n",
+ " os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n",
+ " print(\"SGAI_API_KEY has been set in the environment.\")\n",
+ " else:\n",
+ " print(\"No API key entered. Please set the API key to continue.\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "source": [
+ "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "source": [
+ "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "\n",
+ "\n",
+ "Pydantic Schema Quick Guide\n",
+ "\n",
+ "Types of Schemas \n",
+ "\n",
+ "1. Simple Schema \n",
+ "Use this when you want to extract straightforward information, such as a single piece of content. \n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "# Simple schema for a single webpage\n",
+ "class PageInfoSchema(BaseModel):\n",
+ " title: str = Field(description=\"The title of the webpage\")\n",
+ " description: str = Field(description=\"The description of the webpage\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
+ " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "2. Complex Schema (Nested) \n",
+ "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Define a schema for a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Define a schema for a list of repositories\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8036,\n",
+ " \"forks\": 1001,\n",
+ " \"today_stars\": 649,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
+ " \"stars\": 3224,\n",
+ " \"forks\": 311,\n",
+ " \"today_stars\": 361,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "Key Takeaways \n",
+ "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
+ "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
+ "\n",
+ "Both approaches give the AI a clear structure to follow, which helps the extracted content match the shape you need.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Schema for Trending Repositories\n",
+ "# This defines only the structure of a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Schema that contains a list of repositories\n",
+ "# This references the previous schema\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of github trending repositories\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "source": [
+ "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "source": [
+ "Here we use `scrapegraph_smartscraper` to extract structured data from a webpage using AI.\n",
+ "\n",
+ "\n",
+ "> If you already have an HTML file, you can upload it and use `scrapegraph_local_scrape` instead.\n",
+ "\n",
+ "You can find more info in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "bNt9QkkEncIA"
+ },
+ "outputs": [],
+ "source": [
+ "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "outputs": [],
+ "source": [
+ "scrapegraph_tool = ScrapegraphToolSpec()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XGAEX1ZPY7b7"
+ },
+ "source": [
+ "Invoke the tool"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "GZQr_Y59Y0df"
+ },
+ "outputs": [],
+ "source": [
+ "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
+ " prompt=\"Extract only the first ten github trending repositories\",\n",
+ " url=\"https://github.com/trending\",\n",
+ " api_key=os.environ.get(\"SGAI_API_KEY\"),\n",
+ " schema=ListRepositoriesSchema,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-6YKuEqiZcPC"
+ },
+ "source": [
+ "> Note that the schema is passed through the `schema` argument when invoking the tool, so the response is structured according to the `ListRepositoriesSchema` defined above instead of being free-form text. Supplying a predefined schema also spares `AI agents` from generating one themselves, which is error-prone. For the equivalent setup in the LangChain integration, where the schema is provided via `llm_output_schema`, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "source": [
+ "Print the response"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "F1VfD8B4LPc8",
+ "outputId": "1faad90e-9f9a-496a-e771-d92007e06b0e"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Trending Repositories:\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"XiaoMi/ha_xiaomi_home\",\n",
+ " \"description\": \"Xiaomi Home Integration for Home Assistant\",\n",
+ " \"stars\": 11097,\n",
+ " \"forks\": 472,\n",
+ " \"today_stars\": 3023,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"comet-ml/opik\",\n",
+ " \"description\": \"Open-source end-to-end LLM Development Platform\",\n",
+ " \"stars\": 2741,\n",
+ " \"forks\": 169,\n",
+ " \"today_stars\": 91,\n",
+ " \"language\": \"Java\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"EbookFoundation/free-programming-books\",\n",
+ " \"description\": \"\\ud83d\\udcda Freely available programming books\",\n",
+ " \"stars\": 341919,\n",
+ " \"forks\": 62038,\n",
+ " \"today_stars\": 225,\n",
+ " \"language\": \"HTML\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"konfig-dev/konfig\",\n",
+ " \"description\": \"Sunset as of December 2024\",\n",
+ " \"stars\": 689,\n",
+ " \"forks\": 192,\n",
+ " \"today_stars\": 224,\n",
+ " \"language\": \"TypeScript\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"anoma/anoma\",\n",
+ " \"description\": \"Reference implementation of Anoma\",\n",
+ " \"stars\": 9451,\n",
+ " \"forks\": 452,\n",
+ " \"today_stars\": 4129,\n",
+ " \"language\": \"Elixir\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"stripe/stripe-ios\",\n",
+ " \"description\": \"Stripe iOS SDK\",\n",
+ " \"stars\": 2292,\n",
+ " \"forks\": 1004,\n",
+ " \"today_stars\": 49,\n",
+ " \"language\": \"Swift\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"Guovin/iptv-api\",\n",
+ " \"description\": \"IPTV live TV source update tool\",\n",
+ " \"stars\": 9385,\n",
+ " \"forks\": 2010,\n",
+ " \"today_stars\": 91,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"facebookresearch/AnimatedDrawings\",\n",
+ " \"description\": \"Code to accompany \\\"A Method for Animating Children's Drawings of the Human Figure\\\"\",\n",
+ " \"stars\": 11473,\n",
+ " \"forks\": 988,\n",
+ " \"today_stars\": 398,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"apache/airflow\",\n",
+ " \"description\": \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n",
+ " \"stars\": 37690,\n",
+ " \"forks\": 14411,\n",
+ " \"today_stars\": 25,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"seleniumbase/SeleniumBase\",\n",
+ " \"description\": \"Python APIs for web automation, testing, and bypassing bot-detection.\",\n",
+ " \"stars\": 6646,\n",
+ " \"forks\": 1028,\n",
+ " \"today_stars\": 624,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "import json\n",
+ "\n",
+ "print(\"Trending Repositories:\")\n",
+ "print(json.dumps(response, indent=2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "source": [
+ "### 💾 Save the output to a `CSV` file"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "source": [
+ "Let's create a pandas dataframe and show the table with the extracted content"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 730
+ },
+ "id": "1lS9O1KOI51y",
+ "outputId": "4dc6a8db-7f6c-49b7-90fa-0e74c2cf77dd"
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"apache/airflow\",\n \"comet-ml/opik\",\n \"stripe/stripe-ios\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n \"Open-source end-to-end LLM Development Platform\",\n \"Stripe iOS SDK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 105428,\n \"min\": 689,\n \"max\": 341919,\n \"num_unique_values\": 10,\n \"samples\": [\n 37690,\n 2741,\n 2292\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19376,\n \"min\": 169,\n \"max\": 62038,\n \"num_unique_values\": 10,\n \"samples\": [\n 14411,\n 169,\n 1004\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1451,\n \"min\": 25,\n \"max\": 4129,\n \"num_unique_values\": 9,\n \"samples\": [\n 25,\n 91,\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Python\",\n \"Java\",\n \"Swift\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "type": "dataframe",
+ "variable_name": "df"
+ },
+ "text/plain": [
+ " name \\\n",
+ "0 XiaoMi/ha_xiaomi_home \n",
+ "1 comet-ml/opik \n",
+ "2 EbookFoundation/free-programming-books \n",
+ "3 konfig-dev/konfig \n",
+ "4 anoma/anoma \n",
+ "5 stripe/stripe-ios \n",
+ "6 Guovin/iptv-api \n",
+ "7 facebookresearch/AnimatedDrawings \n",
+ "8 apache/airflow \n",
+ "9 seleniumbase/SeleniumBase \n",
+ "\n",
+ " description stars forks \\\n",
+ "0 Xiaomi Home Integration for Home Assistant 11097 472 \n",
+ "1 Open-source end-to-end LLM Development Platform 2741 169 \n",
+ "2 📚 Freely available programming books 341919 62038 \n",
+ "3 Sunset as of December 2024 689 192 \n",
+ "4 Reference implementation of Anoma 9451 452 \n",
+ "5 Stripe iOS SDK 2292 1004 \n",
+ "6 IPTV live TV source update tool 9385 2010 \n",
+ "7 Code to accompany \"A Method for Animating Chil... 11473 988 \n",
+ "8 Apache Airflow - A platform to programmaticall... 37690 14411 \n",
+ "9 Python APIs for web automation, testing, and b... 6646 1028 \n",
+ "\n",
+ " today_stars language \n",
+ "0 3023 Python \n",
+ "1 91 Java \n",
+ "2 225 HTML \n",
+ "3 224 TypeScript \n",
+ "4 4129 Elixir \n",
+ "5 49 Swift \n",
+ "6 91 Python \n",
+ "7 398 Python \n",
+ "8 25 Python \n",
+ "9 624 Python "
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Convert the list of repository dicts into a DataFrame\n",
+ "df = pd.DataFrame(response[\"repositories\"])\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "source": [
+ "Save the DataFrame to a CSV file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "BtEbB9pmQGhO",
+ "outputId": "bf8a22dc-e35e-4bae-948b-71c4c4622504"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Data saved to trending_repositories.csv\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Save the DataFrame to a CSV file\n",
+ "csv_file = \"trending_repositories.csv\"\n",
+ "df.to_csv(csv_file, index=False)\n",
+ "print(f\"Data saved to {csv_file}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "source": [
+ "## 🔗 Resources"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "\n",
+ "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
+ "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
+ "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
+ "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
+ "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
+ "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
+ "\n",
+ "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "collapsed_sections": [
+ "jnqMB2-xVYQ7"
+ ],
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/github-trending/scrapegraph_sdk.ipynb b/cookbook/github-trending/scrapegraph_sdk.ipynb
deleted file mode 100644
index ae59d0b..0000000
--- a/cookbook/github-trending/scrapegraph_sdk.ipynb
+++ /dev/null
@@ -1 +0,0 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["IzsyDXEWwPVt","jnqMB2-xVYQ7","cDGH0b2DkY63","2as65QLypwdb"],"authorship_tag":"ABX9TyM1qXPrrrWt8sAHKB8wCDas"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["\n","
\n",""],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️ Extract Github Trending Repositories with Official Scrapegraph SDK"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"d7Zro0xiuo-l"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734439787062,"user_tz":-60,"elapsed":5826,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"ab74193e-e746-4de6-d65d-33a2a26b5d86"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for Trending Repositories\n","# This defines only the structure of how a single repository should look like\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Schema that contains a list of repositories\n","# This references the previous schema\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of github trending repositories\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `SGAI Client` and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Initialize the client for scraping (there's also an async version 
[here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"],"metadata":{"id":"4SLJgXgcob6L"}},{"cell_type":"code","source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key)"],"metadata":{"id":"PQI25GZvoCSk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://github.com/trending\",\n"," user_prompt=\"Extract only the visible github trending repositories\",\n"," output_schema=ListRepositoriesSchema,\n",")"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(\"Trending Repositories:\")\n","print(json.dumps(result, indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734439624722,"user_tz":-60,"elapsed":266,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"6b4db540-076e-4d3f-a5ef-a29e14fbb233"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Request ID: 1e3b00ff-4b55-497c-8046-8ec5503cdafd\n","Trending Repositories:\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"Byaidu/PDFMathTranslate\",\n"," \"description\": \"PDF scientific paper translation with 
preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n"," \"stars\": 8902,\n"," \"forks\": 633,\n"," \"today_stars\": 816,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"bigskysoftware/htmx\",\n"," \"description\": \"htmx - high power tools for HTML\",\n"," \"stars\": 39143,\n"," \"forks\": 1324,\n"," \"today_stars\": 186,\n"," \"language\": \"JavaScript\"\n"," },\n"," {\n"," \"name\": \"commaai/openpilot\",\n"," \"description\": \"openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 275+ supported cars.\",\n"," \"stars\": 50945,\n"," \"forks\": 9206,\n"," \"today_stars\": 132,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8108,\n"," \"forks\": 1011,\n"," \"today_stars\": 1221,\n"," \"language\": \"Jupyter Notebook\"\n"," },\n"," {\n"," \"name\": \"stripe/stripe-ios\",\n"," \"description\": \"Stripe iOS SDK\",\n"," \"stars\": 2179,\n"," \"forks\": 994,\n"," \"today_stars\": 19,\n"," \"language\": \"Swift\"\n"," },\n"," {\n"," \"name\": \"RIOT-OS/RIOT\",\n"," \"description\": \"RIOT - The friendly OS for IoT\",\n"," \"stars\": 5234,\n"," \"forks\": 2017,\n"," \"today_stars\": 168,\n"," \"language\": \"C\"\n"," },\n"," {\n"," \"name\": \"zju3dv/EasyVolcap\",\n"," \"description\": \"EasyVolcap: Accelerating Neural Volumetric Video Research\",\n"," \"stars\": 802,\n"," \"forks\": 52,\n"," \"today_stars\": 30,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more. 
It offers real-time capabilities to see, hear, and speak, along with advanced tools like weather checks, web search, and RAG.\",\n"," \"stars\": 3245,\n"," \"forks\": 313,\n"," \"today_stars\": 296,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"DS4SD/docling\",\n"," \"description\": \"Get your documents ready for gen AI\",\n"," \"stars\": 15201,\n"," \"forks\": 774,\n"," \"today_stars\": 281,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"Guovin/iptv-api\",\n"," \"description\": \"\\ud83d\\udcfaIPTV\\u7535\\u89c6\\u76f4\\u64ad\\u6e90\\u66f4\\u65b0\\u5de5\\u5177\\uff1a\\u2728\\u592e\\u89c6\\u9891\\u3001\\ud83d\\udcf1\\u536b\\u89c6\\u3001\\u2615\\u5404\\u7701\\u4efd\\u5730\\u65b9\\u53f0\\u3001\\ud83c\\udf0f\\u6e2f\\u00b7\\u6fb3\\u00b7\\u53f0\\u3001\\ud83c\\udfa5\\u7535\\u5f71\\u3001\\ud83c\\udfae\\u6e38\\u620f\\u3001\\ud83c\\udfb5\\u97f3\\u4e50\\u3001\\ud83c\\udfad\\u7ecf\\u5178\\u5267\\u573a\\uff1b\\u652f\\u6301IPv4/IPv6\\uff1b\\u652f\\u6301\\u81ea\\u5b9a\\u4e49\\u589e\\u52a0\\u9891\\u9053\\uff1b\\u652f\\u6301\\u805a\\u5408\\u6e90\\u3001\\u4ee3\\u7406\\u6e90\\u3001\\u8ba2\\u9605\\u6e90\\u3001\\u5173\\u952e\\u5b57\\u641c\\u7d22\\uff1b\\u6bcf\\u5929\\u81ea\\u52a8\\u66f4\\u65b0\\u4e24\\u6b21\\uff0c\\u7ed3\\u679c\\u53ef\\u7528\\u4e8eTVBox\\u7b49\\u64ad\\u653e\\u8f6f\\u4ef6\\uff1b\\u652f\\u6301\\u5de5\\u4f5c\\u6d41\\u3001Docker(amd64/arm64/arm v7)\\u3001\\u547d\\u4ee4\\u884c\\u3001GUI\\u8fd0\\u884c\\u65b9\\u5f0f | IPTV live TV source update tool\",\n"," \"stars\": 9046,\n"," \"forks\": 1938,\n"," \"today_stars\": 101,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"fatedier/frp\",\n"," \"description\": \"A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.\",\n"," \"stars\": 87828,\n"," \"forks\": 13502,\n"," \"today_stars\": 64,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"facebookresearch/AnimatedDrawings\",\n"," \"description\": \"Code to accompany \\\"A Method for 
Animating Children's Drawings of the Human Figure\\\"\",\n"," \"stars\": 10766,\n"," \"forks\": 955,\n"," \"today_stars\": 38,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"gorilla/websocket\",\n"," \"description\": \"Package gorilla/websocket is a fast, well-tested and widely used WebSocket implementation for Go.\",\n"," \"stars\": 22633,\n"," \"forks\": 3495,\n"," \"today_stars\": 13,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"DefiLlama/chainlist\",\n"," \"description\": \"NA\",\n"," \"stars\": 2368,\n"," \"forks\": 2476,\n"," \"today_stars\": 5,\n"," \"language\": \"JavaScript\"\n"," },\n"," {\n"," \"name\": \"open-telemetry/opentelemetry-collector\",\n"," \"description\": \"OpenTelemetry Collector\",\n"," \"stars\": 4570,\n"," \"forks\": 1497,\n"," \"today_stars\": 4,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"RocketChat/Rocket.Chat\",\n"," \"description\": \"The communications platform that puts data protection first.\",\n"," \"stars\": 41169,\n"," \"forks\": 10877,\n"," \"today_stars\": 73,\n"," \"language\": \"TypeScript\"\n"," },\n"," {\n"," \"name\": \"langgenius/dify\",\n"," \"description\": \"Dify is an open-source LLM app development platform. 
Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.\",\n"," \"stars\": 54976,\n"," \"forks\": 8083,\n"," \"today_stars\": 127,\n"," \"language\": \"TypeScript\"\n"," }\n"," ]\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the table with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"repositories\"])\n","df"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":635},"id":"1lS9O1KOI51y","executionInfo":{"status":"ok","timestamp":1734439642507,"user_tz":-60,"elapsed":262,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"34a068a4-0fd0-47aa-a139-637b803f14f5"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" name \\\n","0 Byaidu/PDFMathTranslate \n","1 bigskysoftware/htmx \n","2 commaai/openpilot \n","3 google-gemini/cookbook \n","4 stripe/stripe-ios \n","5 RIOT-OS/RIOT \n","6 zju3dv/EasyVolcap \n","7 TEN-framework/TEN-Agent \n","8 DS4SD/docling \n","9 Guovin/iptv-api \n","10 fatedier/frp \n","11 facebookresearch/AnimatedDrawings \n","12 gorilla/websocket \n","13 DefiLlama/chainlist \n","14 open-telemetry/opentelemetry-collector \n","15 RocketChat/Rocket.Chat \n","16 langgenius/dify \n","\n"," description stars forks \\\n","0 PDF scientific paper translation with preserve... 8902 633 \n","1 htmx - high power tools for HTML 39143 1324 \n","2 openpilot is an operating system for robotics.... 
50945 9206 \n","3 Examples and guides for using the Gemini API 8108 1011 \n","4 Stripe iOS SDK 2179 994 \n","5 RIOT - The friendly OS for IoT 5234 2017 \n","6 EasyVolcap: Accelerating Neural Volumetric Vid... 802 52 \n","7 TEN Agent is a conversational AI powered by TE... 3245 313 \n","8 Get your documents ready for gen AI 15201 774 \n","9 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... 9046 1938 \n","10 A fast reverse proxy to help you expose a loca... 87828 13502 \n","11 Code to accompany \"A Method for Animating Chil... 10766 955 \n","12 Package gorilla/websocket is a fast, well-test... 22633 3495 \n","13 NA 2368 2476 \n","14 OpenTelemetry Collector 4570 1497 \n","15 The communications platform that puts data pro... 41169 10877 \n","16 Dify is an open-source LLM app development pla... 54976 8083 \n","\n"," today_stars language \n","0 816 Python \n","1 186 JavaScript \n","2 132 Python \n","3 1221 Jupyter Notebook \n","4 19 Swift \n","5 168 C \n","6 30 Python \n","7 296 Python \n","8 281 Python \n","9 101 Python \n","10 64 Go \n","11 38 Python \n","12 13 Go \n","13 5 JavaScript \n","14 4 Go \n","15 73 TypeScript \n","16 127 TypeScript "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," name | \n"," description | \n"," stars | \n"," forks | \n"," today_stars | \n"," language | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Byaidu/PDFMathTranslate | \n"," PDF scientific paper translation with preserve... | \n"," 8902 | \n"," 633 | \n"," 816 | \n"," Python | \n","
\n"," \n"," 1 | \n"," bigskysoftware/htmx | \n"," htmx - high power tools for HTML | \n"," 39143 | \n"," 1324 | \n"," 186 | \n"," JavaScript | \n","
\n"," \n"," 2 | \n"," commaai/openpilot | \n"," openpilot is an operating system for robotics.... | \n"," 50945 | \n"," 9206 | \n"," 132 | \n"," Python | \n","
\n"," \n"," 3 | \n"," google-gemini/cookbook | \n"," Examples and guides for using the Gemini API | \n"," 8108 | \n"," 1011 | \n"," 1221 | \n"," Jupyter Notebook | \n","
\n"," \n"," 4 | \n"," stripe/stripe-ios | \n"," Stripe iOS SDK | \n"," 2179 | \n"," 994 | \n"," 19 | \n"," Swift | \n","
\n"," \n"," 5 | \n"," RIOT-OS/RIOT | \n"," RIOT - The friendly OS for IoT | \n"," 5234 | \n"," 2017 | \n"," 168 | \n"," C | \n","
\n"," \n"," 6 | \n"," zju3dv/EasyVolcap | \n"," EasyVolcap: Accelerating Neural Volumetric Vid... | \n"," 802 | \n"," 52 | \n"," 30 | \n"," Python | \n","
\n"," \n"," 7 | \n"," TEN-framework/TEN-Agent | \n"," TEN Agent is a conversational AI powered by TE... | \n"," 3245 | \n"," 313 | \n"," 296 | \n"," Python | \n","
\n"," \n"," 8 | \n"," DS4SD/docling | \n"," Get your documents ready for gen AI | \n"," 15201 | \n"," 774 | \n"," 281 | \n"," Python | \n","
\n"," \n"," 9 | \n"," Guovin/iptv-api | \n"," 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... | \n"," 9046 | \n"," 1938 | \n"," 101 | \n"," Python | \n","
\n"," \n"," 10 | \n"," fatedier/frp | \n"," A fast reverse proxy to help you expose a loca... | \n"," 87828 | \n"," 13502 | \n"," 64 | \n"," Go | \n","
\n"," \n"," 11 | \n"," facebookresearch/AnimatedDrawings | \n"," Code to accompany \"A Method for Animating Chil... | \n"," 10766 | \n"," 955 | \n"," 38 | \n"," Python | \n","
\n"," \n"," 12 | \n"," gorilla/websocket | \n"," Package gorilla/websocket is a fast, well-test... | \n"," 22633 | \n"," 3495 | \n"," 13 | \n"," Go | \n","
\n"," \n"," 13 | \n"," DefiLlama/chainlist | \n"," NA | \n"," 2368 | \n"," 2476 | \n"," 5 | \n"," JavaScript | \n","
\n"," \n"," 14 | \n"," open-telemetry/opentelemetry-collector | \n"," OpenTelemetry Collector | \n"," 4570 | \n"," 1497 | \n"," 4 | \n"," Go | \n","
\n"," \n"," 15 | \n"," RocketChat/Rocket.Chat | \n"," The communications platform that puts data pro... | \n"," 41169 | \n"," 10877 | \n"," 73 | \n"," TypeScript | \n","
\n"," \n"," 16 | \n"," langgenius/dify | \n"," Dify is an open-source LLM app development pla... | \n"," 54976 | \n"," 8083 | \n"," 127 | \n"," TypeScript | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 17,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"Byaidu/PDFMathTranslate\",\n \"bigskysoftware/htmx\",\n \"RIOT-OS/RIOT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"PDF scientific paper translation with preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n \"htmx - high power tools for HTML\",\n \"RIOT - The friendly OS for IoT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 24731,\n \"min\": 802,\n \"max\": 87828,\n \"num_unique_values\": 17,\n \"samples\": [\n 8902,\n 39143,\n 5234\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4176,\n \"min\": 52,\n \"max\": 13502,\n \"num_unique_values\": 17,\n \"samples\": [\n 633,\n 1324,\n 2017\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 325,\n \"min\": 4,\n \"max\": 1221,\n \"num_unique_values\": 17,\n \"samples\": [\n 816,\n 186,\n 168\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"Python\",\n \"JavaScript\",\n \"Go\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n 
]\n}"}},"metadata":{},"execution_count":38}]},{"cell_type":"markdown","source":["Save it to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrame to a CSV file\n","csv_file = \"trending_repositories.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734439655791,"user_tz":-60,"elapsed":303,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"d2ec6dac-395f-4ddb-ad49-4bd672a04e5b"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to trending_repositories.csv\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
\ No newline at end of file
diff --git a/cookbook/homes-forsale/scrapegraph_llama_index.ipynb b/cookbook/homes-forsale/scrapegraph_llama_index.ipynb
new file mode 100644
index 0000000..634fca0
--- /dev/null
+++ b/cookbook/homes-forsale/scrapegraph_llama_index.ipynb
@@ -0,0 +1,799 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ReBHQ5_834pZ"
+ },
+ "source": [
+ "\n",
+ "
\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "source": [
+ "## 🕷️ Extract House Listings from Zillow with `llama-index` and ScrapeGraphAI APIs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5cVkde_LpVkF"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "source": [
+ "### 🔧 Install `dependencies`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!pip install llama-index\n",
+ "!pip install llama-index-tools-scrapegraphai"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "source": [
+ "### 🔑 Import `ScrapeGraph` API key"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "source": [
+ "You can find your ScrapeGraph API key in the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "sffqFG2EJ8bI",
+ "outputId": "c588274d-64ce-4d13-f12d-0458d9c4839d"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SGAI_API_KEY not found in environment.\n",
+ "SGAI_API_KEY has been set in the environment.\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "from getpass import getpass\n",
+ "\n",
+ "# Check if the API key is already set in the environment\n",
+ "sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n",
+ "\n",
+ "if sgai_api_key:\n",
+ " print(\"SGAI_API_KEY found in environment.\")\n",
+ "else:\n",
+ " print(\"SGAI_API_KEY not found in environment.\")\n",
+ " # Prompt the user to input the API key securely (hidden input)\n",
+ " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
+ " if sgai_api_key:\n",
+ " # Set the API key in the environment\n",
+ " os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n",
+ " print(\"SGAI_API_KEY has been set in the environment.\")\n",
+ " else:\n",
+ " print(\"No API key entered. Please set the API key to continue.\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "source": [
+ "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "source": [
+ "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "\n",
+ "\n",
+ " Pydantic Schema Quick Guide
\n",
+ "\n",
+ "Types of Schemas \n",
+ "\n",
+ "1. Simple Schema \n",
+ "Use this when you want to extract straightforward information, such as a single piece of content. \n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "# Simple schema for a single webpage\n",
+ "class PageInfoSchema(BaseModel):\n",
+ " title: str = Field(description=\"The title of the webpage\")\n",
+ " description: str = Field(description=\"The description of the webpage\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
+ " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "2. Complex Schema (Nested) \n",
+ "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Define a schema for a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Define a schema for a list of repositories\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8036,\n",
+ " \"forks\": 1001,\n",
+ " \"today_stars\": 649,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
+ " \"stars\": 3224,\n",
+ " \"forks\": 311,\n",
+ " \"today_stars\": 361,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "Key Takeaways \n",
+ "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
+ "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
+ "\n",
+ "Both approaches give the AI a clear structure to follow, which helps the extracted content match the fields and types you need.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Schema for a single house listing\n",
+ "class HouseListingSchema(BaseModel):\n",
+ " price: int = Field(description=\"Price of the house in USD\")\n",
+ " bedrooms: int = Field(description=\"Number of bedrooms\")\n",
+ " bathrooms: int = Field(description=\"Number of bathrooms\")\n",
+ " square_feet: int = Field(description=\"Total square footage of the house\")\n",
+ " address: str = Field(description=\"Address of the house\")\n",
+ " city: str = Field(description=\"City where the house is located\")\n",
+ " state: str = Field(description=\"State where the house is located\")\n",
+ " zip_code: str = Field(description=\"ZIP code of the house location\")\n",
+ " tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n",
+ " agent_name: str = Field(description=\"Name of the listing agent\")\n",
+ " agency: str = Field(description=\"Agency listing the house\")\n",
+ "\n",
+ "# Schema containing a list of house listings\n",
+ "class HousesListingsSchema(BaseModel):\n",
+ " houses: List[HouseListingSchema] = Field(description=\"List of house listings on Zillow or similar platforms\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "source": [
+ "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "source": [
+ "Here we use the `scrapegraph_smartscraper` tool to extract structured data from a webpage with AI.\n",
+ "\n",
+ "\n",
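+ "> If you already have an HTML file, the tool spec also offers a local scraping function you can use instead of `scrapegraph_smartscraper`; check the LlamaIndex documentation for its exact name and arguments.\n",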
+ "\n",
+ "You can find more info in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "p2BFhL53ore1"
+ },
+ "outputs": [],
+ "source": [
+ "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n",
+ "\n",
+ "scrapegraph_tool = ScrapegraphToolSpec()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iCOcvpuOoubk"
+ },
+ "source": [
+ "Call the `scrapegraph_smartscraper` tool"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "outputs": [],
+ "source": [
+ "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
+ " prompt=\"Extract information about houses for sale\",\n",
+ " url=\"https://www.zillow.com/san-francisco-ca/\",\n",
+ " api_key=os.getenv(\"SGAI_API_KEY\"),\n",
+ " schema=HousesListingsSchema,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gR2UZZwzo9Sn"
+ },
+ "source": [
+ "> Note that we pass the `HousesListingsSchema` class directly through the `schema` argument. The schema tells the service how to structure the `result` field of the response, so you (or an AI agent calling this tool) get back data with the fields you defined instead of free-form text. To find out more, check the [LlamaIndex ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "source": [
+ "Print the response"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "F1VfD8B4LPc8",
+ "outputId": "00597cd7-bac8-4af1-a5fe-d88d2f0ffa8d"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "House Listings:\n",
+ "{\n",
+ " \"request_id\": \"628bdf64-26f9-486a-9f2f-f3b5ac9c0421\",\n",
+ " \"status\": \"completed\",\n",
+ " \"website_url\": \"https://www.zillow.com/san-francisco-ca/\",\n",
+ " \"user_prompt\": \"Extract information about houses for sale\",\n",
+ " \"result\": {\n",
+ " \"houses\": [\n",
+ " {\n",
+ " \"price\": 449000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 2166,\n",
+ " \"address\": \"3229 Jennings St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Michelle K. Pender\",\n",
+ " \"agency\": \"ENGEL & VOELKERS SAN FRANCISCO\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 950000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1686,\n",
+ " \"address\": \"401 Huron Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [\n",
+ " \"Cozy fireplace\"\n",
+ " ],\n",
+ " \"agent_name\": \"Allison Chapleau\",\n",
+ " \"agency\": \"COMPASS\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 207555,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1593,\n",
+ " \"address\": \"2040 Fell St APT 10\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94117\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Trista Elizabeth Bernasconi\",\n",
+ " \"agency\": \"COMPASS\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 795000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 2000,\n",
+ " \"address\": \"515 Athens St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [\n",
+ " \"Level fenced rear yard\"\n",
+ " ],\n",
+ " \"agent_name\": \"Darin J. Holwitz\",\n",
+ " \"agency\": \"COMPASS\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 599000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 0,\n",
+ " \"address\": \"380 Dolores St #6\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94114\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Melody A. Hultgren\",\n",
+ " \"agency\": \"NOVA REAL ESTATE\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 875000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 907,\n",
+ " \"address\": \"426 Fillmore St #A\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94117\",\n",
+ " \"tags\": [\n",
+ " \"Sleek finishes\"\n",
+ " ],\n",
+ " \"agent_name\": \"NA\",\n",
+ " \"agency\": \"NA\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 335512,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 886,\n",
+ " \"address\": \"1688 Pine St #101\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Trista Elizabeth Bernasconi\",\n",
+ " \"agency\": \"COMPASS\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 899000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1680,\n",
+ " \"address\": \"351 Chenery St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94131\",\n",
+ " \"tags\": [\n",
+ " \"South-facing panoramic views\"\n",
+ " ],\n",
+ " \"agent_name\": \"Easton S. Thodos\",\n",
+ " \"agency\": \"THESEUS REAL ESTATE\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 155659,\n",
+ " \"bedrooms\": 0,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 514,\n",
+ " \"address\": \"52 Kirkwood Ave #203\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"Modern cabinetry\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lynn Anne Bell\",\n",
+ " \"agency\": \"CHRISTIE'S INT'L R.E. SF\"\n",
+ " }\n",
+ " ]\n",
+ " },\n",
+ " \"error\": \"\"\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "import json\n",
+ "\n",
+ "print(\"House Listings:\")\n",
+ "print(json.dumps(response, indent=2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "source": [
+ "### 💾 Save the output to a `CSV` file"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "source": [
+ "Let's create a pandas DataFrame from the extracted listings and display it as a table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 488
+ },
+ "id": "1lS9O1KOI51y",
+ "outputId": "16c95c43-2312-4c08-9d29-b3b95de080c9"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " price bedrooms bathrooms square_feet address \\\n",
+ "0 449000 3 3 2166 3229 Jennings St \n",
+ "1 950000 2 2 1686 401 Huron Ave \n",
+ "2 207555 1 1 1593 2040 Fell St APT 10 \n",
+ "3 795000 4 2 2000 515 Athens St \n",
+ "4 599000 1 1 0 380 Dolores St #6 \n",
+ "5 875000 2 2 907 426 Fillmore St #A \n",
+ "6 335512 2 2 886 1688 Pine St #101 \n",
+ "7 899000 4 2 1680 351 Chenery St \n",
+ "8 155659 0 1 514 52 Kirkwood Ave #203 \n",
+ "\n",
+ " city state zip_code tags \\\n",
+ "0 San Francisco CA 94124 [] \n",
+ "1 San Francisco CA 94112 [Cozy fireplace] \n",
+ "2 San Francisco CA 94117 [] \n",
+ "3 San Francisco CA 94112 [Level fenced rear yard] \n",
+ "4 San Francisco CA 94114 [] \n",
+ "5 San Francisco CA 94117 [Sleek finishes] \n",
+ "6 San Francisco CA 94109 [] \n",
+ "7 San Francisco CA 94131 [South-facing panoramic views] \n",
+ "8 San Francisco CA 94124 [Modern cabinetry] \n",
+ "\n",
+ " agent_name agency \n",
+ "0 Michelle K. Pender ENGEL & VOELKERS SAN FRANCISCO \n",
+ "1 Allison Chapleau COMPASS \n",
+ "2 Trista Elizabeth Bernasconi COMPASS \n",
+ "3 Darin J. Holwitz COMPASS \n",
+ "4 Melody A. Hultgren NOVA REAL ESTATE \n",
+ "5 NA NA \n",
+ "6 Trista Elizabeth Bernasconi COMPASS \n",
+ "7 Easton S. Thodos THESEUS REAL ESTATE \n",
+ "8 Lynn Anne Bell CHRISTIE'S INT'L R.E. SF "
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Convert the list of house dictionaries into a DataFrame\n",
+ "df = pd.DataFrame(response[\"result\"][\"houses\"])\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "source": [
+ "Save it to CSV"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BtEbB9pmQGhO"
+ },
+ "outputs": [],
+ "source": [
+ "# Save the DataFrame to a CSV file\n",
+ "csv_file = \"zillow_forsale.csv\"\n",
+ "df.to_csv(csv_file, index=False)\n",
+ "print(f\"Data saved to {csv_file}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "source": [
+ "## 🔗 Resources"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "source": [
+ "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
+ "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
+ "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
+ "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
+ "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
+ "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
+ "\n",
+ "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/homes-forsale/scrapegraph_sdk.ipynb b/cookbook/homes-forsale/scrapegraph_sdk.ipynb
deleted file mode 100644
index 2d9dd08..0000000
--- a/cookbook/homes-forsale/scrapegraph_sdk.ipynb
+++ /dev/null
@@ -1 +0,0 @@
-{"cells":[{"cell_type":"markdown","metadata":{"id":"ReBHQ5_834pZ"},"source":["\n","
\n",""]},{"cell_type":"markdown","metadata":{"id":"jEkuKbcRrPcK"},"source":["## 🕷️ Extract Houses Listing with Official Scrapegraph SDK"]},{"cell_type":"markdown","metadata":{"id":"8vZBkAWLq9C1"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"IzsyDXEWwPVt"},"source":["### 🔧 Install `dependencies`"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","metadata":{"id":"apBsL-L2KzM7"},"source":["### 🔑 Import `ScrapeGraph` API key"]},{"cell_type":"markdown","metadata":{"id":"ol9gQbAFkh9b"},"source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","outputId":"18dfce64-db37-4825-d316-fabd064100d0"},"outputs":[{"name":"stdout","output_type":"stream","text":["SGAI_API_KEY found in environment.\n"]}],"source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"]},{"cell_type":"markdown","metadata":{"id":"jnqMB2-xVYQ7"},"source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"]},{"cell_type":"markdown","metadata":{"id":"VZvxbjfXvbgd"},"source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dlrOEgZk_8V4"},"outputs":[],"source":["from pydantic import BaseModel, Field\n","from typing import List, Optional\n","\n","# Schema for a single house listing\n","class HouseSchema(BaseModel):\n"," price: int = Field(description=\"Price of the house in USD\")\n"," bedrooms: int = Field(description=\"Number of bedrooms\")\n"," bathrooms: int = Field(description=\"Number of bathrooms\")\n"," square_feet: int = Field(description=\"Total square footage of the house\")\n"," address: str = Field(description=\"Address of the house\")\n"," city: str = Field(description=\"City where the house is located\")\n"," state: str = Field(description=\"State where the house is located\")\n"," zip_code: str = Field(description=\"ZIP code of the house location\")\n"," tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n"," agent_name: str = Field(description=\"Name of the listing agent. If not present or not sure write NA.\")\n"," agency: str = Field(description=\"Agency listing the house. 
If not present or not sure write NA.\")\n","\n","# Schema containing a list of house listings\n","class HouseListingsSchema(BaseModel):\n"," houses: List[HouseSchema] = Field(description=\"List of house listings on Homes or similar platforms\")\n"]},{"cell_type":"markdown","metadata":{"id":"cDGH0b2DkY63"},"source":["### 🚀 Initialize `SGAI Client` and start extraction"]},{"cell_type":"markdown","metadata":{"id":"4SLJgXgcob6L"},"source":["Initialize the client for scraping (there's also an async version [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"PQI25GZvoCSk"},"outputs":[],"source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key, timeout=240)"]},{"cell_type":"markdown","metadata":{"id":"M1KSXffZopUD"},"source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2FIKomclLNFx"},"outputs":[],"source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://www.homes.com/san-francisco-ca/?bb=nzpwspy0mS749snkvsb\",\n"," user_prompt=\"Extract info about the houses visible on the page\",\n"," output_schema=HouseListingsSchema,\n",")"]},{"cell_type":"markdown","metadata":{"id":"YZz1bqCIpoL8"},"source":["Print the response"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","outputId":"1e849a65-6713-486c-e306-bb7c26db4bf9"},"outputs":[{"name":"stdout","output_type":"stream","text":["Request ID: 4e023916-2a41-40ea-bea5-efc422daf33e\n","{\n"," \"houses\": [\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 
1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 477,\n"," \"address\": \"380 14th St Unit 405\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [\n"," \"New construction\"\n"," ],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 1799000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2735,\n"," \"address\": \"123 Grattan St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94117\",\n"," \"tags\": [],\n"," \"agent_name\": \"Sean Engmann\",\n"," \"agency\": \"eXp Realty of Northern CA Inc.\"\n"," },\n"," {\n"," \"price\": 1995000,\n"," \"bedrooms\": 7,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 3330,\n"," \"address\": \"1590 Washington St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 477,\n"," \"address\": \"240 Lombard St Unit 835\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94111\",\n"," \"tags\": [],\n"," \"agent_name\": \"Tim Gullicksen\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 5495000,\n"," \"bedrooms\": 10,\n"," \"bathrooms\": 7,\n"," \"square_feet\": 6505,\n"," \"address\": \"1057 Steiner St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94115\",\n"," \"tags\": [],\n"," \"agent_name\": \"Bonnie Spindler\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 925000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 779,\n"," \"address\": \"2 Fallon Place Unit 57\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94133\",\n"," \"tags\": [],\n"," 
\"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 898000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1175,\n"," \"address\": \"5160 Diamond Heights Blvd Unit 208C\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joe Polyak\",\n"," \"agency\": \"Rise Homes\"\n"," },\n"," {\n"," \"price\": 1700000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1950,\n"," \"address\": \"1351 26th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94122\",\n"," \"tags\": [],\n"," \"agent_name\": \"Glenda Queensbury\",\n"," \"agency\": \"Referral Realty-BV\"\n"," },\n"," {\n"," \"price\": 1899000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1560,\n"," \"address\": \"340 Yerba Buena Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94127\",\n"," \"tags\": [],\n"," \"agent_name\": \"Jeannie Anderson\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 850000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1055,\n"," \"address\": \"588 Minna Unit 604\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mohamed Lakdawala\",\n"," \"agency\": \"Remax Prestigious Properties\"\n"," },\n"," {\n"," \"price\": 1990000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1280,\n"," \"address\": \"1450 Diamond St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mary Anne Villamil\",\n"," \"agency\": \"Kinetic Real Estate\"\n"," },\n"," {\n"," \"price\": 849000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 855,\n"," \"address\": \"81 Lansing St Unit 401\",\n"," \"city\": \"San 
Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Kristen Haenggi\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1080000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 936,\n"," \"address\": \"451 Kansas St Unit 466\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Maureen DeBoer\",\n"," \"agency\": \"LKJ Realty\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2145,\n"," \"address\": \"486 Yale St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Alicia Atienza\",\n"," \"agency\": \"Statewide Realty\"\n"," },\n"," {\n"," \"price\": 1140000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 998,\n"," \"address\": \"588 Minna Unit 801\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Milan Jezdimirovic\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1988000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 3800,\n"," \"address\": \"183 19th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94121\",\n"," \"tags\": [\n"," \"Amazing Property\",\n"," \"Marina Style\",\n"," \"Needs TLC\"\n"," ],\n"," \"agent_name\": \"Leo Cheung\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 1218000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1275,\n"," \"address\": \"1998 Pacific Ave Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Light-filled\",\n"," \"Freshly painted\",\n"," \"Walker's paradise\"\n"," ],\n"," \"agent_name\": \"Grace Sun\",\n"," \"agency\": 
\"Compass\"\n"," },\n"," {\n"," \"price\": 895000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 837,\n"," \"address\": \"425 1st St Unit 2501\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Unobstructed bay bridge views\",\n"," \"Open layout\"\n"," ],\n"," \"agent_name\": \"Matt Fuller\",\n"," \"agency\": \"Jackson Fuller Real Estate\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1500,\n"," \"address\": \"Unlisted Address\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"NA\",\n"," \"tags\": [\n"," \"Contractor's Special\",\n"," \"Fixer-upper\"\n"," ],\n"," \"agent_name\": \"Jaymee Faith Sagisi\",\n"," \"agency\": \"IMPACT\"\n"," },\n"," {\n"," \"price\": 900000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 930,\n"," \"address\": \"1101 Green St Unit 302\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Historic Art Deco\",\n"," \"Iconic views\"\n"," ],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 858000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1104,\n"," \"address\": \"260 King St Unit 557\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Miyuki Takami\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 945000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 767,\n"," \"address\": \"307 Page St Unit 1\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94102\",\n"," \"tags\": [],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 1099000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1330,\n"," \"address\": \"1080 
Sutter St Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Annette Liberty\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 950000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 2090,\n"," \"address\": \"3328 26th St Unit 3330\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Isaac Munene\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 1088000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"1776 Sacramento St Unit 503\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Marilyn Becklehimer\",\n"," \"agency\": \"Dio Real Estate\"\n"," },\n"," {\n"," \"price\": 1788888,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 1856,\n"," \"address\": \"2317 15th St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94114\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joel Gile\",\n"," \"agency\": \"Sequoia Real Estate\"\n"," },\n"," {\n"," \"price\": 1650000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1547,\n"," \"address\": \"2475 47th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Lucy Goldenshteyn\",\n"," \"agency\": \"Redfin\"\n"," },\n"," {\n"," \"price\": 998000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1202,\n"," \"address\": \"50 Lansing St Unit 201\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Tracey Broadman\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1595000,\n"," \"bedrooms\": 3,\n"," 
\"bathrooms\": 5,\n"," \"square_feet\": 1995,\n"," \"address\": \"15 Joy St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mike Stack\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1028000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"50 Lansing St Unit 403\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Robyn Kaufman\",\n"," \"agency\": \"Vivre Real Estate\"\n"," },\n"," {\n"," \"price\": 999000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1021,\n"," \"address\": \"338 Spear St Unit 6J\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Spacious\",\n"," \"Balcony\",\n"," \"Bright courtyard views\"\n"," ],\n"," \"agent_name\": \"Paul Hwang\",\n"," \"agency\": \"Skybox Realty\"\n"," },\n"," {\n"," \"price\": 799800,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1109,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 529880,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 740,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 489000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 741,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," 
],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 1359000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1845,\n"," \"address\": \"170 Thrift St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [\n"," \"Updated\",\n"," \"Single-family home\"\n"," ],\n"," \"agent_name\": \"Cristal Wright\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1295000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1214,\n"," \"address\": \"1922 43rd Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mila Romprey\",\n"," \"agency\": \"Premier Realty Associates\"\n"," },\n"," {\n"," \"price\": 1098000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1006,\n"," \"address\": \"150 Putnam St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Genie Mantzoros\",\n"," \"agency\": \"Epic Real Estate & Asso. 
Inc.\"\n"," },\n"," {\n"," \"price\": 1189870,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1436,\n"," \"address\": \"327 Ordway St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Shawn Zahraie\",\n"," \"agency\": \"Affinity Enterprises, Inc\"\n"," },\n"," {\n"," \"price\": 899000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1118,\n"," \"address\": \"272 Farallones St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [],\n"," \"agent_name\": \"Janice Lee\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 30000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 0,\n"," \"square_feet\": 0,\n"," \"address\": \"0 Evans Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"Land\",\n"," \"0.12 Acre\",\n"," \"$251,467 per Acre\"\n"," ],\n"," \"agent_name\": \"Heidy Carrera\",\n"," \"agency\": \"Berkshire Hathaway HomeService\"\n"," }\n"," ]\n","}\n"]}],"source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(json.dumps(result, indent=2))"]},{"cell_type":"markdown","metadata":{"id":"2as65QLypwdb"},"source":["### 💾 Save the output to a `CSV` file"]},{"cell_type":"markdown","metadata":{"id":"HTLVFgbVLLBR"},"source":["Let's create a pandas dataframe and show the table with the extracted content"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"1lS9O1KOI51y","outputId":"89fe200c-deca-45b1-be2e-6cf3e9f97fe2"},"outputs":[{"data":{"text/html":["\n","\n","
\n"," \n"," \n"," | \n"," price | \n"," bedrooms | \n"," bathrooms | \n"," square_feet | \n"," address | \n"," city | \n"," state | \n"," zip_code | \n"," tags | \n"," agent_name | \n"," agency | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 549000 | \n"," 1 | \n"," 1 | \n"," 477 | \n"," 380 14th St Unit 405 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [New construction] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," 1 | \n"," 1799000 | \n"," 4 | \n"," 2 | \n"," 2735 | \n"," 123 Grattan St | \n"," San Francisco | \n"," CA | \n"," 94117 | \n"," [] | \n"," Sean Engmann | \n"," eXp Realty of Northern CA Inc. | \n","
\n"," \n"," 2 | \n"," 1995000 | \n"," 7 | \n"," 3 | \n"," 3330 | \n"," 1590 Washington St | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," 3 | \n"," 549000 | \n"," 0 | \n"," 1 | \n"," 477 | \n"," 240 Lombard St Unit 835 | \n"," San Francisco | \n"," CA | \n"," 94111 | \n"," [] | \n"," Tim Gullicksen | \n"," Corcoran Icon Properties | \n","
\n"," \n"," 4 | \n"," 5495000 | \n"," 10 | \n"," 7 | \n"," 6505 | \n"," 1057 Steiner St | \n"," San Francisco | \n"," CA | \n"," 94115 | \n"," [] | \n"," Bonnie Spindler | \n"," Corcoran Icon Properties | \n","
\n"," \n"," 5 | \n"," 925000 | \n"," 2 | \n"," 1 | \n"," 779 | \n"," 2 Fallon Place Unit 57 | \n"," San Francisco | \n"," CA | \n"," 94133 | \n"," [] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," 6 | \n"," 898000 | \n"," 2 | \n"," 2 | \n"," 1175 | \n"," 5160 Diamond Heights Blvd Unit 208C | \n"," San Francisco | \n"," CA | \n"," 94131 | \n"," [] | \n"," Joe Polyak | \n"," Rise Homes | \n","
\n"," \n"," 7 | \n"," 1700000 | \n"," 4 | \n"," 2 | \n"," 1950 | \n"," 1351 26th Ave | \n"," San Francisco | \n"," CA | \n"," 94122 | \n"," [] | \n"," Glenda Queensbury | \n"," Referral Realty-BV | \n","
\n"," \n"," 8 | \n"," 1899000 | \n"," 3 | \n"," 2 | \n"," 1560 | \n"," 340 Yerba Buena Ave | \n"," San Francisco | \n"," CA | \n"," 94127 | \n"," [] | \n"," Jeannie Anderson | \n"," Coldwell Banker Realty | \n","
\n"," \n"," 9 | \n"," 850000 | \n"," 2 | \n"," 2 | \n"," 1055 | \n"," 588 Minna Unit 604 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [] | \n"," Mohamed Lakdawala | \n"," Remax Prestigious Properties | \n","
\n"," \n"," 10 | \n"," 1990000 | \n"," 3 | \n"," 1 | \n"," 1280 | \n"," 1450 Diamond St | \n"," San Francisco | \n"," CA | \n"," 94131 | \n"," [] | \n"," Mary Anne Villamil | \n"," Kinetic Real Estate | \n","
\n"," \n"," 11 | \n"," 849000 | \n"," 1 | \n"," 1 | \n"," 855 | \n"," 81 Lansing St Unit 401 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Kristen Haenggi | \n"," Compass | \n","
\n"," \n"," 12 | \n"," 1080000 | \n"," 2 | \n"," 2 | \n"," 936 | \n"," 451 Kansas St Unit 466 | \n"," San Francisco | \n"," CA | \n"," 94107 | \n"," [] | \n"," Maureen DeBoer | \n"," LKJ Realty | \n","
\n"," \n"," 13 | \n"," 1499000 | \n"," 4 | \n"," 2 | \n"," 2145 | \n"," 486 Yale St | \n"," San Francisco | \n"," CA | \n"," 94134 | \n"," [] | \n"," Alicia Atienza | \n"," Statewide Realty | \n","
\n"," \n"," 14 | \n"," 1140000 | \n"," 2 | \n"," 2 | \n"," 998 | \n"," 588 Minna Unit 801 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [] | \n"," Milan Jezdimirovic | \n"," Compass | \n","
\n"," \n"," 15 | \n"," 1988000 | \n"," 2 | \n"," 1 | \n"," 3800 | \n"," 183 19th Ave | \n"," San Francisco | \n"," CA | \n"," 94121 | \n"," [Amazing Property, Marina Style, Needs TLC] | \n"," Leo Cheung | \n"," eXp Realty of California, Inc | \n","
\n"," \n"," 16 | \n"," 1218000 | \n"," 2 | \n"," 2 | \n"," 1275 | \n"," 1998 Pacific Ave Unit 202 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [Light-filled, Freshly painted, Walker's parad... | \n"," Grace Sun | \n"," Compass | \n","
\n"," \n"," 17 | \n"," 895000 | \n"," 1 | \n"," 1 | \n"," 837 | \n"," 425 1st St Unit 2501 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [Unobstructed bay bridge views, Open layout] | \n"," Matt Fuller | \n"," Jackson Fuller Real Estate | \n","
\n"," \n"," 18 | \n"," 1499000 | \n"," 3 | \n"," 1 | \n"," 1500 | \n"," Unlisted Address | \n"," San Francisco | \n"," CA | \n"," NA | \n"," [Contractor's Special, Fixer-upper] | \n"," Jaymee Faith Sagisi | \n"," IMPACT | \n","
\n"," \n"," 19 | \n"," 900000 | \n"," 1 | \n"," 1 | \n"," 930 | \n"," 1101 Green St Unit 302 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [Historic Art Deco, Iconic views] | \n"," NA | \n"," NA | \n","
\n"," \n"," 20 | \n"," 858000 | \n"," 1 | \n"," 1 | \n"," 1104 | \n"," 260 King St Unit 557 | \n"," San Francisco | \n"," CA | \n"," 94107 | \n"," [] | \n"," Miyuki Takami | \n"," eXp Realty of California, Inc | \n","
\n"," \n"," 21 | \n"," 945000 | \n"," 2 | \n"," 1 | \n"," 767 | \n"," 307 Page St Unit 1 | \n"," San Francisco | \n"," CA | \n"," 94102 | \n"," [] | \n"," NA | \n"," NA | \n","
\n"," \n"," 22 | \n"," 1099000 | \n"," 2 | \n"," 2 | \n"," 1330 | \n"," 1080 Sutter St Unit 202 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [] | \n"," Annette Liberty | \n"," Coldwell Banker Realty | \n","
\n"," \n"," 23 | \n"," 950000 | \n"," 4 | \n"," 3 | \n"," 2090 | \n"," 3328 26th St Unit 3330 | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Isaac Munene | \n"," Coldwell Banker Realty | \n","
\n"," \n"," 24 | \n"," 1088000 | \n"," 2 | \n"," 2 | \n"," 1065 | \n"," 1776 Sacramento St Unit 503 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [] | \n"," Marilyn Becklehimer | \n"," Dio Real Estate | \n","
\n"," \n"," 25 | \n"," 1788888 | \n"," 4 | \n"," 3 | \n"," 1856 | \n"," 2317 15th St | \n"," San Francisco | \n"," CA | \n"," 94114 | \n"," [] | \n"," Joel Gile | \n"," Sequoia Real Estate | \n","
\n"," \n"," 26 | \n"," 1650000 | \n"," 3 | \n"," 2 | \n"," 1547 | \n"," 2475 47th Ave | \n"," San Francisco | \n"," CA | \n"," 94116 | \n"," [] | \n"," Lucy Goldenshteyn | \n"," Redfin | \n","
\n"," \n"," 27 | \n"," 998000 | \n"," 2 | \n"," 2 | \n"," 1202 | \n"," 50 Lansing St Unit 201 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Tracey Broadman | \n"," Vanguard Properties | \n","
\n"," \n"," 28 | \n"," 1595000 | \n"," 3 | \n"," 5 | \n"," 1995 | \n"," 15 Joy St | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Mike Stack | \n"," Vanguard Properties | \n","
\n"," \n"," 29 | \n"," 1028000 | \n"," 2 | \n"," 2 | \n"," 1065 | \n"," 50 Lansing St Unit 403 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Robyn Kaufman | \n"," Vivre Real Estate | \n","
\n"," \n"," 30 | \n"," 999000 | \n"," 1 | \n"," 1 | \n"," 1021 | \n"," 338 Spear St Unit 6J | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [Spacious, Balcony, Bright courtyard views] | \n"," Paul Hwang | \n"," Skybox Realty | \n","
\n"," \n"," 31 | \n"," 799800 | \n"," 2 | \n"," 2 | \n"," 1109 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," 32 | \n"," 529880 | \n"," 1 | \n"," 1 | \n"," 740 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," 33 | \n"," 489000 | \n"," 1 | \n"," 1 | \n"," 741 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," 34 | \n"," 1359000 | \n"," 4 | \n"," 2 | \n"," 1845 | \n"," 170 Thrift St | \n"," San Francisco | \n"," CA | \n"," 94112 | \n"," [Updated, Single-family home] | \n"," Cristal Wright | \n"," Vanguard Properties | \n","
\n"," \n"," 35 | \n"," 1295000 | \n"," 3 | \n"," 1 | \n"," 1214 | \n"," 1922 43rd Ave | \n"," San Francisco | \n"," CA | \n"," 94116 | \n"," [] | \n"," Mila Romprey | \n"," Premier Realty Associates | \n","
\n"," \n"," 36 | \n"," 1098000 | \n"," 3 | \n"," 1 | \n"," 1006 | \n"," 150 Putnam St | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Genie Mantzoros | \n"," Epic Real Estate & Asso. Inc. | \n","
\n"," \n"," 37 | \n"," 1189870 | \n"," 3 | \n"," 2 | \n"," 1436 | \n"," 327 Ordway St | \n"," San Francisco | \n"," CA | \n"," 94134 | \n"," [] | \n"," Shawn Zahraie | \n"," Affinity Enterprises, Inc | \n","
\n"," \n"," 38 | \n"," 899000 | \n"," 2 | \n"," 1 | \n"," 1118 | \n"," 272 Farallones St | \n"," San Francisco | \n"," CA | \n"," 94112 | \n"," [] | \n"," Janice Lee | \n"," Coldwell Banker Realty | \n","
\n"," \n"," 39 | \n"," 30000 | \n"," 0 | \n"," 0 | \n"," 0 | \n"," 0 Evans Ave | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [Land, 0.12 Acre, $251,467 per Acre] | \n"," Heidy Carrera | \n"," Berkshire Hathaway HomeService | \n","
\n"," \n","
\n","
"],"text/plain":[" price bedrooms bathrooms square_feet \\\n","0 549000 1 1 477 \n","1 1799000 4 2 2735 \n","2 1995000 7 3 3330 \n","3 549000 0 1 477 \n","4 5495000 10 7 6505 \n","5 925000 2 1 779 \n","6 898000 2 2 1175 \n","7 1700000 4 2 1950 \n","8 1899000 3 2 1560 \n","9 850000 2 2 1055 \n","10 1990000 3 1 1280 \n","11 849000 1 1 855 \n","12 1080000 2 2 936 \n","13 1499000 4 2 2145 \n","14 1140000 2 2 998 \n","15 1988000 2 1 3800 \n","16 1218000 2 2 1275 \n","17 895000 1 1 837 \n","18 1499000 3 1 1500 \n","19 900000 1 1 930 \n","20 858000 1 1 1104 \n","21 945000 2 1 767 \n","22 1099000 2 2 1330 \n","23 950000 4 3 2090 \n","24 1088000 2 2 1065 \n","25 1788888 4 3 1856 \n","26 1650000 3 2 1547 \n","27 998000 2 2 1202 \n","28 1595000 3 5 1995 \n","29 1028000 2 2 1065 \n","30 999000 1 1 1021 \n","31 799800 2 2 1109 \n","32 529880 1 1 740 \n","33 489000 1 1 741 \n","34 1359000 4 2 1845 \n","35 1295000 3 1 1214 \n","36 1098000 3 1 1006 \n","37 1189870 3 2 1436 \n","38 899000 2 1 1118 \n","39 30000 0 0 0 \n","\n"," address city state zip_code \\\n","0 380 14th St Unit 405 San Francisco CA 94103 \n","1 123 Grattan St San Francisco CA 94117 \n","2 1590 Washington St San Francisco CA 94109 \n","3 240 Lombard St Unit 835 San Francisco CA 94111 \n","4 1057 Steiner St San Francisco CA 94115 \n","5 2 Fallon Place Unit 57 San Francisco CA 94133 \n","6 5160 Diamond Heights Blvd Unit 208C San Francisco CA 94131 \n","7 1351 26th Ave San Francisco CA 94122 \n","8 340 Yerba Buena Ave San Francisco CA 94127 \n","9 588 Minna Unit 604 San Francisco CA 94103 \n","10 1450 Diamond St San Francisco CA 94131 \n","11 81 Lansing St Unit 401 San Francisco CA 94105 \n","12 451 Kansas St Unit 466 San Francisco CA 94107 \n","13 486 Yale St San Francisco CA 94134 \n","14 588 Minna Unit 801 San Francisco CA 94103 \n","15 183 19th Ave San Francisco CA 94121 \n","16 1998 Pacific Ave Unit 202 San Francisco CA 94109 \n","17 425 1st St Unit 2501 San Francisco CA 94105 \n","18 Unlisted Address San 
Francisco CA NA \n","19 1101 Green St Unit 302 San Francisco CA 94109 \n","20 260 King St Unit 557 San Francisco CA 94107 \n","21 307 Page St Unit 1 San Francisco CA 94102 \n","22 1080 Sutter St Unit 202 San Francisco CA 94109 \n","23 3328 26th St Unit 3330 San Francisco CA 94110 \n","24 1776 Sacramento St Unit 503 San Francisco CA 94109 \n","25 2317 15th St San Francisco CA 94114 \n","26 2475 47th Ave San Francisco CA 94116 \n","27 50 Lansing St Unit 201 San Francisco CA 94105 \n","28 15 Joy St San Francisco CA 94110 \n","29 50 Lansing St Unit 403 San Francisco CA 94105 \n","30 338 Spear St Unit 6J San Francisco CA 94105 \n","31 10 Innes Ct San Francisco CA 94124 \n","32 10 Innes Ct San Francisco CA 94124 \n","33 10 Innes Ct San Francisco CA 94124 \n","34 170 Thrift St San Francisco CA 94112 \n","35 1922 43rd Ave San Francisco CA 94116 \n","36 150 Putnam St San Francisco CA 94110 \n","37 327 Ordway St San Francisco CA 94134 \n","38 272 Farallones St San Francisco CA 94112 \n","39 0 Evans Ave San Francisco CA 94124 \n","\n"," tags agent_name \\\n","0 [New construction] Eddie O'Sullivan \n","1 [] Sean Engmann \n","2 [] Eddie O'Sullivan \n","3 [] Tim Gullicksen \n","4 [] Bonnie Spindler \n","5 [] Eddie O'Sullivan \n","6 [] Joe Polyak \n","7 [] Glenda Queensbury \n","8 [] Jeannie Anderson \n","9 [] Mohamed Lakdawala \n","10 [] Mary Anne Villamil \n","11 [] Kristen Haenggi \n","12 [] Maureen DeBoer \n","13 [] Alicia Atienza \n","14 [] Milan Jezdimirovic \n","15 [Amazing Property, Marina Style, Needs TLC] Leo Cheung \n","16 [Light-filled, Freshly painted, Walker's parad... 
Grace Sun \n","17 [Unobstructed bay bridge views, Open layout] Matt Fuller \n","18 [Contractor's Special, Fixer-upper] Jaymee Faith Sagisi \n","19 [Historic Art Deco, Iconic views] NA \n","20 [] Miyuki Takami \n","21 [] NA \n","22 [] Annette Liberty \n","23 [] Isaac Munene \n","24 [] Marilyn Becklehimer \n","25 [] Joel Gile \n","26 [] Lucy Goldenshteyn \n","27 [] Tracey Broadman \n","28 [] Mike Stack \n","29 [] Robyn Kaufman \n","30 [Spacious, Balcony, Bright courtyard views] Paul Hwang \n","31 [New Construction] Lennar \n","32 [New Construction] Lennar \n","33 [New Construction] Lennar \n","34 [Updated, Single-family home] Cristal Wright \n","35 [] Mila Romprey \n","36 [] Genie Mantzoros \n","37 [] Shawn Zahraie \n","38 [] Janice Lee \n","39 [Land, 0.12 Acre, $251,467 per Acre] Heidy Carrera \n","\n"," agency \n","0 Elevation Real Estate \n","1 eXp Realty of Northern CA Inc. \n","2 Elevation Real Estate \n","3 Corcoran Icon Properties \n","4 Corcoran Icon Properties \n","5 Elevation Real Estate \n","6 Rise Homes \n","7 Referral Realty-BV \n","8 Coldwell Banker Realty \n","9 Remax Prestigious Properties \n","10 Kinetic Real Estate \n","11 Compass \n","12 LKJ Realty \n","13 Statewide Realty \n","14 Compass \n","15 eXp Realty of California, Inc \n","16 Compass \n","17 Jackson Fuller Real Estate \n","18 IMPACT \n","19 NA \n","20 eXp Realty of California, Inc \n","21 NA \n","22 Coldwell Banker Realty \n","23 Coldwell Banker Realty \n","24 Dio Real Estate \n","25 Sequoia Real Estate \n","26 Redfin \n","27 Vanguard Properties \n","28 Vanguard Properties \n","29 Vivre Real Estate \n","30 Skybox Realty \n","31 Lennar \n","32 Lennar \n","33 Lennar \n","34 Vanguard Properties \n","35 Premier Realty Associates \n","36 Epic Real Estate & Asso. Inc. 
\n","37 Affinity Enterprises, Inc \n","38 Coldwell Banker Realty \n","39 Berkshire Hathaway HomeService "]},"execution_count":10,"metadata":{},"output_type":"execute_result"}],"source":["import pandas as pd\n","\n","# Convert the list of house dictionaries into a DataFrame\n","df = pd.DataFrame(result[\"houses\"])\n","df"]},{"cell_type":"markdown","metadata":{"id":"v0CBYVk7qA5Z"},"source":["Save it to a CSV file"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","outputId":"96b40841-381e-49fb-db05-752dfe63ad00"},"outputs":[{"name":"stdout","output_type":"stream","text":["Data saved to houses_forsale.csv\n"]}],"source":["# Save the DataFrame to a CSV file\n","csv_file = \"houses_forsale.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"]},{"cell_type":"markdown","metadata":{"id":"-1SZT8VzTZNd"},"source":["## 🔗 Resources"]},{"cell_type":"markdown","metadata":{"id":"dUi2LtMLRDDR"},"source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.14"}},"nbformat":4,"nbformat_minor":0}
\ No newline at end of file
diff --git a/cookbook/wired-news/scrapegraph_langgraph.ipynb b/cookbook/wired-news/scrapegraph_langgraph.ipynb
index f75ded7..447d984 100644
--- a/cookbook/wired-news/scrapegraph_langgraph.ipynb
+++ b/cookbook/wired-news/scrapegraph_langgraph.ipynb
@@ -1 +1 @@
-{"cells":[{"cell_type":"markdown","metadata":{"id":"ReBHQ5_834pZ"},"source":["\n","
\n",""]},{"cell_type":"markdown","metadata":{"id":"jEkuKbcRrPcK"},"source":["## 🕷️ Extract Wired Science News with `langchain-scrapegraph` and `langgraph`"]},{"cell_type":"markdown","source":[""],"metadata":{"id":"FxtXj1Qtx3zH"}},{"cell_type":"markdown","metadata":{"id":"IzsyDXEWwPVt"},"source":["### 🔧 Install `dependencies`"]},{"cell_type":"code","execution_count":1,"metadata":{"id":"os_vm0MkIxr9","executionInfo":{"status":"ok","timestamp":1734786986883,"user_tz":-60,"elapsed":13752,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"outputs":[],"source":["%%capture\n","!pip install langgraph langchain-scrapegraph langchain-openai"]},{"cell_type":"markdown","metadata":{"id":"apBsL-L2KzM7"},"source":["### 🔑 Import `ScrapeGraph` and `OpenAI` API keys"]},{"cell_type":"markdown","metadata":{"id":"ol9gQbAFkh9b"},"source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":67210,"status":"ok","timestamp":1734787067908,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"sffqFG2EJ8bI","outputId":"8c3dd34f-22d5-4557-c562-97728738726b"},"outputs":[{"name":"stdout","output_type":"stream","text":["Scrapegraph API key:\n","··········\n","OpenAI API key:\n","··········\n"]}],"source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\n","\n","if not os.environ.get(\"OPENAI_API_KEY\"):\n"," os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key:\\n\")"]},{"cell_type":"markdown","metadata":{"id":"jnqMB2-xVYQ7"},"source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"]},{"cell_type":"markdown","metadata":{"id":"VZvxbjfXvbgd"},"source":["If you already know what you want to extract from a webpage, you can **define an 
output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"]},{"cell_type":"code","execution_count":71,"metadata":{"id":"dlrOEgZk_8V4","executionInfo":{"status":"ok","timestamp":1734792801652,"user_tz":-60,"elapsed":221,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"outputs":[],"source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for a single news item\n","class NewsItemSchema(BaseModel):\n"," title: str = Field(description=\"Title of the news article\")\n"," link: str = Field(description=\"URL to the news article\")\n"," author: str = Field(description=\"Author of the news article\")\n","\n","# Schema that contains a list of news items\n","class ListNewsSchema(BaseModel):\n"," news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"]},{"cell_type":"markdown","metadata":{"id":"cDGH0b2DkY63"},"source":["### 🚀 Initialize `langchain-scrapegraph` tools and `langgraph` prebuilt agent and run the `extraction`"]},{"cell_type":"markdown","metadata":{"id":"M1KSXffZopUD"},"source":["Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n","\n","You can find more info in the [official langchain 
documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n"]},{"cell_type":"code","execution_count":72,"metadata":{"id":"ySoE0Rowjgp1","executionInfo":{"status":"ok","timestamp":1734792804219,"user_tz":-60,"elapsed":222,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"outputs":[],"source":["from langchain_scrapegraph.tools import SmartScraperTool\n","\n","# Will automatically get SGAI_API_KEY from environment\n","# Initialization without output schema\n","# tool = SmartScraperTool()\n","\n","# Since we have defined an output schema, let's use it\n","# This will force the tool to have always the same output structure\n","smartscraper_tool = SmartScraperTool(llm_output_schema=ListNewsSchema)"]},{"cell_type":"markdown","source":["We then initialize the `llm model` we want to use in the agent\n","\n"],"metadata":{"id":"W54HVoYeiJbG"}},{"cell_type":"code","source":["# First we initialize the llm model we want to use.\n","from langchain_openai import ChatOpenAI\n","\n","llm_model = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)"],"metadata":{"id":"ctrkEnltiBCD","executionInfo":{"status":"ok","timestamp":1734790431512,"user_tz":-60,"elapsed":350,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"execution_count":12,"outputs":[]},{"cell_type":"markdown","source":["Here we use `create_react_agent` to quickly use one of the prebuilt agents from `langgraph.prebuilt` module\n","\n","You can find more info in the [official langgraph documentation](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)\n","\n"],"metadata":{"id":"M0WY2Pa8Y8Pk"}},{"cell_type":"code","source":["from langgraph.prebuilt import create_react_agent\n","from langgraph.checkpoint.memory import MemorySaver\n","\n","# List of tools we want the agent to use\n","tools = [smartscraper_tool]\n","\n","# We set up the agent's memory to review the different reasoning steps\n","memory = MemorySaver()\n","\n","# Add a 
configuration to specify where to store the graph states\n","config = {\"configurable\": {\"thread_id\": \"1\"}}\n","\n","# Initialize the ReAct agent\n","graph = create_react_agent(llm_model, tools=tools, checkpointer=memory)"],"metadata":{"id":"Zo1BcIlHhcQP","executionInfo":{"status":"ok","timestamp":1734792887477,"user_tz":-60,"elapsed":224,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"execution_count":75,"outputs":[]},{"cell_type":"markdown","source":["Let's visualize the `graph`"],"metadata":{"id":"_UYcJ2Mxip5w"}},{"cell_type":"code","execution_count":42,"metadata":{"id":"2FIKomclLNFx","colab":{"base_uri":"https://localhost:8080/","height":350},"executionInfo":{"status":"ok","timestamp":1734791621218,"user_tz":-60,"elapsed":262,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"ed1c8315-c350-42ac-944b-6916f242a7b2"},"outputs":[{"output_type":"display_data","data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAHwAAAFNCAIAAABNLZxVAAAAAXNSR0IArs4c6QAAIABJREFUeJztnXlAU1cW/+/LvrOFNWERVMCdEarFrS4VUeo2VXFrtdqpW6f92dra0aozU63j1Lr+uqFVK1ZUVByZKrVgVRQ3VEDZBNkiBMhGNrLn90f8UUcj8l7uy0sgn79Cknfu4cvlvPvuPfdcxGq1Ag/OhUS0Az0Rj+gE4BGdADyiE4BHdALwiE4AFOgWLRbQXKfTKE1apdlsshh07jEkpTFIDDaJzaNwvSk+QTRc20JgjdMtJlB6s63mvqa+QiuIYtJZZDaP7M2n6XVmKPbxxmoFSqlRqzTTWaTmen3kQHbkAE5IFAOPtuCIfutXWfktlbAPM3IgJzyWBcMxImmTGGvua6Rig1JqHPEGPyCMDte+o6LX3tfkHBYPHuM9fLIfPK9cBdHD9mtnJcERzFEz+RDNOiT6rV9l8hbj2FkBVDoC0SdXo65Ue/FEy9xPwuhMOOMO7KIX5sqNeku37ODPo5Kbjm6rX/z3XlQahO6FUfTcjBYWh/xqSo9QvIP9n9ekrglj88gO2sHy/3LvkoJGJ/U0xQEA89eGH/1XneN2UIv+uKpd0WwYNQPmjcVdYLBJU5aG5B5tcdAOatEvn2odMNLbwVbdl+BejHaNueaBxhEj6ESvvKPyDabxQ/B9YHNxElP8rp2VOmIBnegP76pHvuHvSHtdp6mpqbGxkajLO8E3iBbZn111T43ZAgrRW0V6tcLE9nb03t0VRCLR1KlTS0tLCbn8pQSEMyrvqjBfjkL0RyWaXgPYmFtChclkwjaWtV2F+fIu0msAu+Y+9rCOYpz+331NiSl+0GfgdDrd1q1bL1++DACIi4v7+OOPrVbr1KlTO76QkpKyadMmg8GQlpaWk5PT3NzM5/OnTJny3nvvkclkAMDs2bOjoqKioqIyMjJ0Ot2BAwfmzp37zOVwfQYA5GW09I7jhkUzMVyLYmq3vlKb5BeEoY3OOXDgQHZ29rJly/h8fnZ2NpPJ
ZLFYX3zxxfr165ctWxYfH+/r6wsAIJPJN27cGD16tFAorKio+PHHH3k83oIFC2xGCgoKdDrdjh07tFpteHj485dDh0JD5M16fEU36CwkEkKhwp9jaWxsZDKZixYtolAo06dPt70ZExMDAIiIiBgyZIjtHTKZfOjQIQR54oBIJMrLy+sQnUKhbNmyhclkvuhy6LC9KJo2jLPWXY3pWpXZ8cdfuyQnJ+t0uvfff7+qqqrzb8pksq1bt06fPn3cuHHV1dVS6R/jtgEDBnQo7hzYPIpWacJ2bVdFt1oAnYmL6ImJibt27ZJKpampqV988YXJZP83kUql8+fPv3nz5vLly/fs2RMbG2s2/9HRnKw4AIBCJSEkjP/3XQ0vLC5Z0WrA1sZLSUxMHD58+NGjR3fs2BEcHLxkyZLnv3Py5EmZTHbw4MGgoCAAQFBQUF0dhGkQzKjkRjoL40xvVy+js0hGvcWCw9KbwWAAAJBIpPnz5/v7+5eXlwMAGAwGAKC1tbXjawqFwsfHx6a47cdOxl3PXw4djdLE5mFcYUZxWUQ/tkZp5vpADjIZGRmXLl2aPHlya2tra2trv379AACBgYECgSA9PZ3JZLa1taWmpsbHxx8/fvzbb78dPHhwXl7e1atXLRaLQqHw9rYzEfT85XQ65CU3AIAXn4rtQhT/IFxfanUx9sewFyEUCg0Gw44dO7KyslJTUxcuXAgAQBBky5YtbDb7q6++Onv2rEwmGzdu3NKlS0+cOLFu3Tqj0Xjw4MGIiIhjx47Ztfn85dDdLslvC4/BuBqM4uHocVX7zfOyGasE2FrqTjRUthfmyqYvxygFivAi6M1ESMBksFJevGSVkpKiVtuZCRo0aFBxcfHz73t5eZ05c6brPmAjPz9//fr1z79vtVqtViuJZOff/cKFC1TqC6OHuFbXN46H2R90y3X3LilUMlMnKxhisdhisXTdIIlE6rg34odOp7MbYSwWi8VioVDs9Lzg4OCOB7FnrWks6Vtql26OxOwP6jXSHzfUpH4cxsLnQcktyMtoCYpg9BuOvaejHmmOmuFfdFmBuT13Ryk16bRmRxTHInqfOI7RYCm+0uZIq+7L0X/XT5gX6KARLM9Uo2f6VxWpHVk6cVOOf90w9S8hNIajKUfYk41yfmqOHMDu8yeOgx64C8d3NCQtDPbiQ8hzxv5HS3or8FGJ+tavcsedcHGUUuP3a6tHTfOHojiEBNI7eYqSfEXiG/w+cd2wy7erzdfOSg3t5gnzA6l0aBsoIKRKq2Sma2clep0loh+7V3821xf+RgPnU1+uFdfpiq+0jXjDL3aYQ2OV54G2KUAiMtg2BdAYpOBIJoNFYvMoXB+qyYTiWYlALCarSmHSKs0IAoqvKIR9WH3iONDltgFN9A6kTYaWer26zahVmhES0CghTwc/ePAgLCyMy+XCNctgkehMMptH5vFp4TEsEp4Pf/BFx5ulS5euXLkyLi6OaEew49ldRwAe0QnA/UQXCAS2HCP3xf1Ef/z48dN5AO6I+4nOYrHsLju4Ee7nvVarRbVO4oK4n+g+Pj6enu5s5HK5p6c7G6FQ6Bm9OBuRSOQZvXhAjfuJzuFwPDdSZ6NWqz03UmfD4/E8Pd3ZKJVKT0/3gBr3Ez0oKMgzTnc2YrHYM073gBr3E90zDUAAnmkAD1hwP9FDQ0M94cXZNDQ0eMKLB9S4n+ieFAwC8KRgeMCC+4nuyXshAE/eCwF4ZhkJwDPL6AEL7ie6t7f3i0oluAvuJ3rnhaTcAvcT3TOfTgCe+XQC8PR0AvD0dALw8/Nz957uNpt3k5KSaDQaiUSSyWRsNptKpZJIJCqVmpmZSbRrqHGb4gksFquhocH2ur293fZi2bJlhDqFEbcJL8nJyc88EwmFwjlz5hDnEXbcRvRZs2YJBP9TfHLy5MnQyzI4B7cR3cfHZ9KkSR0/hoaGPn0Gg3vhNqIDAObNmxcaGmp77b7d3M1E5/F4SUlJCIKEh4e7bzfH
a/SikptkYoPRAH99J3HwzBu96hITE8XVAADINQpJCMLxpvgG0ig4H68KeZyulBovnZRIGvXhsWyNys2eG+l0sqxZZ7WAPnGc+Nd98GsIpuhqhSnrm8Zx80K4Pm4z/LfLrRwJi0MaPhmXs3qgxnQrOPj32mkrw9xdcQBAQhJfq7bgV3ISmugFv8hGTAuAZY1wEpL4NQ80Oi0uaQfQRG+s1nJ9u9uRmfJmXA68gSa6xYLwfDEezOGa+AUzlHIjHpahia5RGC0W95iw7CIGnRngk9TkTg9H3QaP6ATgEZ0APKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQC6v+hqtbryYTnRXvwP3V/0pX9JPXcO9yNPUeEGootE9Y5cbjs43KUgbGmtpaV5/4Fvbty4qtGoQ0PD581dPGH8k1wiqVSyZ++/CwtvUKjUoUOHXb6c+/236b16RQEAzvwn8/iJdImkJSgoZPy4SXNmL6TT6Q+rKt7/6ztbt+z+Yd+e6urKwMDg997964gRYwAAqfNS5HJZ1pkTWWdOBAYGZfycTdTv+zSEiW4ym8rLH0yb+qYXz/tyft7mLesFgtDYmP5ms/lv6z6UyaUffLBWJpOk7dsbNyTepvjBQz+cyEyfOSM1PDyyoaH22PGfRI/r/7b2HwAAvV7/93+ufX/VmuCgkAMHv/tiy7qMn7O9vLw3bdz2yaerhgweOuvN+VSaqyxsESZ6SLDg4I8nbDmhycnTZvx5wtWrv8fG9C8ru1/5sHzjhq2vjZkAAKivrz13/j8Gg0GpbDvy84/r120eM3q8zYKfn/+OnV+uWvmx7cf3V60ZN3YiAGDp0lXvLVtQVHxn9KhxMdH9KBSKnx9/4MAhRP2mz0Pkyn1VdeXBQ99XVJQCAMxms0wmBQC0tDYDAEJChLbvCIVhFoulvV1bWHjDZDJt3rJ+85YnR3TbkkckrS22H5kMpu1FYGAwAEAiaSXo13o5hIl+5+6tT9e+Hzck/pM1G9ks9oZNayxWCwBAIAgFAJSU3OvbJwYAUFZ2n8/39/LylsokAIAtm3cG+P/PwbchIcKa2uqn36FSqAAAi8V1U50IE/3w4X0hIcItm3faDjPv6KfRfWMT4of/kLa7ublJ0Sa/eu3S+nWbAQBc7pOz+8LCItC25Wq7TQgbMrYpFb2j+toUNxgM2vY/alu8v2qNUBjWIKrz9vLZu+eALbjHxSUgCHI661iHhY79GJ3DZDClUgluvwcWCOvpQ4bE5+Sc/eXcGR7X68TJIyqVsram2mq1ms3mFavenvXmAoEgFEEQlUqpVqs5HI5QEDpzRurJU0f/tv7/jBzxmlQqyTpz/Mstu2xRqBMGDozLzTv/89GDXC5v2CsjAgIcPZbbcQgT/Z1Fy2VSyZ69/+ZyeSlTZs5+c8HXO7fcvXf7T3EJ8UOHH07fZzKZbN/kcri7d+2PiIhcuWJ1QEDg6dPHbt0q8PPjjxo51p//8pyy9/7yV5lMcjh9n7eXT3R0P1cQHVoC6cFNtZPeEbK9IPwVzWazbdOi1WptbHq89N3U2bMWLF7k7D1d+aebIwewouPhbz1wuWRPvV6/YtXbAQFBgwf9iUqllZTc1el0UVF9ifYLJi4nOoIgE1+fkpeXc+DgdzQarVev3hs3bB09ahzRfsHE5USn0WhzZi+cM3sh0Y7giBvMMnY/PKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQA8ohMANNF9Q2gutj7jKHQWmUrHpVNCM0qlkqSNOljWXAFRpcY3CJesDWiiRw7iSJv0sKwRjqbN5MWnevvjsh0ZmujRQzlGvbnokgyWQSKxgtyjTa+96Y+Tecj1Xi783EKhknyD6P5CBkSzzgEhISqZUSkzXs9uefvzCK4vXvPe8IthPryrrrmvMRmtksd2oo1er0MQEs2BDDetVkOnM1AVIZXL5TQajc1md/41JpdMoSEhkcxhk/Cq9PIEqxM5e/ZsWlqag0aWLFly
584dVJcsWLAgPj7+jTfeyMzMdLB1KLhN2dcOCgoKYmJifHxQlHv68MMPr1y5giAIlUqNiopasWJFYmIinj6+BCc9HInF4m+++QaKqVdffRWV4rYijrYXRqOxtLR03bp1H3zwARRnsOEM0dVqdVpa2ooVK6BYO3LkSF1dHapLwsPDO86jIpFIKpXq6tWrY8aMgeIPBpwhOofD+fzzz2FZu3jxokyGbmDK5/OfuYsymcxLly7BcgktuIv+2Wef1dbWQjS4cePGmJiXpNI9Q0BAAJ1Ot722Wq0BAQFXrlyB6BJa8BV9z54906ZNi4hAnWfbCaGhoUwmE9UlAQEBtktYLFZ6evqgQYMg+oMB9xu9HDx4cNSoUVFRUaiumjlzptlsPnPmDACgrKyMRqOhtQATnIai1dXVBw4cwMPyBx98cOXKFTwsOw1cRNfr9SkpKXhYtlqtVVVVcrncQSNlZWWrV6+G5BFq3C+8wGL//v1RUVGvvfaa85uGL/qDBw+sVuuAAQPgmu0gIyNDKBSOHDkSJ/tOAPLopaamZuPGjfgpDgBobW2tqqqCYqq8vDw/Px+KKXTAjVa3bt1Sq9VwbT6DSCSqq6uDZW3mzJk1NTWwrHURmOFFp9MBABgMd5pJl0gkIpFoyBCnbu2FFl7UavWkSZOcoPi9e/e+/vprWNb4fL6TFYcp+rlz53bv3g3LWicwGIw7d+5ANFhQULBp0yaIBl+Ok8OZ4xiNxqqqKrg233777YaGBrg2OwFOTM/MzExISAgPD4fRDbo/EMJLUVHRL7/84kzFP/vsM5FIBNdmWVkZXIOdAEF0MpkM8c7WFWzP8XBtHjly5Pz583Btvgi3nAZobm5GECQgAOYRHKWlpQUFBUuWLIFo80U4KvqXX345ZMiQ5ORkeC51fxwKLwaD4d69e85XXCaTffXVV9DNXr9+XSwWQzf7PA6JTqPRjh071oUvQsbX1/fixYvQBaqurj59+jRcm3ZxSPSKioouFl2Bzs6dT6rzQGTkyJG+vjjndgHgUEwXiUQrV660LYB5QAX2nl5aWrp48WKozqBApVKtW7cOutnjx4931JnBD+yiT5w4cfr06VCdQQGXyxWJRPfv34dr9vDhwy0tLXBtPg9G0Q0GQ05ODmxn0LF161boITg1NbUjFww/MMb03NzcnJycbdu24eBS9wfjX5VOp7/11luwnUHNmjVrYC3d2cjPz29sbIRo0C4YRR85ciSuC6FdJCEhITsbZs3iY8eOwU0CtA+G6WCTybR7924c5pmJJz09/dGjR3i3guX5oqys7Pbt2zh0ACxUVVX5+fmhzVh/EfPnz4dip3OwhBcGg7Fq1SocnMGCQqH47LPPYFk7f/68UqmEZe1FYBG9d+/eCQkJODiDhfj4+JiYmObmZijWtm3b5oS5biyiZ2VlQV+4cYQPP/wwMBBCXVGz2TxhwgQvLy8YTnUKhvtAcnKyWCzG4QaDnR07dphMJqK96Cqoe7rJZJo3bx6UngURJpO5f/9+B420trZev34dkkedgVp0CoWyYMECfJzBztKlSyMjIx008ttvvzkntRG16NXV1T///DM+zmCHTCZPmDDB9nr06NHjx4/HYEQgEDhnFQz1OL2kpKS6uroLX3Q206ZNa2pqslW+9/LyunXrFtoh1ujRo3Hz7n9ALfqAAQP69++PjzMYefPNN5uamvT6P0oRUKlUDOtK586dmzhxIqqqA9hAHV569+7dp08ffJzBiE6nsyUMd0AikdCKXldXl5aW5gTFsYh+9OhRZyZDdYVTp04lJCQ8rZfFYkE7LU4mkyE+2XYOatFzc3Of6VaEQ6PRvvvuuzlz5jw9A4O2zwqFQqc9ZqMWfe7cub1798bHGYdYvXr1mjVrhEKh1WrFENN/+umn+nqHTibsOu6RVmc0WDUKU1ccFYvFu3btkkgkGzduFAqFXW9i3rx5Bw4c6NjNjgEEAC8+FelCN0Yt+t69e5cvX+6cGw4AoPKOuuiy
Qtqo9/KnGQ0W3NqxWixWB1dHeT60x1Wa8FjO0PHewZGd7UhBJ7rBYBgzZkxBQYEjznWdoitt9eXt8RP5HG+XO0biRSilxiunmxOn+IXFvLCAATrR9Xp9Xl6ecx7b7l5UNNcbRkyHmZrrNM4fEL2S5Bsey7L7qYvGdK3S/NvRlrGpwUQ7ghGjznr5VNP05SF2P0UXxVpaWrZv3w7Jsc6QNOrNXbpxuihUBiIT6zVt9pPF0IkulUrv3r0LybHOUMpMAaHutB/1eYR92PIWo92P0IkeEhLyySefQPKqM0xGi64dv7GKM1ArTFaL/X9WdKJ7eXkRXhaoG4BO9JKSku+++w43Z3oKqG+kjx49ws2ZngK6h464uDjHV8U8oBPd19fXORtEujfowsv169cPHz6MmzM9BXSii8VitCVXPTwPuvCSmJjo/Ooo3Q90osPdGd5jQRderly5kpmZiZszPQV0Pb2pqckZGxW6O+h6emJi4rRp03BzxlFKy+4/nf2Cgd8v/TZ2fHx9Pb4dC53oQqEwOjoaN2cc4nzO2ZWrFul0xGybRwU60QsKCn755RfcnHEIB/u4M0Enek1NjatlGtn4Lff8zl1bAQDTZ04YOz7+fM5Z2/ulZff/+uHSpOTEaTPG/2vb35WqJ1tbTCZT2r69b86e9HrS8KV/mZt/9Xe7Zq9fz39n6ZxJk0csemfWqdPQ6n2gu5EmJCRotVpYbUMkfuiw2bMWHD+R/uXmnWw2RygMAwDU1j766ONlERFRn6zZ2KaQHzj4XUuLePtX3wIAvtr+xW+55xbMfyciIuq33HOfb/h41460QYPinrap1Wo3/ePTiPDIj1avr6mpkkpbYXmLTnRXy2LswNvbJyRECACIjR3g5eVtezP9yH4SibTtX3u5HC4AgMvlbdm6oajojo+Pb86v2W8tXLro7fcAAGNGj1/w1oyDh77/evv/zFrLFTK9Xj9q1LjXJ0BeiEcXXu7evfv77/b/E12Qe0WFcXEJNsUBAAkJrwIAKipLi4rvAABGjhxrex9BkIT44RWVpc9cHhIs6N9/UPqR/SdPZRgMBoiOoRO9rKyssLAQYvO4otGovb3+yG7kcnkAAImkVaNRAwB8vP+YLuXxvLRarUajefpyBEG2btmdNDHlu+93vrVoZlERtLKn6EQfPHjwiBEjYLWNB09nlPD5AUplW8ePcrkMAMDhcPn8AADA0x/JZFIKhfJ8oWAOh/PhB2sPHTzJZnPWf74a1v0Mnej9+/cfPnw4lIahw2QwbR25453+/QfdKyrsyDG+fDkXADBw4JDY2AEIgly/8WR7kcFguH4jv3//QWQymUalPf33sA1DQ4IFM2ekqjVqsRhOrQYyqtq+paWlNTU1AoEAStudIK7Ttastgt72M6TswmCyzvznRG3dIwQgpWUl0dH9IsIjT546eq+okEqlXb+Rv//AN4MGxr391rs8npdY3HQ66xgAiETS+u23O2pqq9d8vCE4WEChUk9nHSuveBAWFsH3839r0UyJpFUqlZzOOmbQ65e8s6LrycCPilWCKIYX384xsuhEv3DhwoMHD5xw2h4G0Xlcnr9/4O+/XygouKJSKZOSUng8r4ED4m7dLjibfbKismzsaxPXfLzBlpebEP+qRqM+d/5MXl4Om8X++KP1ttssl8MNDgq5c/cWCSHF9hsoEtXnX714JT/Pz89/7SebBAIUacCdiI4ure727dttbW3Y9q6h4t4lhVRsemUSH++G8OPC4caE171Do+30G3Tj9Pj4eHhe9VxQTwMUFxfj5kxPAZ3ot27dclrp5W4MuvASHh7OYqG4uXmwCzrRhw0bhpsnPQh04UUkEj148AA3Z3oK6EQvLCw8efIkbs70FNCFF4FAAHe+rWeCepzuGao7Duq0Oug1hXsg6EQvLS09dOgQbs70FNCJLhAIBg8ejJszPQV0MT06Otpl817cCHQ9XSKROKe0GI1OorNwr2OOK1wfKomM2P0I3S/W1NS0
b98+SF51Bs+P2lzjBrlanVBXpvYNotn9CJ3oQUFBr7/+OiSvOiMwjE6m2O8mboFWaQ6KYDA59muFuGhtAABAxW1V6Q3VhAX2t9e7OGf21ie9HegvtF89Bl1P12g0P/30EyTHXkJ0PHfoeO9z+0UtDTq9m+ye1ipN4tr2zB21U5YEv0hx1D1do9EkJydfvnwZkpMvp6lGdzdP8bhaSyIhBj1M6TEUV+scn0Bau8oUHstOSPLl+nQ2LEQ3ZGSz2U6u+RrcixG8JAgAYDbCDINtbW2pqannzp2DaNMCAJXapfuQ68Z0XNHpdIcPH3733XcJaR216CdPnkxOTvasHzkC6qB24sQJJxxKgzd6vT4rK4uo1lGLPmPGjOdz/twOrVa7d+9eolrvoTFdr9fn5+c7IWvKLqhFv3nzppeXl2fayxFQh5fi4uK8vDx8nHEearWawF3IqEUfPXr0wIED8XHGebS0tBByULONHhrT5XJ5YWFhx4kOTga16FKpNCsra8mSJbi51P1BHV44HI7jh9sQTnl5OYGH2KIWnU6nL1++nKhD1GFx/fr1yspKolrvoTH95s2bHA6nX79+hLSORfSCggI/P7++ffvi41L3B8uEcn19PYETF1DYt2+fVColqnUsoo8aNYqof0womM3mH374wc/PjygHemJMVygUBQUFzim9bxeMon/zzTdz5swhsLO4NRgXCdVq9bVr12A74ySysrJKSkoIdABjT29tbW1ra3PNA49eyqRJkw4fPuzv70+UAz0upmu12uLiYmIrHGDPQdi+fXtTUxNUZ5wBi8UivKYEdtHZbHZ2djZUZ5zBP/7xj8ePHxPrA/ZDmxYuXFhTUwPVGdypqKh4+PChE6p4dE7PiulyuZxKpXI4HGLdcEj0zMxMlUq1ePFiqC51fxxK5ktOTs7NzYXnDL6kpaU5J7n+pfSg8JKampqRkUG0FwCC6FqttqWlJSIiAp5L3R9Hc4VZLNbmzZvv3IFWPg8PDAaDS5UIhpCg/emnn7p45Z0NGzZQqXZqaRFF94/pcrm8qqrKaYd2dwU4WxFEItH3338PxRR0fHx8XEpxaKILhUKRSORScdPGrl27nLZJquvADC9qtZrwh72naWho+O9//7ts2TKiHXkWmKIrlUqTyeQ5Z+2lwNxexuPxPv30UxcZPmZmZjrnkGAMQN5/v3fv3urqarg2MZCbm/vo0aO4uLgufJcAuv+Q0QXBpdLE9u3b4W7RREV2dvYz5eddDVxE/+ijjwoLC5ubm20/Tp48GdchxNNbh9auXUun09lsNn7NOQ7u4SUpKUkqlQqFwvT0dDwGlPv27bMde3379m21Wm2xWHg8HvRW4IJjIZuSkpJhw4bZUgZ1Oh1Oh1M/ePDA1m+GDh06ffp011ccX9EXLVpkNpttr+VyOU754NXV1QiC2M4NUSgURO1SRAUuok+aNGno0KE2LWyYzWY8jgKrqKgwmUxPv9PW1vbKK69AbwguuIh+/vz5Xr16cTgci+VJsRCr1VpeXg69obq6OqVS+fQ7AoFg7Nix0BuCC/YUjM7JzMzMzs7OyMgQi8VyuRxBEKVSKRaLg4KCILZSVFTU3t6OIAiJRAoODk5MTJw5c6bLHjvWAV6iAwBSUlJSUlIuXbqUkZFRW1vb1tZWW1sLV/SSkhIKhRISEpKUlDR16tSQEPeoPeXQkFFU2V5T2t7SoNOqTO1qM4IgRr3Z7jetVqvFYiGT7RcSw4zZbEYQhISQwAuK2/gGMdrVRiaH4htED4qgRQ1ks71w7GddBIvoaoXp1gVF2Y02tg+dG8ChMihUOplCJ5MpJOBycwqIUW8y6U1mo0Ul0aqlWq4PdfBor37DuET6hEp0sxlcPNb66L46qC+fw2e+qNijK6NTGWT1bQatfswM/14DiSkVhEL0ugr95VMSli/LL8wNHkA6R68xSmsVPF/S5EU+oktuAAAFcUlEQVQBUKundYmuin7/mvLWBUWvVwhOvYSLXKTS
KdRz16A4pwsKXRK9rqL90ilZ2BCYAw8XQSvXa1oVsz5w6rDn5f9aj+5rrmR1T8UBACwfOivA++dtDc5s9CWiqxWm3462CAd1T8VtsL3pTF/ur0danNbiS0T/735x+JBgZzlDGD4CrqzFUnPfSUsfnYlefltpBhQ6x4US0vDDJ9T78mmJc9rqTPT8LKl/ZE/Jp6CzqTQ2vfS6sgvfdZQXil5dpGH7sagMyA/uUDhyYsO/ds2GbtY31Ls4n1DRK++qmTy3L3qJCjqHqlKYVHJTF77rEC8UvbZUzQtw6eVdPODwWY/uq/Fuxf6UW0u93k/AJuFzQIJM3vifczsrq29SKXRBSHTyhGWhgn4AgANH1vjzw8lkyo3bWSazMbbviJlvfMJkPFnLvldy4deL++SKpkD/SKsVrxr2XD9Wqwj3Sln2e7pGZTIacPnFlErJ3rR3tVrltMmrpyStMpuN/3ffe03NT5LCLl09IpM3vrNg+/TJq4vv5+b+fsD2/p2inPTj63kcv+mTP4ruM7xR/BAP3wAAJApJ8liPk/EO7Pd0rdJMpuByC71w6UcO2/e9xXvJZAoAYOjg5K07/3zj9pnpU1YDAPz9wua9+XcEQcKE/YtLL1ZUXU8B7xuN+jO/fB0ZHvfu23tsM/ISaQNOulPoZK0K95huX3STwUJl4TI8L6+8pmhr/ts/X+t4x2w2KpRP0pKoVEbHcravd3BtfTEAoKauSKNVjEpM7VgDIZHwGlNRGRQmF/fnEvuik8iIQYvLH1yllvaLHjll4sqn32TQ7SQhkclUi8UMAJC3iW1/Azz8eQazwaxR4H72p33RWTyKxYTLMzGLydNo2wL8UWyB5LB9AABqrQIPf57BqDczubiv59m/kbK5ZIsJlxtpn8iE2vqihsd/5MDoDS8ZLYQE9UEQ0p0iZ5zibjKYud4EhZeAcIayVYdHe6+PXVpWeTXt0F9Hj5jHZfuWPyywWMyL5/+7k0t8vINe+dMbNwrPmEz66D6vKlWSssqrXA4u5cPaFbrwPvbPPoPIC2I6CYREsVSSdi6fCbc9vp9w1btpZ3N25106CBBEGBwzYvisl141fcpHFArtbnFORdWNXmGDQ4L6qtS4VFXUyLRRAwPxsPw0L1w5un+1reSGLjiWj7cHroNRZ66/07jkn7hvuX/hTSN2mNeNXzu7d2m1yi07Ztj9iO8rlMhEz7/fP2b03D9vxOSnHdp16s3bp9n9iMPytnvjfW3E/AmvvfMig21i9YARzlhz72yN9PovsoYai3+kj91PLRaLok38IrPAXgYMjca0DUWg0IkDJpORQrFzP2QyuEym/YwXqwWU5tWs3O6M+nsvWZj+Zk11zJhwd8xvQYu4Utp3IPVP46D1iU54yXLdxAVBLVVOWk8hEJ3KSLYanaP4y0XvPZgd3ocmqZE7xxtCsFpA9Q3RrA+dl9Lz8hSMV6f4BoeSmx92W90flzQt2tDLmS12KaVsxBs+3j7m5oeEFRzHCb3aeP/XmhkrgtleTl2VRJHLWJireFSq5wbyGFzcn9mcgLReqWlVLlwXjjh9lIAua/dxVXve8VYSlRrQx49Kd8U1664gE6laqmQDEr1HTiMm1wFLfnpFobrkmkopM3L82F5BbCqD4vpjSrPRopa2q1o1WoUuPJY95s98Bsvp2br/H+w7MVpF+od3NY21+pY6LYmEUJlkGpOC09wkZhgcmrJVq9eafYMZXB9K9FB2rwEcKo3gLgJnx7S+3aJVmvQ6K3Cx8g4kCsLikNk8CkJYt7aDpwoGAbhSB+gxeEQnAI/oBOARnQA8ohOAR3QC+H+6fjfnM2J1xQAAAABJRU5ErkJggg==\n","text/plain":[""]},"metadata":{}}],"source":["from IPython.display import Image, 
display\n","\n","display(Image(graph.get_graph().draw_mermaid_png()))"]},{"cell_type":"markdown","source":["`Run the graph` and stream the agent's reasoning.\n","\n","We ask the agent to extract the content of a `specific webpage`."],"metadata":{"id":"cw-T5CYWkCEN"}},{"cell_type":"code","source":["# inputs for the agent\n","inputs = {\"messages\": [(\"user\", \"Go to https://www.wired.com/category/science/ and extract all the news\")]}\n","\n","# run the graph\n","for event in graph.stream(inputs, config, stream_mode=\"values\"):\n"," event[\"messages\"][-1].pretty_print()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Qn1rC2y8kAe9","executionInfo":{"status":"ok","timestamp":1734793681953,"user_tz":-60,"elapsed":24711,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"749364bb-a84c-4cf8-8903-1d1dd9d7be79"},"execution_count":86,"outputs":[{"output_type":"stream","name":"stdout","text":["================================\u001b[1m Human Message \u001b[0m=================================\n","\n","Go to https://www.wired.com/category/science/ and extract all the news\n","==================================\u001b[1m Ai Message \u001b[0m==================================\n","Tool Calls:\n"," SmartScraper (call_jIkP03hdmoO83XT1V7PsBI6p)\n"," Call ID: call_jIkP03hdmoO83XT1V7PsBI6p\n"," Args:\n"," user_prompt: Extract all the news articles from the Science category on Wired. 
Include the title, publication date, author, and a brief summary of each article.\n"," website_url: https://www.wired.com/category/science/\n","=================================\u001b[1m Tool Message \u001b[0m=================================\n","Name: SmartScraper\n","\n","{\"news\": [{\"title\": \"December Wildfires Are Now a Thing\", \"link\": \"https://www.wired.com/story/december-wildfires-are-now-a-thing/\", \"author\": \"Kylie Mohr\"}, {\"title\": \"How to Manage Food Anxiety Over the Holidays\", \"link\": \"https://www.wired.com/story/how-to-cope-with-food-anxiety-during-the-festive-season/\", \"author\": \"Alison Fixsen\"}, {\"title\": \"A Spacecraft Is About to Fly Into the Sun’s Atmosphere for the First Time\", \"link\": \"https://www.wired.com/story/parker-solar-probe-atmosphere/\", \"author\": \"Eric Berger, Ars Technica\"}, {\"title\": \"To Improve Your Gut Microbiome, Spend More Time in Nature\", \"link\": \"https://www.wired.com/story/to-improve-your-gut-microbiome-spend-more-time-in-nature-kathy-willis/\", \"author\": \"Kathy Willis\"}, {\"title\": \"This Tropical Virus Is Spreading Out of the Amazon to the US and Europe\", \"link\": \"https://www.wired.com/story/this-tropical-virus-is-spreading-out-of-the-amazon-to-the-us-and-europe/\", \"author\": \"Geraldine Castro\"}, {\"title\": \"CDC Confirms First US Case of Severe Bird Flu\", \"link\": \"https://www.wired.com/story/cdc-confirms-first-us-case-of-severe-bird-flu/\", \"author\": \"Emily Mullin\"}, {\"title\": \"The Study That Called Out Black Plastic Utensils Had a Major Math Error\", \"link\": \"https://www.wired.com/story/black-plastic-utensils-study-math-error-correction/\", \"author\": \"Beth Mole, Ars Technica\"}, {\"title\": \"How Christmas Trees Could Become a Source of Low-Carbon Protein\", \"link\": \"https://www.wired.com/story/how-christmas-trees-could-become-a-source-of-low-carbon-protein/\", \"author\": \"Alexa Phillips\"}, {\"title\": \"Creating a Global Package to Solve the 
Problem of Plastics\", \"link\": \"https://www.wired.com/story/global-plastics-treaty-united-nations/\", \"author\": \"Susan Solomon\"}, {\"title\": \"These 3 Things Are Standing in the Way of a Global Plastics Treaty\", \"link\": \"https://www.wired.com/story/these-3-things-are-standing-in-the-way-of-a-global-plastics-treaty/\", \"author\": \"Steve Fletcher and Samuel Winton\"}, {\"title\": \"Environmental Sensing Is Here, Tracking Everything from Forest Fires to Threatened Species\", \"link\": \"https://www.wired.com/story/environmental-sensing-is-here-tracking-everything-from-forest-fires-to-threatened-species/\", \"author\": \"Sabrina Weiss\"}, {\"title\": \"Generative AI and Climate Change Are on a Collision Course\", \"link\": \"https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/\", \"author\": \"Sasha Luccioni\"}, {\"title\": \"Climate Change Is Destroying Monarch Butterflies’ Winter Habitat\", \"link\": \"https://www.wired.com/story/global-warming-threatens-the-monarch-butterfly-sanctuary-but-this-scientist-prepares-a-new-home-for-them/\", \"author\": \"Andrea J. Arratibel\"}, {\"title\": \"More Humanitarian Organizations Will Harness AI’s Potential\", \"link\": \"https://www.wired.com/story/humanitarian-organizations-artificial-intelligence/\", \"author\": \"David Miliband\"}, {\"title\": \"Chocolate Has a Sustainability Problem. 
Science Thinks It's Found the Answer\", \"link\": \"https://www.wired.com/story/chocolate-has-a-sustainability-problem-science-thinks-its-found-the-answer/\", \"author\": \"Eve Thomas\"}, {\"title\": \"We’ve Never Been Closer to Finding Life Outside Our Solar System\", \"link\": \"https://www.wired.com/story/james-webb-space-telescope-signs-of-life/\", \"author\": \"Lisa Kaltenegger\"}, {\"title\": \"The End Is Near for NASA’s Voyager Probes\", \"link\": \"https://www.wired.com/story/the-end-is-near-for-nasas-voyager-probes/\", \"author\": \"Luca Nardi\"}, {\"title\": \"Why Can’t You Switch Seats in an Empty Airplane?\", \"link\": \"https://www.wired.com/story/why-cant-you-switch-seats-in-an-empty-airplane/\", \"author\": \"Rhett Allain\"}, {\"title\": \"The Simple Math Behind Public Key Cryptography\", \"link\": \"https://www.wired.com/story/how-public-key-cryptography-really-works-using-only-simple-math/\", \"author\": \"John Pavlus\"}, {\"title\": \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\", \"link\": \"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\", \"author\": \"Kelsey Houston-Edwards\"}, {\"title\": \"The Physics of the Macy’s Thanksgiving Day Parade Balloons\", \"link\": \"https://www.wired.com/story/the-physics-of-the-macys-thanksgiving-day-parade-balloons/\", \"author\": \"Rhett Allain\"}, {\"title\": \"A Third Person Has Received a Transplant of a Genetically Engineered Pig Kidney\", \"link\": \"https://www.wired.com/story/a-third-person-has-received-a-transplant-of-a-genetically-engineered-pig-kidney/\", \"author\": \"Emily Mullin\"}, {\"title\": \"Muscle Implants Could Allow Mind-Controlled Prosthetics--No Brain Surgery Required\", \"link\": \"https://www.wired.com/story/amputees-could-control-prosthetics-with-just-their-thoughts-no-brain-surgery-required-phantom-neuro/\", \"author\": \"Emily Mullin\"}, {\"title\": \"Combining AI and Crispr Will Be Transformational\", \"link\": 
\"https://www.wired.com/story/combining-ai-and-crispr-will-be-transformational/\", \"author\": \"Jennifer Doudna\"}, {\"title\": \"Neuralink Plans to Test Whether Its Brain Implant Can Control a Robotic Arm\", \"link\": \"https://www.wired.com/story/neuralink-robotic-arm-controlled-by-mind/\", \"author\": \"Emily Mullin\"}, {\"title\": \"Eight Scientists, a Billion Dollars, and the Moonshot Agency Trying to Make Britain Great Again\", \"link\": \"https://www.wired.com/story/aria-moonshot-darpa-uk-britain-great-again/\", \"author\": \"Matt Reynolds\"}, {\"title\": \"The Atlas Robot Is Dead. Long Live the Atlas Robot\", \"link\": \"https://www.wired.com/story/the-atlas-robot-is-dead-long-live-the-atlas-robot/\", \"author\": \"NA\"}, {\"title\": \"The Atlas Robot Is Dead. Long Live the Atlas Robot\", \"link\": \"https://www.wired.com/story/the-atlas-robot-is-dead-long-live-the-atlas-robot/\", \"author\": \"Carlton Reid\"}, {\"title\": \"Meet the Next Generation of Doctors--and Their Surgical Robots\", \"link\": \"https://www.wired.com/story/next-generation-doctors-surgical-robots/\", \"author\": \"Neha Mukherjee\"}, {\"title\": \"AI Is Building Highly Effective Antibodies That Humans Can’t Even Imagine\", \"link\": \"https://www.wired.com/story/labgenius-antibody-factory-machine-learning/\", \"author\": \"Amit Katwala\"}, {\"title\": \"An Uncertain Future Requires Uncertain Prediction Skills\", \"link\": \"https://www.wired.com/story/embrace-uncertainty-forecasting-prediction-skills/\", \"author\": \"David Spiegelhalter\"}, {\"title\": \"These Rats Learned to Drive--and They Love It\", \"link\": \"https://www.wired.com/story/these-rats-learned-to-drive-and-they-love-it/\", \"author\": \"Kelly Lambert\"}, {\"title\": \"Scientists Are Unlocking the Secrets of Your ‗Little Brain’\", \"link\": \"https://www.wired.com/story/cerebellum-brain-movement-feelings/\", \"author\": \"R Douglas Fields\"}, {\"title\": \"Meet the Designer Behind Neuralink’s Surgical Robot\", 
\"link\": \"https://www.wired.com/story/designer-behind-neuralinks-surgical-robot-afshin-mehin/\", \"author\": \"Emily Mullin\"}, {\"title\": \"Antibodies Could Soon Help Slow the Aging Process\", \"link\": \"https://www.wired.com/story/antibodies-could-soon-help-slow-the-aging-process/\", \"author\": \"Andrew Steele\"}, {\"title\": \"Good at Reading? Your Brain May Be Structured Differently\", \"link\": \"https://www.wired.com/story/good-at-reading-your-brain-may-be-structured-differently/\", \"author\": \"Mikael Roll\"}, {\"title\": \"Mega-Farms Are Driving the Threat of Bird Flu\", \"link\": \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\", \"author\": \"Georgina Gustin\"}, {\"title\": \"RFK Plans to Take on Big Pharma. It’s Easier Said Than Done\", \"link\": \"https://www.wired.com/story/rfks-plan-to-take-on-big-pharma/\", \"author\": \"Emily Mullin\"}, {\"title\": \"Designer Babies Are Teenagers Now--and Some of Them Need Therapy Because of It\", \"link\": \"https://www.wired.com/story/your-next-job-designer-baby-therapist/\", \"author\": \"Emi Nietfeld\"}, {\"title\": \"US Meat, Milk Prices Should Spike if Donald Trump Carries Out Mass Deportation Schemes\", \"link\": \"https://www.wired.com/story/us-meat-milk-prices-should-spike-if-donald-trump-carries-out-mass-deportation-schemes/\", \"author\": \"Matt Reynolds\"}, {\"title\": \"An Augmented Reality Program Can Help Patients Overcome Parkinson’s Symptoms\", \"link\": \"https://www.wired.com/story/lining-up-tech-to-help-banish-tremors-strolll-parkinsons/\", \"author\": \"Grace Browne\"}, {\"title\": \"Meet the Plant Hacker Creating Flowers Never Seen (or Smelled) Before\", \"link\": \"https://www.wired.com/story/meet-the-plant-hacker-creating-flowers-never-seen-or-smelled-before/\", \"author\": \"Matt Reynolds\"}, {\"title\": \"A Mysterious Respiratory Disease Has the Democratic Republic of the Congo on High Alert\", \"link\": 
\"https://www.wired.com/story/drc-mysterious-respiratory-disease-children-who-africa/\", \"author\": \"Marta Musso\"}, {\"title\": \"Skip the Sea Kelp Supplements\", \"link\": \"https://www.wired.com/story/pass-on-sea-kelp-supplements/\", \"author\": \"Boutayna Chokrane\"}, {\"title\": \"Why Soccer Players Are Training in the Dark\", \"link\": \"https://www.wired.com/story/why-soccer-players-are-training-in-the-dark-okkulo-football-sunderland-leeds-united-neuroscience/\", \"author\": \"RM Clark\"}, {\"title\": \"A Parasite That Eats Cattle Alive Is Creeping North Toward the US\", \"link\": \"https://www.wired.com/story/a-parasite-that-eats-cattle-alive-is-creeping-north-toward-the-us/\", \"author\": \"Geraldine Castro\"}, {\"title\": \"Lasers Are Making It Easier to Find Buried Land Mines\", \"link\": \"https://www.wired.com/story/this-laser-system-can-locate-landmines-with-high-accuracy/\", \"author\": \"Ritsuko Kawai\"}, {\"title\": \"Mark Cuban’s War on Drug Prices: ‖How Much Fucking Money Do I Need?‗\", \"link\": \"https://www.wired.com/story/big-interview-mark-cuban-2024/\", \"author\": \"Marah Eakin\"}, {\"title\": \"Can Artificial Rain, Drones, or Satellites Clean Toxic Air?\", \"link\": \"https://www.wired.com/story/artificial-rain-drones-and-satellites-can-tech-clean-indias-toxic-air/\", \"author\": \"Arunima Kar\"}, {\"title\": \"These Stem Cell Treatments Are Worth Millions. 
Donors Get Paid $200\", \"link\": \"https://www.wired.com/story/stem-cells-cost-rich-16500-donors-get-paid-200-cellcolabs-sweden/\", \"author\": \"Matt Reynolds\"}, {\"title\": \"The Mystery of How Supermassive Black Holes Merge\", \"link\": \"https://www.wired.com/story/how-do-merging-supermassive-black-holes-pass-the-final-parsec/\", \"author\": \"Jonathan O’Callaghan\"}, {\"title\": \"The $60 Billion Potential Hiding in Your Discarded Gadgets\", \"link\": \"https://www.wired.com/story/a-dollar60-billion-a-year-climate-solution-is-sitting-in-our-junk-drawers/\", \"author\": \"Vince Beiser\"}, {\"title\": \"Tune In to the Healing Powers of a Decent Playlist\", \"link\": \"https://www.wired.com/story/music-therapy-health-care/\", \"author\": \"Daniel Levitin\"}, {\"title\": \"Returning the Amazon Rainforest to Its True Caretakers\", \"link\": \"https://www.wired.com/story/amazon-rainforest-indigenous-peoples-justice-stewardship/\", \"author\": \"Nemonte Nenquimo and Mitch Anderson\"}]}\n"]}]},{"cell_type":"code","source":["import json\n","\n","# get the last message (assumed to be the SmartScraper tool response)\n","result = graph.get_state(config).values[\"messages\"][-1].content\n","\n","# parse the JSON string into a Python dict\n","result = json.loads(result)"],"metadata":{"id":"_12IqhcrkiHC","executionInfo":{"status":"ok","timestamp":1734793707438,"user_tz":-60,"elapsed":253,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}}},"execution_count":87,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"YZz1bqCIpoL8"},"source":["Print the response."]},{"cell_type":"code","execution_count":90,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":238,"status":"ok","timestamp":1734793746145,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"F1VfD8B4LPc8","outputId":"93746d86-6599-44b1-a052-2a83ace873e7"},"outputs":[{"output_type":"stream","name":"stdout","text":["Science News:\n","{\n"," 
\"news\": [\n"," {\n"," \"title\": \"December Wildfires Are Now a Thing\",\n"," \"link\": \"https://www.wired.com/story/december-wildfires-are-now-a-thing/\",\n"," \"author\": \"Kylie Mohr\"\n"," },\n"," {\n"," \"title\": \"How to Manage Food Anxiety Over the Holidays\",\n"," \"link\": \"https://www.wired.com/story/how-to-cope-with-food-anxiety-during-the-festive-season/\",\n"," \"author\": \"Alison Fixsen\"\n"," },\n"," {\n"," \"title\": \"A Spacecraft Is About to Fly Into the Sun\\u2019s Atmosphere for the First Time\",\n"," \"link\": \"https://www.wired.com/story/parker-solar-probe-atmosphere/\",\n"," \"author\": \"Eric Berger, Ars Technica\"\n"," },\n"," {\n"," \"title\": \"To Improve Your Gut Microbiome, Spend More Time in Nature\",\n"," \"link\": \"https://www.wired.com/story/to-improve-your-gut-microbiome-spend-more-time-in-nature-kathy-willis/\",\n"," \"author\": \"Kathy Willis\"\n"," },\n"," {\n"," \"title\": \"This Tropical Virus Is Spreading Out of the Amazon to the US and Europe\",\n"," \"link\": \"https://www.wired.com/story/this-tropical-virus-is-spreading-out-of-the-amazon-to-the-us-and-europe/\",\n"," \"author\": \"Geraldine Castro\"\n"," },\n"," {\n"," \"title\": \"CDC Confirms First US Case of Severe Bird Flu\",\n"," \"link\": \"https://www.wired.com/story/cdc-confirms-first-us-case-of-severe-bird-flu/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"The Study That Called Out Black Plastic Utensils Had a Major Math Error\",\n"," \"link\": \"https://www.wired.com/story/black-plastic-utensils-study-math-error-correction/\",\n"," \"author\": \"Beth Mole, Ars Technica\"\n"," },\n"," {\n"," \"title\": \"How Christmas Trees Could Become a Source of Low-Carbon Protein\",\n"," \"link\": \"https://www.wired.com/story/how-christmas-trees-could-become-a-source-of-low-carbon-protein/\",\n"," \"author\": \"Alexa Phillips\"\n"," },\n"," {\n"," \"title\": \"Creating a Global Package to Solve the Problem of Plastics\",\n"," \"link\": 
\"https://www.wired.com/story/global-plastics-treaty-united-nations/\",\n"," \"author\": \"Susan Solomon\"\n"," },\n"," {\n"," \"title\": \"These 3 Things Are Standing in the Way of a Global Plastics Treaty\",\n"," \"link\": \"https://www.wired.com/story/these-3-things-are-standing-in-the-way-of-a-global-plastics-treaty/\",\n"," \"author\": \"Steve Fletcher and Samuel Winton\"\n"," },\n"," {\n"," \"title\": \"Environmental Sensing Is Here, Tracking Everything from Forest Fires to Threatened Species\",\n"," \"link\": \"https://www.wired.com/story/environmental-sensing-is-here-tracking-everything-from-forest-fires-to-threatened-species/\",\n"," \"author\": \"Sabrina Weiss\"\n"," },\n"," {\n"," \"title\": \"Generative AI and Climate Change Are on a Collision Course\",\n"," \"link\": \"https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/\",\n"," \"author\": \"Sasha Luccioni\"\n"," },\n"," {\n"," \"title\": \"Climate Change Is Destroying Monarch Butterflies\\u2019 Winter Habitat\",\n"," \"link\": \"https://www.wired.com/story/global-warming-threatens-the-monarch-butterfly-sanctuary-but-this-scientist-prepares-a-new-home-for-them/\",\n"," \"author\": \"Andrea J. Arratibel\"\n"," },\n"," {\n"," \"title\": \"More Humanitarian Organizations Will Harness AI\\u2019s Potential\",\n"," \"link\": \"https://www.wired.com/story/humanitarian-organizations-artificial-intelligence/\",\n"," \"author\": \"David Miliband\"\n"," },\n"," {\n"," \"title\": \"Chocolate Has a Sustainability Problem. 
Science Thinks It's Found the Answer\",\n"," \"link\": \"https://www.wired.com/story/chocolate-has-a-sustainability-problem-science-thinks-its-found-the-answer/\",\n"," \"author\": \"Eve Thomas\"\n"," },\n"," {\n"," \"title\": \"We\\u2019ve Never Been Closer to Finding Life Outside Our Solar System\",\n"," \"link\": \"https://www.wired.com/story/james-webb-space-telescope-signs-of-life/\",\n"," \"author\": \"Lisa Kaltenegger\"\n"," },\n"," {\n"," \"title\": \"The End Is Near for NASA\\u2019s Voyager Probes\",\n"," \"link\": \"https://www.wired.com/story/the-end-is-near-for-nasas-voyager-probes/\",\n"," \"author\": \"Luca Nardi\"\n"," },\n"," {\n"," \"title\": \"Why Can\\u2019t You Switch Seats in an Empty Airplane?\",\n"," \"link\": \"https://www.wired.com/story/why-cant-you-switch-seats-in-an-empty-airplane/\",\n"," \"author\": \"Rhett Allain\"\n"," },\n"," {\n"," \"title\": \"The Simple Math Behind Public Key Cryptography\",\n"," \"link\": \"https://www.wired.com/story/how-public-key-cryptography-really-works-using-only-simple-math/\",\n"," \"author\": \"John Pavlus\"\n"," },\n"," {\n"," \"title\": \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\",\n"," \"link\": \"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\",\n"," \"author\": \"Kelsey Houston-Edwards\"\n"," },\n"," {\n"," \"title\": \"The Physics of the Macy\\u2019s Thanksgiving Day Parade Balloons\",\n"," \"link\": \"https://www.wired.com/story/the-physics-of-the-macys-thanksgiving-day-parade-balloons/\",\n"," \"author\": \"Rhett Allain\"\n"," },\n"," {\n"," \"title\": \"A Third Person Has Received a Transplant of a Genetically Engineered Pig Kidney\",\n"," \"link\": \"https://www.wired.com/story/a-third-person-has-received-a-transplant-of-a-genetically-engineered-pig-kidney/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Muscle Implants Could Allow Mind-Controlled Prosthetics--No Brain Surgery Required\",\n"," \"link\": 
\"https://www.wired.com/story/amputees-could-control-prosthetics-with-just-their-thoughts-no-brain-surgery-required-phantom-neuro/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Combining AI and Crispr Will Be Transformational\",\n"," \"link\": \"https://www.wired.com/story/combining-ai-and-crispr-will-be-transformational/\",\n"," \"author\": \"Jennifer Doudna\"\n"," },\n"," {\n"," \"title\": \"Neuralink Plans to Test Whether Its Brain Implant Can Control a Robotic Arm\",\n"," \"link\": \"https://www.wired.com/story/neuralink-robotic-arm-controlled-by-mind/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Eight Scientists, a Billion Dollars, and the Moonshot Agency Trying to Make Britain Great Again\",\n"," \"link\": \"https://www.wired.com/story/aria-moonshot-darpa-uk-britain-great-again/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"The Atlas Robot Is Dead. Long Live the Atlas Robot\",\n"," \"link\": \"https://www.wired.com/story/the-atlas-robot-is-dead-long-live-the-atlas-robot/\",\n"," \"author\": \"NA\"\n"," },\n"," {\n"," \"title\": \"The Atlas Robot Is Dead. 
Long Live the Atlas Robot\",\n"," \"link\": \"https://www.wired.com/story/the-atlas-robot-is-dead-long-live-the-atlas-robot/\",\n"," \"author\": \"Carlton Reid\"\n"," },\n"," {\n"," \"title\": \"Meet the Next Generation of Doctors--and Their Surgical Robots\",\n"," \"link\": \"https://www.wired.com/story/next-generation-doctors-surgical-robots/\",\n"," \"author\": \"Neha Mukherjee\"\n"," },\n"," {\n"," \"title\": \"AI Is Building Highly Effective Antibodies That Humans Can\\u2019t Even Imagine\",\n"," \"link\": \"https://www.wired.com/story/labgenius-antibody-factory-machine-learning/\",\n"," \"author\": \"Amit Katwala\"\n"," },\n"," {\n"," \"title\": \"An Uncertain Future Requires Uncertain Prediction Skills\",\n"," \"link\": \"https://www.wired.com/story/embrace-uncertainty-forecasting-prediction-skills/\",\n"," \"author\": \"David Spiegelhalter\"\n"," },\n"," {\n"," \"title\": \"These Rats Learned to Drive--and They Love It\",\n"," \"link\": \"https://www.wired.com/story/these-rats-learned-to-drive-and-they-love-it/\",\n"," \"author\": \"Kelly Lambert\"\n"," },\n"," {\n"," \"title\": \"Scientists Are Unlocking the Secrets of Your \\u2017Little Brain\\u2019\",\n"," \"link\": \"https://www.wired.com/story/cerebellum-brain-movement-feelings/\",\n"," \"author\": \"R Douglas Fields\"\n"," },\n"," {\n"," \"title\": \"Meet the Designer Behind Neuralink\\u2019s Surgical Robot\",\n"," \"link\": \"https://www.wired.com/story/designer-behind-neuralinks-surgical-robot-afshin-mehin/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Antibodies Could Soon Help Slow the Aging Process\",\n"," \"link\": \"https://www.wired.com/story/antibodies-could-soon-help-slow-the-aging-process/\",\n"," \"author\": \"Andrew Steele\"\n"," },\n"," {\n"," \"title\": \"Good at Reading? 
Your Brain May Be Structured Differently\",\n"," \"link\": \"https://www.wired.com/story/good-at-reading-your-brain-may-be-structured-differently/\",\n"," \"author\": \"Mikael Roll\"\n"," },\n"," {\n"," \"title\": \"Mega-Farms Are Driving the Threat of Bird Flu\",\n"," \"link\": \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\",\n"," \"author\": \"Georgina Gustin\"\n"," },\n"," {\n"," \"title\": \"RFK Plans to Take on Big Pharma. It\\u2019s Easier Said Than Done\",\n"," \"link\": \"https://www.wired.com/story/rfks-plan-to-take-on-big-pharma/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Designer Babies Are Teenagers Now--and Some of Them Need Therapy Because of It\",\n"," \"link\": \"https://www.wired.com/story/your-next-job-designer-baby-therapist/\",\n"," \"author\": \"Emi Nietfeld\"\n"," },\n"," {\n"," \"title\": \"US Meat, Milk Prices Should Spike if Donald Trump Carries Out Mass Deportation Schemes\",\n"," \"link\": \"https://www.wired.com/story/us-meat-milk-prices-should-spike-if-donald-trump-carries-out-mass-deportation-schemes/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"An Augmented Reality Program Can Help Patients Overcome Parkinson\\u2019s Symptoms\",\n"," \"link\": \"https://www.wired.com/story/lining-up-tech-to-help-banish-tremors-strolll-parkinsons/\",\n"," \"author\": \"Grace Browne\"\n"," },\n"," {\n"," \"title\": \"Meet the Plant Hacker Creating Flowers Never Seen (or Smelled) Before\",\n"," \"link\": \"https://www.wired.com/story/meet-the-plant-hacker-creating-flowers-never-seen-or-smelled-before/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"A Mysterious Respiratory Disease Has the Democratic Republic of the Congo on High Alert\",\n"," \"link\": \"https://www.wired.com/story/drc-mysterious-respiratory-disease-children-who-africa/\",\n"," \"author\": \"Marta Musso\"\n"," },\n"," {\n"," \"title\": \"Skip the Sea Kelp Supplements\",\n"," 
\"link\": \"https://www.wired.com/story/pass-on-sea-kelp-supplements/\",\n"," \"author\": \"Boutayna Chokrane\"\n"," },\n"," {\n"," \"title\": \"Why Soccer Players Are Training in the Dark\",\n"," \"link\": \"https://www.wired.com/story/why-soccer-players-are-training-in-the-dark-okkulo-football-sunderland-leeds-united-neuroscience/\",\n"," \"author\": \"RM Clark\"\n"," },\n"," {\n"," \"title\": \"A Parasite That Eats Cattle Alive Is Creeping North Toward the US\",\n"," \"link\": \"https://www.wired.com/story/a-parasite-that-eats-cattle-alive-is-creeping-north-toward-the-us/\",\n"," \"author\": \"Geraldine Castro\"\n"," },\n"," {\n"," \"title\": \"Lasers Are Making It Easier to Find Buried Land Mines\",\n"," \"link\": \"https://www.wired.com/story/this-laser-system-can-locate-landmines-with-high-accuracy/\",\n"," \"author\": \"Ritsuko Kawai\"\n"," },\n"," {\n"," \"title\": \"Mark Cuban\\u2019s War on Drug Prices: \\u2016How Much Fucking Money Do I Need?\\u2017\",\n"," \"link\": \"https://www.wired.com/story/big-interview-mark-cuban-2024/\",\n"," \"author\": \"Marah Eakin\"\n"," },\n"," {\n"," \"title\": \"Can Artificial Rain, Drones, or Satellites Clean Toxic Air?\",\n"," \"link\": \"https://www.wired.com/story/artificial-rain-drones-and-satellites-can-tech-clean-indias-toxic-air/\",\n"," \"author\": \"Arunima Kar\"\n"," },\n"," {\n"," \"title\": \"These Stem Cell Treatments Are Worth Millions. 
Donors Get Paid $200\",\n"," \"link\": \"https://www.wired.com/story/stem-cells-cost-rich-16500-donors-get-paid-200-cellcolabs-sweden/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"The Mystery of How Supermassive Black Holes Merge\",\n"," \"link\": \"https://www.wired.com/story/how-do-merging-supermassive-black-holes-pass-the-final-parsec/\",\n"," \"author\": \"Jonathan O\\u2019Callaghan\"\n"," },\n"," {\n"," \"title\": \"The $60 Billion Potential Hiding in Your Discarded Gadgets\",\n"," \"link\": \"https://www.wired.com/story/a-dollar60-billion-a-year-climate-solution-is-sitting-in-our-junk-drawers/\",\n"," \"author\": \"Vince Beiser\"\n"," },\n"," {\n"," \"title\": \"Tune In to the Healing Powers of a Decent Playlist\",\n"," \"link\": \"https://www.wired.com/story/music-therapy-health-care/\",\n"," \"author\": \"Daniel Levitin\"\n"," },\n"," {\n"," \"title\": \"Returning the Amazon Rainforest to Its True Caretakers\",\n"," \"link\": \"https://www.wired.com/story/amazon-rainforest-indigenous-peoples-justice-stewardship/\",\n"," \"author\": \"Nemonte Nenquimo and Mitch Anderson\"\n"," }\n"," ]\n","}\n"]}],"source":["import json\n","\n","print(\"Science News:\")\n","print(json.dumps(result, indent=2))"]},{"cell_type":"markdown","metadata":{"id":"2as65QLypwdb"},"source":["### 💾 Save the output to a `CSV` file"]},{"cell_type":"markdown","metadata":{"id":"HTLVFgbVLLBR"},"source":["Let's create a pandas dataframe and show the table with the extracted content"]},{"cell_type":"code","execution_count":91,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"executionInfo":{"elapsed":242,"status":"ok","timestamp":1734793782979,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"1lS9O1KOI51y","outputId":"c54e03f8-0441-4e10-db59-6a0c17470abb"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" title \\\n","0 December Wildfires Are Now a Thing \n","1 How to Manage Food Anxiety 
Over the Holidays \n","2 A Spacecraft Is About to Fly Into the Sun’s At... \n","3 To Improve Your Gut Microbiome, Spend More Tim... \n","4 This Tropical Virus Is Spreading Out of the Am... \n","5 CDC Confirms First US Case of Severe Bird Flu \n","6 The Study That Called Out Black Plastic Utensi... \n","7 How Christmas Trees Could Become a Source of L... \n","8 Creating a Global Package to Solve the Problem... \n","9 These 3 Things Are Standing in the Way of a Gl... \n","10 Environmental Sensing Is Here, Tracking Everyt... \n","11 Generative AI and Climate Change Are on a Coll... \n","12 Climate Change Is Destroying Monarch Butterfli... \n","13 More Humanitarian Organizations Will Harness A... \n","14 Chocolate Has a Sustainability Problem. Scienc... \n","15 We’ve Never Been Closer to Finding Life Outsid... \n","16 The End Is Near for NASA’s Voyager Probes \n","17 Why Can’t You Switch Seats in an Empty Airplane? \n","18 The Simple Math Behind Public Key Cryptography \n","19 Everyone Is Capable of Mathematical Thinking--... \n","20 The Physics of the Macy’s Thanksgiving Day Par... \n","21 A Third Person Has Received a Transplant of a ... \n","22 Muscle Implants Could Allow Mind-Controlled Pr... \n","23 Combining AI and Crispr Will Be Transformational \n","24 Neuralink Plans to Test Whether Its Brain Impl... \n","25 Eight Scientists, a Billion Dollars, and the M... \n","26 The Atlas Robot Is Dead. Long Live the Atlas R... \n","27 The Atlas Robot Is Dead. Long Live the Atlas R... \n","28 Meet the Next Generation of Doctors--and Their... \n","29 AI Is Building Highly Effective Antibodies Tha... \n","30 An Uncertain Future Requires Uncertain Predict... \n","31 These Rats Learned to Drive--and They Love It \n","32 Scientists Are Unlocking the Secrets of Your ‗... \n","33 Meet the Designer Behind Neuralink’s Surgical ... \n","34 Antibodies Could Soon Help Slow the Aging Process \n","35 Good at Reading? Your Brain May Be Structured ... 
\n","36 Mega-Farms Are Driving the Threat of Bird Flu \n","37 RFK Plans to Take on Big Pharma. It’s Easier S... \n","38 Designer Babies Are Teenagers Now--and Some of... \n","39 US Meat, Milk Prices Should Spike if Donald Tr... \n","40 An Augmented Reality Program Can Help Patients... \n","41 Meet the Plant Hacker Creating Flowers Never S... \n","42 A Mysterious Respiratory Disease Has the Democ... \n","43 Skip the Sea Kelp Supplements \n","44 Why Soccer Players Are Training in the Dark \n","45 A Parasite That Eats Cattle Alive Is Creeping ... \n","46 Lasers Are Making It Easier to Find Buried Lan... \n","47 Mark Cuban’s War on Drug Prices: ‖How Much Fuc... \n","48 Can Artificial Rain, Drones, or Satellites Cle... \n","49 These Stem Cell Treatments Are Worth Millions.... \n","50 The Mystery of How Supermassive Black Holes Merge \n","51 The $60 Billion Potential Hiding in Your Disca... \n","52 Tune In to the Healing Powers of a Decent Play... \n","53 Returning the Amazon Rainforest to Its True Ca... \n","\n"," link \\\n","0 https://www.wired.com/story/december-wildfires... \n","1 https://www.wired.com/story/how-to-cope-with-f... \n","2 https://www.wired.com/story/parker-solar-probe... \n","3 https://www.wired.com/story/to-improve-your-gu... \n","4 https://www.wired.com/story/this-tropical-viru... \n","5 https://www.wired.com/story/cdc-confirms-first... \n","6 https://www.wired.com/story/black-plastic-uten... \n","7 https://www.wired.com/story/how-christmas-tree... \n","8 https://www.wired.com/story/global-plastics-tr... \n","9 https://www.wired.com/story/these-3-things-are... \n","10 https://www.wired.com/story/environmental-sens... \n","11 https://www.wired.com/story/true-cost-generati... \n","12 https://www.wired.com/story/global-warming-thr... \n","13 https://www.wired.com/story/humanitarian-organ... \n","14 https://www.wired.com/story/chocolate-has-a-su... \n","15 https://www.wired.com/story/james-webb-space-t... 
\n","16 https://www.wired.com/story/the-end-is-near-fo... \n","17 https://www.wired.com/story/why-cant-you-switc... \n","18 https://www.wired.com/story/how-public-key-cry... \n","19 https://www.wired.com/story/everyone-is-capabl... \n","20 https://www.wired.com/story/the-physics-of-the... \n","21 https://www.wired.com/story/a-third-person-has... \n","22 https://www.wired.com/story/amputees-could-con... \n","23 https://www.wired.com/story/combining-ai-and-c... \n","24 https://www.wired.com/story/neuralink-robotic-... \n","25 https://www.wired.com/story/aria-moonshot-darp... \n","26 https://www.wired.com/story/the-atlas-robot-is... \n","27 https://www.wired.com/story/the-atlas-robot-is... \n","28 https://www.wired.com/story/next-generation-do... \n","29 https://www.wired.com/story/labgenius-antibody... \n","30 https://www.wired.com/story/embrace-uncertaint... \n","31 https://www.wired.com/story/these-rats-learned... \n","32 https://www.wired.com/story/cerebellum-brain-m... \n","33 https://www.wired.com/story/designer-behind-ne... \n","34 https://www.wired.com/story/antibodies-could-s... \n","35 https://www.wired.com/story/good-at-reading-yo... \n","36 https://www.wired.com/story/mega-farms-are-dri... \n","37 https://www.wired.com/story/rfks-plan-to-take-... \n","38 https://www.wired.com/story/your-next-job-desi... \n","39 https://www.wired.com/story/us-meat-milk-price... \n","40 https://www.wired.com/story/lining-up-tech-to-... \n","41 https://www.wired.com/story/meet-the-plant-hac... \n","42 https://www.wired.com/story/drc-mysterious-res... \n","43 https://www.wired.com/story/pass-on-sea-kelp-s... \n","44 https://www.wired.com/story/why-soccer-players... \n","45 https://www.wired.com/story/a-parasite-that-ea... \n","46 https://www.wired.com/story/this-laser-system-... \n","47 https://www.wired.com/story/big-interview-mark... \n","48 https://www.wired.com/story/artificial-rain-dr... \n","49 https://www.wired.com/story/stem-cells-cost-ri... 
\n","50 https://www.wired.com/story/how-do-merging-sup... \n","51 https://www.wired.com/story/a-dollar60-billion... \n","52 https://www.wired.com/story/music-therapy-heal... \n","53 https://www.wired.com/story/amazon-rainforest-... \n","\n"," author \n","0 Kylie Mohr \n","1 Alison Fixsen \n","2 Eric Berger, Ars Technica \n","3 Kathy Willis \n","4 Geraldine Castro \n","5 Emily Mullin \n","6 Beth Mole, Ars Technica \n","7 Alexa Phillips \n","8 Susan Solomon \n","9 Steve Fletcher and Samuel Winton \n","10 Sabrina Weiss \n","11 Sasha Luccioni \n","12 Andrea J. Arratibel \n","13 David Miliband \n","14 Eve Thomas \n","15 Lisa Kaltenegger \n","16 Luca Nardi \n","17 Rhett Allain \n","18 John Pavlus \n","19 Kelsey Houston-Edwards \n","20 Rhett Allain \n","21 Emily Mullin \n","22 Emily Mullin \n","23 Jennifer Doudna \n","24 Emily Mullin \n","25 Matt Reynolds \n","26 NA \n","27 Carlton Reid \n","28 Neha Mukherjee \n","29 Amit Katwala \n","30 David Spiegelhalter \n","31 Kelly Lambert \n","32 R Douglas Fields \n","33 Emily Mullin \n","34 Andrew Steele \n","35 Mikael Roll \n","36 Georgina Gustin \n","37 Emily Mullin \n","38 Emi Nietfeld \n","39 Matt Reynolds \n","40 Grace Browne \n","41 Matt Reynolds \n","42 Marta Musso \n","43 Boutayna Chokrane \n","44 RM Clark \n","45 Geraldine Castro \n","46 Ritsuko Kawai \n","47 Marah Eakin \n","48 Arunima Kar \n","49 Matt Reynolds \n","50 Jonathan O’Callaghan \n","51 Vince Beiser \n","52 Daniel Levitin \n","53 Nemonte Nenquimo and Mitch Anderson "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," title | \n"," link | \n"," author | \n","
\n"," \n"," \n"," \n"," 0 | \n"," December Wildfires Are Now a Thing | \n"," https://www.wired.com/story/december-wildfires... | \n"," Kylie Mohr | \n","
\n"," \n"," 1 | \n"," How to Manage Food Anxiety Over the Holidays | \n"," https://www.wired.com/story/how-to-cope-with-f... | \n"," Alison Fixsen | \n","
\n"," \n"," 2 | \n"," A Spacecraft Is About to Fly Into the Sun’s At... | \n"," https://www.wired.com/story/parker-solar-probe... | \n"," Eric Berger, Ars Technica | \n","
\n"," \n"," 3 | \n"," To Improve Your Gut Microbiome, Spend More Tim... | \n"," https://www.wired.com/story/to-improve-your-gu... | \n"," Kathy Willis | \n","
\n"," \n"," 4 | \n"," This Tropical Virus Is Spreading Out of the Am... | \n"," https://www.wired.com/story/this-tropical-viru... | \n"," Geraldine Castro | \n","
\n"," \n"," 5 | \n"," CDC Confirms First US Case of Severe Bird Flu | \n"," https://www.wired.com/story/cdc-confirms-first... | \n"," Emily Mullin | \n","
\n"," \n"," 6 | \n"," The Study That Called Out Black Plastic Utensi... | \n"," https://www.wired.com/story/black-plastic-uten... | \n"," Beth Mole, Ars Technica | \n","
\n"," \n"," 7 | \n"," How Christmas Trees Could Become a Source of L... | \n"," https://www.wired.com/story/how-christmas-tree... | \n"," Alexa Phillips | \n","
\n"," \n"," 8 | \n"," Creating a Global Package to Solve the Problem... | \n"," https://www.wired.com/story/global-plastics-tr... | \n"," Susan Solomon | \n","
\n"," \n"," 9 | \n"," These 3 Things Are Standing in the Way of a Gl... | \n"," https://www.wired.com/story/these-3-things-are... | \n"," Steve Fletcher and Samuel Winton | \n","
\n"," \n"," 10 | \n"," Environmental Sensing Is Here, Tracking Everyt... | \n"," https://www.wired.com/story/environmental-sens... | \n"," Sabrina Weiss | \n","
\n"," \n"," 11 | \n"," Generative AI and Climate Change Are on a Coll... | \n"," https://www.wired.com/story/true-cost-generati... | \n"," Sasha Luccioni | \n","
\n"," \n"," 12 | \n"," Climate Change Is Destroying Monarch Butterfli... | \n"," https://www.wired.com/story/global-warming-thr... | \n"," Andrea J. Arratibel | \n","
\n"," \n"," 13 | \n"," More Humanitarian Organizations Will Harness A... | \n"," https://www.wired.com/story/humanitarian-organ... | \n"," David Miliband | \n","
\n"," \n"," 14 | \n"," Chocolate Has a Sustainability Problem. Scienc... | \n"," https://www.wired.com/story/chocolate-has-a-su... | \n"," Eve Thomas | \n","
\n"," \n"," 15 | \n"," We’ve Never Been Closer to Finding Life Outsid... | \n"," https://www.wired.com/story/james-webb-space-t... | \n"," Lisa Kaltenegger | \n","
\n"," \n"," 16 | \n"," The End Is Near for NASA’s Voyager Probes | \n"," https://www.wired.com/story/the-end-is-near-fo... | \n"," Luca Nardi | \n","
\n"," \n"," 17 | \n"," Why Can’t You Switch Seats in an Empty Airplane? | \n"," https://www.wired.com/story/why-cant-you-switc... | \n"," Rhett Allain | \n","
\n"," \n"," 18 | \n"," The Simple Math Behind Public Key Cryptography | \n"," https://www.wired.com/story/how-public-key-cry... | \n"," John Pavlus | \n","
\n"," \n"," 19 | \n"," Everyone Is Capable of Mathematical Thinking--... | \n"," https://www.wired.com/story/everyone-is-capabl... | \n"," Kelsey Houston-Edwards | \n","
\n"," \n"," 20 | \n"," The Physics of the Macy’s Thanksgiving Day Par... | \n"," https://www.wired.com/story/the-physics-of-the... | \n"," Rhett Allain | \n","
\n"," \n"," 21 | \n"," A Third Person Has Received a Transplant of a ... | \n"," https://www.wired.com/story/a-third-person-has... | \n"," Emily Mullin | \n","
\n"," \n"," 22 | \n"," Muscle Implants Could Allow Mind-Controlled Pr... | \n"," https://www.wired.com/story/amputees-could-con... | \n"," Emily Mullin | \n","
\n"," \n"," 23 | \n"," Combining AI and Crispr Will Be Transformational | \n"," https://www.wired.com/story/combining-ai-and-c... | \n"," Jennifer Doudna | \n","
\n"," \n"," 24 | \n"," Neuralink Plans to Test Whether Its Brain Impl... | \n"," https://www.wired.com/story/neuralink-robotic-... | \n"," Emily Mullin | \n","
\n"," \n"," 25 | \n"," Eight Scientists, a Billion Dollars, and the M... | \n"," https://www.wired.com/story/aria-moonshot-darp... | \n"," Matt Reynolds | \n","
\n"," \n"," 26 | \n"," The Atlas Robot Is Dead. Long Live the Atlas R... | \n"," https://www.wired.com/story/the-atlas-robot-is... | \n"," NA | \n","
\n"," \n"," 27 | \n"," The Atlas Robot Is Dead. Long Live the Atlas R... | \n"," https://www.wired.com/story/the-atlas-robot-is... | \n"," Carlton Reid | \n","
\n"," \n"," 28 | \n"," Meet the Next Generation of Doctors--and Their... | \n"," https://www.wired.com/story/next-generation-do... | \n"," Neha Mukherjee | \n","
\n"," \n"," 29 | \n"," AI Is Building Highly Effective Antibodies Tha... | \n"," https://www.wired.com/story/labgenius-antibody... | \n"," Amit Katwala | \n","
\n"," \n"," 30 | \n"," An Uncertain Future Requires Uncertain Predict... | \n"," https://www.wired.com/story/embrace-uncertaint... | \n"," David Spiegelhalter | \n","
\n"," \n"," 31 | \n"," These Rats Learned to Drive--and They Love It | \n"," https://www.wired.com/story/these-rats-learned... | \n"," Kelly Lambert | \n","
\n"," \n"," 32 | \n"," Scientists Are Unlocking the Secrets of Your ‗... | \n"," https://www.wired.com/story/cerebellum-brain-m... | \n"," R Douglas Fields | \n","
\n"," \n"," 33 | \n"," Meet the Designer Behind Neuralink’s Surgical ... | \n"," https://www.wired.com/story/designer-behind-ne... | \n"," Emily Mullin | \n","
\n"," \n"," 34 | \n"," Antibodies Could Soon Help Slow the Aging Process | \n"," https://www.wired.com/story/antibodies-could-s... | \n"," Andrew Steele | \n","
\n"," \n"," 35 | \n"," Good at Reading? Your Brain May Be Structured ... | \n"," https://www.wired.com/story/good-at-reading-yo... | \n"," Mikael Roll | \n","
\n"," \n"," 36 | \n"," Mega-Farms Are Driving the Threat of Bird Flu | \n"," https://www.wired.com/story/mega-farms-are-dri... | \n"," Georgina Gustin | \n","
\n"," \n"," 37 | \n"," RFK Plans to Take on Big Pharma. It’s Easier S... | \n"," https://www.wired.com/story/rfks-plan-to-take-... | \n"," Emily Mullin | \n","
\n"," \n"," 38 | \n"," Designer Babies Are Teenagers Now--and Some of... | \n"," https://www.wired.com/story/your-next-job-desi... | \n"," Emi Nietfeld | \n","
\n"," \n"," 39 | \n"," US Meat, Milk Prices Should Spike if Donald Tr... | \n"," https://www.wired.com/story/us-meat-milk-price... | \n"," Matt Reynolds | \n","
\n"," \n"," 40 | \n"," An Augmented Reality Program Can Help Patients... | \n"," https://www.wired.com/story/lining-up-tech-to-... | \n"," Grace Browne | \n","
\n"," \n"," 41 | \n"," Meet the Plant Hacker Creating Flowers Never S... | \n"," https://www.wired.com/story/meet-the-plant-hac... | \n"," Matt Reynolds | \n","
\n"," \n"," 42 | \n"," A Mysterious Respiratory Disease Has the Democ... | \n"," https://www.wired.com/story/drc-mysterious-res... | \n"," Marta Musso | \n","
\n"," \n"," 43 | \n"," Skip the Sea Kelp Supplements | \n"," https://www.wired.com/story/pass-on-sea-kelp-s... | \n"," Boutayna Chokrane | \n","
\n"," \n"," 44 | \n"," Why Soccer Players Are Training in the Dark | \n"," https://www.wired.com/story/why-soccer-players... | \n"," RM Clark | \n","
\n"," \n"," 45 | \n"," A Parasite That Eats Cattle Alive Is Creeping ... | \n"," https://www.wired.com/story/a-parasite-that-ea... | \n"," Geraldine Castro | \n","
\n"," \n"," 46 | \n"," Lasers Are Making It Easier to Find Buried Lan... | \n"," https://www.wired.com/story/this-laser-system-... | \n"," Ritsuko Kawai | \n","
\n"," \n"," 47 | \n"," Mark Cuban’s War on Drug Prices: ‖How Much Fuc... | \n"," https://www.wired.com/story/big-interview-mark... | \n"," Marah Eakin | \n","
\n"," \n"," 48 | \n"," Can Artificial Rain, Drones, or Satellites Cle... | \n"," https://www.wired.com/story/artificial-rain-dr... | \n"," Arunima Kar | \n","
\n"," \n"," 49 | \n"," These Stem Cell Treatments Are Worth Millions.... | \n"," https://www.wired.com/story/stem-cells-cost-ri... | \n"," Matt Reynolds | \n","
\n"," \n"," 50 | \n"," The Mystery of How Supermassive Black Holes Merge | \n"," https://www.wired.com/story/how-do-merging-sup... | \n"," Jonathan O’Callaghan | \n","
\n"," \n"," 51 | \n"," The $60 Billion Potential Hiding in Your Disca... | \n"," https://www.wired.com/story/a-dollar60-billion... | \n"," Vince Beiser | \n","
\n"," \n"," 52 | \n"," Tune In to the Healing Powers of a Decent Play... | \n"," https://www.wired.com/story/music-therapy-heal... | \n"," Daniel Levitin | \n","
\n"," \n"," 53 | \n"," Returning the Amazon Rainforest to Its True Ca... | \n"," https://www.wired.com/story/amazon-rainforest-... | \n"," Nemonte Nenquimo and Mitch Anderson | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 54,\n \"fields\": [\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 53,\n \"samples\": [\n \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\",\n \"A Mysterious Respiratory Disease Has the Democratic Republic of the Congo on High Alert\",\n \"Can Artificial Rain, Drones, or Satellites Clean Toxic Air?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 53,\n \"samples\": [\n \"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\",\n \"https://www.wired.com/story/drc-mysterious-respiratory-disease-children-who-africa/\",\n \"https://www.wired.com/story/artificial-rain-drones-and-satellites-can-tech-clean-indias-toxic-air/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 44,\n \"samples\": [\n \"Ritsuko Kawai\",\n \"Neha Mukherjee\",\n \"Amit Katwala\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":91}],"source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"news\"])\n","df"]},{"cell_type":"markdown","metadata":{"id":"v0CBYVk7qA5Z"},"source":["Save it to CSV"]},{"cell_type":"code","execution_count":92,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":227,"status":"ok","timestamp":1734793788052,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"BtEbB9pmQGhO","outputId":"8609d8c0-adee-49ae-9f2c-ca427de06bc3"},"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to wired_news.csv\n"]}],"source":["# Save the 
DataFrame to a CSV file\n","csv_file = \"wired_news.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"]},{"cell_type":"markdown","metadata":{"id":"-1SZT8VzTZNd"},"source":["## 🔗 Resources"]},{"cell_type":"markdown","metadata":{"id":"dUi2LtMLRDDR"},"source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"]}],"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyMIy/dnR3c9alpVvODp/1Ia"},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"name":"python"}},"nbformat":4,"nbformat_minor":0}
\ No newline at end of file
+{"cells":[{"cell_type":"markdown","metadata":{"id":"ReBHQ5_834pZ"},"source":["\n","
\n",""]},{"cell_type":"markdown","metadata":{"id":"jEkuKbcRrPcK"},"source":["## 🕷️ Extract Wired Science News with `llama-index-tools-scrapegraphai`"]},{"cell_type":"markdown","metadata":{"id":"FxtXj1Qtx3zH"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"IzsyDXEWwPVt"},"source":["### 🔧 Install `dependencies`"]},{"cell_type":"code","execution_count":1,"metadata":{"executionInfo":{"elapsed":13752,"status":"ok","timestamp":1734786986883,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install llama-index\n","!pip install llama-index-tools-scrapegraphai "]},{"cell_type":"markdown","metadata":{"id":"apBsL-L2KzM7"},"source":["### 🔑 Import `ScrapeGraph` API key"]},{"cell_type":"markdown","metadata":{"id":"ol9gQbAFkh9b"},"source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"]},{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":67210,"status":"ok","timestamp":1734787067908,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"sffqFG2EJ8bI","outputId":"8c3dd34f-22d5-4557-c562-97728738726b"},"outputs":[{"name":"stdout","output_type":"stream","text":["SGAI_API_KEY not found in environment.\n","SGAI_API_KEY has been set in the environment.\n"]}],"source":["import os\n","from getpass import getpass\n","\n","# Check if the API key is already set in the environment\n","sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n","\n","if sgai_api_key:\n"," print(\"SGAI_API_KEY found in environment.\")\n","else:\n"," print(\"SGAI_API_KEY not found in environment.\")\n"," # Prompt the user to input the API key securely (hidden input)\n"," sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n"," if sgai_api_key:\n"," # Set the API key in the environment\n"," os.environ[\"SGAI_API_KEY\"] = 
sgai_api_key\n"," print(\"SGAI_API_KEY has been set in the environment.\")\n"," else:\n"," print(\"No API key entered. Please set the API key to continue.\")\n"]},{"cell_type":"markdown","metadata":{"id":"jnqMB2-xVYQ7"},"source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"]},{"cell_type":"markdown","metadata":{"id":"VZvxbjfXvbgd"},"source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"]},{"cell_type":"code","execution_count":2,"metadata":{"executionInfo":{"elapsed":221,"status":"ok","timestamp":1734792801652,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"dlrOEgZk_8V4"},"outputs":[],"source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for a single news item\n","class NewsItemSchema(BaseModel):\n"," title: str = Field(description=\"Title of the news article\")\n"," link: str = Field(description=\"URL to the news article\")\n"," author: str = Field(description=\"Author of the news article\")\n","\n","# Schema that contains a list of news items\n","class ListNewsSchema(BaseModel):\n"," news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"]},{"cell_type":"markdown","metadata":{"id":"cDGH0b2DkY63"},"source":["### 🚀 Initialize the `ScrapegraphToolSpec` tool and run the `extraction`"]},{"cell_type":"markdown","metadata":{"id":"M1KSXffZopUD"},"source":["Here we use the `scrapegraph_smartscraper` method of `ScrapegraphToolSpec` to extract structured data from a webpage using AI.\n","\n","\n","> If you already have an HTML file, the LangChain integration offers `LocalScraperTool` for that case.\n","\n","You can find more 
info in the [official langchain 
documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n"]},{"cell_type":"code","execution_count":3,"metadata":{"executionInfo":{"elapsed":222,"status":"ok","timestamp":1734792804219,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"ySoE0Rowjgp1"},"outputs":[],"source":["from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n","\n","scrapegraph_tool = ScrapegraphToolSpec()"]},{"cell_type":"markdown","metadata":{"id":"W54HVoYeiJbG"},"source":["We then call `scrapegraph_smartscraper` with a prompt, the target URL, the API key, and the output schema to run the extraction\n","\n"]},{"cell_type":"code","execution_count":4,"metadata":{"executionInfo":{"elapsed":350,"status":"ok","timestamp":1734790431512,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"ctrkEnltiBCD"},"outputs":[],"source":["response = scrapegraph_tool.scrapegraph_smartscraper(\n"," prompt=\"Extract information about science news articles\",\n"," url=\"https://www.wired.com/tag/science/\",\n"," api_key=sgai_api_key,\n"," schema=ListNewsSchema,\n",")"]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":238,"status":"ok","timestamp":1734793746145,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"F1VfD8B4LPc8","outputId":"93746d86-6599-44b1-a052-2a83ace873e7"},"outputs":[{"name":"stdout","output_type":"stream","text":["Science News:\n","{\n"," \"request_id\": \"311b964b-0ac0-4011-b894-52398ace4b64\",\n"," \"status\": \"completed\",\n"," \"website_url\": \"https://www.wired.com/tag/science/\",\n"," \"user_prompt\": \"Extract information about science news articles\",\n"," \"result\": {\n"," \"news\": [\n"," {\n"," \"title\": \"Transforming the Moon Into Humanity\\u2019s First Space Hub\",\n"," \"link\": 
\"https://www.wired.com/story/moon-humanity-industrial-space-hub/\",\n"," \"author\": \"Saurav Shroff\"\n"," },\n"," {\n"," 
\"title\": \"24 Things That Made the World a Better Place in 2024\",\n"," \"link\": \"https://www.wired.com/story/24-things-that-made-the-world-a-better-place-in-2024-good-news/\",\n"," \"author\": \"Rob Reddick\"\n"," },\n"," {\n"," \"title\": \"Viewers of Quantum Events Are Also Subject to Uncertainty\",\n"," \"link\": \"https://www.wired.com/story/in-the-quantum-world-even-points-of-view-are-uncertain/\",\n"," \"author\": \"Anil Ananthaswamy\"\n"," },\n"," {\n"," \"title\": \"Beyond Meat Says Being Attacked Has Just Made It Stronger\",\n"," \"link\": \"https://www.wired.com/story/beyond-meat-hits-back-against-the-haters-ethan-brown/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"The World\\u2019s First Crispr Drug Gets a Slow Start\",\n"," \"link\": \"https://www.wired.com/story/the-worlds-first-crispr-drug-gets-a-slow-start-sickle-cell-beta-thalassemia-vertex/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"The Universe Is Teeming With Complex Organic Molecules\",\n"," \"link\": \"https://www.wired.com/story/the-universe-is-teeming-with-complex-organic-molecules/\",\n"," \"author\": \"Elise Cutts\"\n"," },\n"," {\n"," \"title\": \"December Wildfires Are Now a Thing\",\n"," \"link\": \"https://www.wired.com/story/december-wildfires-are-now-a-thing/\",\n"," \"author\": \"Kylie Mohr\"\n"," },\n"," {\n"," \"title\": \"To Improve Your Gut Microbiome, Spend More Time in Nature\",\n"," \"link\": \"https://www.wired.com/story/to-improve-your-gut-microbiome-spend-more-time-in-nature-kathy-willis/\",\n"," \"author\": \"Kathy Willis\"\n"," },\n"," {\n"," \"title\": \"This Tropical Virus Is Spreading Out of the Amazon to the US and Europe\",\n"," \"link\": \"https://www.wired.com/story/this-tropical-virus-is-spreading-out-of-the-amazon-to-the-us-and-europe/\",\n"," \"author\": \"Geraldine Castro\"\n"," },\n"," {\n"," \"title\": \"CDC Confirms First US Case of Severe Bird Flu\",\n"," \"link\": 
\"https://www.wired.com/story/cdc-confirms-first-us-case-of-severe-bird-flu/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"A Third Person Has Received a Transplant of a Genetically Engineered Pig Kidney\",\n"," \"link\": \"https://www.wired.com/story/a-third-person-has-received-a-transplant-of-a-genetically-engineered-pig-kidney/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"How Christmas Trees Could Become a Source of Low-Carbon Protein\",\n"," \"link\": \"https://www.wired.com/story/how-christmas-trees-could-become-a-source-of-low-carbon-protein/\",\n"," \"author\": \"Alexa Phillips\"\n"," },\n"," {\n"," \"title\": \"The Simple Math Behind Public Key Cryptography\",\n"," \"link\": \"https://www.wired.com/story/how-public-key-cryptography-really-works-using-only-simple-math/\",\n"," \"author\": \"John Pavlus\"\n"," },\n"," {\n"," \"title\": \"Good at Reading? Your Brain May Be Structured Differently\",\n"," \"link\": \"https://www.wired.com/story/good-at-reading-your-brain-may-be-structured-differently/\",\n"," \"author\": \"Mikael Roll\"\n"," },\n"," {\n"," \"title\": \"Mega-Farms Are Driving the Threat of Bird Flu\",\n"," \"link\": \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\",\n"," \"author\": \"Georgina Gustin\"\n"," },\n"," {\n"," \"title\": \"RFK Plans to Take on Big Pharma. 
It\\u2019s Easier Said Than Done\",\n"," \"link\": \"https://www.wired.com/story/rfks-plan-to-take-on-big-pharma/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"US Meat, Milk Prices Should Spike if Donald Trump Carries Out Mass Deportation Schemes\",\n"," \"link\": \"https://www.wired.com/story/us-meat-milk-prices-should-spike-if-donald-trump-carries-out-mass-deportation-schemes/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"An Augmented Reality Program Can Help Patients Overcome Parkinson\\u2019s Symptoms\",\n"," \"link\": \"https://www.wired.com/story/lining-up-tech-to-help-banish-tremors-strolll-parkinsons/\",\n"," \"author\": \"Grace Browne\"\n"," },\n"," {\n"," \"title\": \"Climate Change Is Destroying Monarch Butterflies\\u2019 Winter Habitat\",\n"," \"link\": \"https://www.wired.com/story/global-warming-threatens-the-monarch-butterfly-sanctuary-but-this-scientist-prepares-a-new-home-for-them/\",\n"," \"author\": \"Andrea J. Arratibel\"\n"," },\n"," {\n"," \"title\": \"Muscle Implants Could Allow Mind-Controlled Prosthetics--No Brain Surgery Required\",\n"," \"link\": \"https://www.wired.com/story/amputees-could-control-prosthetics-with-just-their-thoughts-no-brain-surgery-required-phantom-neuro/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"title\": \"Meet the Plant Hacker Creating Flowers Never Seen (or Smelled) Before\",\n"," \"link\": \"https://www.wired.com/story/meet-the-plant-hacker-creating-flowers-never-seen-or-smelled-before/\",\n"," \"author\": \"Matt Reynolds\"\n"," },\n"," {\n"," \"title\": \"These 3 Things Are Standing in the Way of a Global Plastics Treaty\",\n"," \"link\": \"https://www.wired.com/story/these-3-things-are-standing-in-the-way-of-a-global-plastics-treaty/\",\n"," \"author\": \"Steve Fletcher and Samuel Winton\"\n"," },\n"," {\n"," \"title\": \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\",\n"," \"link\": 
\"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\",\n"," \"author\": \"Kelsey Houston-Edwards\"\n"," },\n"," {\n"," \"title\": \"A Uranium-Mining Boom Is Sweeping Through Texas\",\n"," \"link\": \"https://www.wired.com/story/a-uranium-mining-boom-is-sweeping-through-texas-nuclear-energy/\",\n"," \"author\": \"Dylan Baddour\"\n"," }\n"," ]\n"," },\n"," \"error\": \"\"\n","}\n"]}],"source":["import json\n","\n","print(\"Science News:\")\n","print(json.dumps(response, indent=2))"]},{"cell_type":"markdown","metadata":{"id":"2as65QLypwdb"},"source":["### 💾 Save the output to a `CSV` file"]},{"cell_type":"markdown","metadata":{"id":"HTLVFgbVLLBR"},"source":["Let's create a pandas dataframe and show the table with the extracted content"]},{"cell_type":"code","execution_count":91,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"executionInfo":{"elapsed":242,"status":"ok","timestamp":1734793782979,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"1lS9O1KOI51y","outputId":"c54e03f8-0441-4e10-db59-6a0c17470abb"},"outputs":[{"data":{"application/vnd.google.colaboratory.intrinsic+json":{"summary":"{\n \"name\": \"df\",\n \"rows\": 54,\n \"fields\": [\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 53,\n \"samples\": [\n \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\",\n \"A Mysterious Respiratory Disease Has the Democratic Republic of the Congo on High Alert\",\n \"Can Artificial Rain, Drones, or Satellites Clean Toxic Air?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 53,\n \"samples\": [\n \"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\",\n \"https://www.wired.com/story/drc-mysterious-respiratory-disease-children-who-africa/\",\n 
\"https://www.wired.com/story/artificial-rain-drones-and-satellites-can-tech-clean-indias-toxic-air/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 44,\n \"samples\": [\n \"Ritsuko Kawai\",\n \"Neha Mukherjee\",\n \"Amit Katwala\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}","type":"dataframe","variable_name":"df"},"text/plain":[" title \\\n","0 December Wildfires Are Now a Thing \n","1 How to Manage Food Anxiety Over the Holidays \n","2 A Spacecraft Is About to Fly Into the Sun’s At... \n","3 To Improve Your Gut Microbiome, Spend More Tim... \n","4 This Tropical Virus Is Spreading Out of the Am... \n","5 CDC Confirms First US Case of Severe Bird Flu \n","6 The Study That Called Out Black Plastic Utensi... \n","7 How Christmas Trees Could Become a Source of L... \n","8 Creating a Global Package to Solve the Problem... \n","9 These 3 Things Are Standing in the Way of a Gl... \n","10 Environmental Sensing Is Here, Tracking Everyt... \n","11 Generative AI and Climate Change Are on a Coll... \n","12 Climate Change Is Destroying Monarch Butterfli... \n","13 More Humanitarian Organizations Will Harness A... \n","14 Chocolate Has a Sustainability Problem. Scienc... \n","15 We’ve Never Been Closer to Finding Life Outsid... \n","16 The End Is Near for NASA’s Voyager Probes \n","17 Why Can’t You Switch Seats in an Empty Airplane? \n","18 The Simple Math Behind Public Key Cryptography \n","19 Everyone Is Capable of Mathematical Thinking--... \n","20 The Physics of the Macy’s Thanksgiving Day Par... \n","21 A Third Person Has Received a Transplant of a ... \n","22 Muscle Implants Could Allow Mind-Controlled Pr... \n","23 Combining AI and Crispr Will Be Transformational \n","24 Neuralink Plans to Test Whether Its Brain Impl... \n","25 Eight Scientists, a Billion Dollars, and the M... \n","26 The Atlas Robot Is Dead. Long Live the Atlas R... \n","27 The Atlas Robot Is Dead. Long Live the Atlas R... \n","28 Meet the Next Generation of Doctors--and Their... \n","29 AI Is Building Highly Effective Antibodies Tha... \n","30 An Uncertain Future Requires Uncertain Predict... \n","31 These Rats Learned to Drive--and They Love It \n","32 Scientists Are Unlocking the Secrets of Your ‗... \n","33 Meet the Designer Behind Neuralink’s Surgical ... 
\n","34 Antibodies Could Soon Help Slow the Aging Process \n","35 Good at Reading? Your Brain May Be Structured ... \n","36 Mega-Farms Are Driving the Threat of Bird Flu \n","37 RFK Plans to Take on Big Pharma. It’s Easier S... \n","38 Designer Babies Are Teenagers Now--and Some of... \n","39 US Meat, Milk Prices Should Spike if Donald Tr... \n","40 An Augmented Reality Program Can Help Patients... \n","41 Meet the Plant Hacker Creating Flowers Never S... \n","42 A Mysterious Respiratory Disease Has the Democ... \n","43 Skip the Sea Kelp Supplements \n","44 Why Soccer Players Are Training in the Dark \n","45 A Parasite That Eats Cattle Alive Is Creeping ... \n","46 Lasers Are Making It Easier to Find Buried Lan... \n","47 Mark Cuban’s War on Drug Prices: ‖How Much Fuc... \n","48 Can Artificial Rain, Drones, or Satellites Cle... \n","49 These Stem Cell Treatments Are Worth Millions.... \n","50 The Mystery of How Supermassive Black Holes Merge \n","51 The $60 Billion Potential Hiding in Your Disca... \n","52 Tune In to the Healing Powers of a Decent Play... \n","53 Returning the Amazon Rainforest to Its True Ca... \n","\n"," link \\\n","0 https://www.wired.com/story/december-wildfires... \n","1 https://www.wired.com/story/how-to-cope-with-f... \n","2 https://www.wired.com/story/parker-solar-probe... \n","3 https://www.wired.com/story/to-improve-your-gu... \n","4 https://www.wired.com/story/this-tropical-viru... \n","5 https://www.wired.com/story/cdc-confirms-first... \n","6 https://www.wired.com/story/black-plastic-uten... \n","7 https://www.wired.com/story/how-christmas-tree... \n","8 https://www.wired.com/story/global-plastics-tr... \n","9 https://www.wired.com/story/these-3-things-are... \n","10 https://www.wired.com/story/environmental-sens... \n","11 https://www.wired.com/story/true-cost-generati... \n","12 https://www.wired.com/story/global-warming-thr... \n","13 https://www.wired.com/story/humanitarian-organ... 
\n","14 https://www.wired.com/story/chocolate-has-a-su... \n","15 https://www.wired.com/story/james-webb-space-t... \n","16 https://www.wired.com/story/the-end-is-near-fo... \n","17 https://www.wired.com/story/why-cant-you-switc... \n","18 https://www.wired.com/story/how-public-key-cry... \n","19 https://www.wired.com/story/everyone-is-capabl... \n","20 https://www.wired.com/story/the-physics-of-the... \n","21 https://www.wired.com/story/a-third-person-has... \n","22 https://www.wired.com/story/amputees-could-con... \n","23 https://www.wired.com/story/combining-ai-and-c... \n","24 https://www.wired.com/story/neuralink-robotic-... \n","25 https://www.wired.com/story/aria-moonshot-darp... \n","26 https://www.wired.com/story/the-atlas-robot-is... \n","27 https://www.wired.com/story/the-atlas-robot-is... \n","28 https://www.wired.com/story/next-generation-do... \n","29 https://www.wired.com/story/labgenius-antibody... \n","30 https://www.wired.com/story/embrace-uncertaint... \n","31 https://www.wired.com/story/these-rats-learned... \n","32 https://www.wired.com/story/cerebellum-brain-m... \n","33 https://www.wired.com/story/designer-behind-ne... \n","34 https://www.wired.com/story/antibodies-could-s... \n","35 https://www.wired.com/story/good-at-reading-yo... \n","36 https://www.wired.com/story/mega-farms-are-dri... \n","37 https://www.wired.com/story/rfks-plan-to-take-... \n","38 https://www.wired.com/story/your-next-job-desi... \n","39 https://www.wired.com/story/us-meat-milk-price... \n","40 https://www.wired.com/story/lining-up-tech-to-... \n","41 https://www.wired.com/story/meet-the-plant-hac... \n","42 https://www.wired.com/story/drc-mysterious-res... \n","43 https://www.wired.com/story/pass-on-sea-kelp-s... \n","44 https://www.wired.com/story/why-soccer-players... \n","45 https://www.wired.com/story/a-parasite-that-ea... \n","46 https://www.wired.com/story/this-laser-system-... \n","47 https://www.wired.com/story/big-interview-mark... 
\n","48 https://www.wired.com/story/artificial-rain-dr... \n","49 https://www.wired.com/story/stem-cells-cost-ri... \n","50 https://www.wired.com/story/how-do-merging-sup... \n","51 https://www.wired.com/story/a-dollar60-billion... \n","52 https://www.wired.com/story/music-therapy-heal... \n","53 https://www.wired.com/story/amazon-rainforest-... \n","\n"," author \n","0 Kylie Mohr \n","1 Alison Fixsen \n","2 Eric Berger, Ars Technica \n","3 Kathy Willis \n","4 Geraldine Castro \n","5 Emily Mullin \n","6 Beth Mole, Ars Technica \n","7 Alexa Phillips \n","8 Susan Solomon \n","9 Steve Fletcher and Samuel Winton \n","10 Sabrina Weiss \n","11 Sasha Luccioni \n","12 Andrea J. Arratibel \n","13 David Miliband \n","14 Eve Thomas \n","15 Lisa Kaltenegger \n","16 Luca Nardi \n","17 Rhett Allain \n","18 John Pavlus \n","19 Kelsey Houston-Edwards \n","20 Rhett Allain \n","21 Emily Mullin \n","22 Emily Mullin \n","23 Jennifer Doudna \n","24 Emily Mullin \n","25 Matt Reynolds \n","26 NA \n","27 Carlton Reid \n","28 Neha Mukherjee \n","29 Amit Katwala \n","30 David Spiegelhalter \n","31 Kelly Lambert \n","32 R Douglas Fields \n","33 Emily Mullin \n","34 Andrew Steele \n","35 Mikael Roll \n","36 Georgina Gustin \n","37 Emily Mullin \n","38 Emi Nietfeld \n","39 Matt Reynolds \n","40 Grace Browne \n","41 Matt Reynolds \n","42 Marta Musso \n","43 Boutayna Chokrane \n","44 RM Clark \n","45 Geraldine Castro \n","46 Ritsuko Kawai \n","47 Marah Eakin \n","48 Arunima Kar \n","49 Matt Reynolds \n","50 Jonathan O’Callaghan \n","51 Vince Beiser \n","52 Daniel Levitin \n","53 Nemonte Nenquimo and Mitch Anderson "]},"execution_count":91,"metadata":{},"output_type":"execute_result"}],"source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(response[\"result\"][\"news\"])\n","df"]},{"cell_type":"markdown","metadata":{"id":"v0CBYVk7qA5Z"},"source":["Save it to 
CSV"]},{"cell_type":"code","execution_count":92,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":227,"status":"ok","timestamp":1734793788052,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"},"user_tz":-60},"id":"BtEbB9pmQGhO","outputId":"8609d8c0-adee-49ae-9f2c-ca427de06bc3"},"outputs":[{"name":"stdout","output_type":"stream","text":["Data saved to wired_news.csv\n"]}],"source":["# Save the DataFrame to a CSV file\n","csv_file = \"wired_news.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"]},{"cell_type":"markdown","metadata":{"id":"-1SZT8VzTZNd"},"source":["## 🔗 Resources"]},{"cell_type":"markdown","metadata":{"id":"dUi2LtMLRDDR"},"source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"]}],"metadata":{"colab":{"authorship_tag":"ABX9TyMIy/dnR3c9alpVvODp/1Ia","provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.14"}},"nbformat":4,"nbformat_minor":0}
diff --git a/cookbook/wired-news/scrapegraph_llama_index.ipynb b/cookbook/wired-news/scrapegraph_llama_index.ipynb
new file mode 100644
index 0000000..0193ca8
--- /dev/null
+++ b/cookbook/wired-news/scrapegraph_llama_index.ipynb
@@ -0,0 +1,1438 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ReBHQ5_834pZ"
+ },
+ "source": [
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "source": [
+ "## 🕷️ Extract Wired Science News Info with llama-index and ScrapegraphAI's APIs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oQa0HZo7nuB_"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "source": [
+ "### 🔧 Install `dependencies`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!pip install llama-index-tools-scrapegraphai\n",
+ "!pip install llama-index"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "source": [
+ "### 🔑 Import `ScrapeGraph` API key"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "source": [
+ "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "sffqFG2EJ8bI"
+ },
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "import os\n",
+ "\n",
+ "if not os.environ.get(\"SGAI_API_KEY\"):\n",
+ " os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "source": [
+ "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "source": [
+ "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "\n",
+ "\n",
+ "Pydantic Schema Quick Guide\n",
+ "\n",
+ "Types of Schemas \n",
+ "\n",
+ "1. Simple Schema \n",
+ "Use this when you want to extract straightforward information, such as a single piece of content. \n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "# Simple schema for a single webpage\n",
+ "class PageInfoSchema(BaseModel):\n",
+ " title: str = Field(description=\"The title of the webpage\")\n",
+ " description: str = Field(description=\"The description of the webpage\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
+ " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "2. Complex Schema (Nested) \n",
+ "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Define a schema for a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Define a schema for a list of repositories\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8036,\n",
+ " \"forks\": 1001,\n",
+ " \"today_stars\": 649,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
+ " \"stars\": 3224,\n",
+ " \"forks\": 311,\n",
+ " \"today_stars\": 361,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "Key Takeaways \n",
+ "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
+ "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
+ "\n",
+ "Both approaches give the AI a clear structure to follow, which makes it much more likely that the extracted content matches the fields you need.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Schema for a single news item\n",
+ "class NewsItemSchema(BaseModel):\n",
+ " category: str = Field(description=\"Category of the news (e.g., 'Health', 'Environment')\")\n",
+ " title: str = Field(description=\"Title of the news article\")\n",
+ " link: str = Field(description=\"URL to the news article\")\n",
+ " author: str = Field(description=\"Author of the news article\")\n",
+ "\n",
+ "# Schema that contains a list of news items\n",
+ "class ListNewsSchema(BaseModel):\n",
+ " news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"
+ ]
+ },
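+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before spending API credits, you can sanity-check the schema locally. The next cell is a minimal sketch that assumes Pydantic v2 (`model_validate`; on Pydantic v1 the equivalent is `parse_obj`). The `sample` dictionary is a hypothetical, hand-written placeholder, not real extracted data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical sample shaped like the expected extraction result (no API call is made)\n",
+ "sample = {\n",
+ "    \"news\": [\n",
+ "        {\n",
+ "            \"category\": \"Health\",\n",
+ "            \"title\": \"Example headline\",\n",
+ "            \"link\": \"https://example.com/story\",\n",
+ "            \"author\": \"Jane Doe\",\n",
+ "        }\n",
+ "    ]\n",
+ "}\n",
+ "\n",
+ "# Pydantic v2: raises pydantic.ValidationError if a field is missing or has the wrong type\n",
+ "parsed = ListNewsSchema.model_validate(sample)\n",
+ "print(len(parsed.news), parsed.news[0].author)"
+ ]
+ },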
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "source": [
+ "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "source": [
+ "Here we use `scrapegraph_smartscraper`, which uses AI to extract structured data from a webpage, shaped by the `schema` you pass in.\n",
+ "\n",
+ "\n",
+ "> If you already have an HTML file, you can upload it and use `scrapegraph_local_scrape` instead.\n",
+ "\n",
+ "You can find more details in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ySoE0Rowjgp1"
+ },
+ "outputs": [],
+ "source": [
+ "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n",
+ "\n",
+ "scrapegraph_tool = ScrapegraphToolSpec()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "outputs": [],
+ "source": [
+ "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
+ " prompt=\"Extract the first 10 news in the page\",\n",
+ " url=\"https://www.wired.com/category/science/\",\n",
+ " api_key=os.environ.get(\"SGAI_API_KEY\"),\n",
+ " schema=ListNewsSchema\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "source": [
+ "Print the response"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "F1VfD8B4LPc8",
+ "outputId": "cc51cf25-18bb-44bf-9242-125339623e3e"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Science News:\n",
+ "{\n",
+ " \"request_id\": \"5369dc4a-ddd3-4e7d-9938-741f6388c804\",\n",
+ " \"status\": \"completed\",\n",
+ " \"website_url\": \"https://www.wired.com/category/science/\",\n",
+ " \"user_prompt\": \"Extract the first 10 news in the page\",\n",
+ " \"result\": {\n",
+ " \"news\": [\n",
+ " {\n",
+ " \"category\": \"WIRED World\",\n",
+ " \"title\": \"Transforming the Moon Into Humanity\\u2019s First Space Hub\",\n",
+ " \"link\": \"https://www.wired.com/story/moon-humanity-industrial-space-hub/\",\n",
+ " \"author\": \"Saurav Shroff\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"WIRED World\",\n",
+ " \"title\": \"How Do You Live a Happier Life? Notice What Was There All Along\",\n",
+ " \"link\": \"https://www.wired.com/story/happiness-habituation-experiment-in-living/\",\n",
+ " \"author\": \"Tali Sharot\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Year In Review\",\n",
+ " \"title\": \"24 Things That Made the World a Better Place in 2024\",\n",
+ " \"link\": \"https://www.wired.com/story/24-things-that-made-the-world-a-better-place-in-2024-good-news/\",\n",
+ " \"author\": \"Rob Reddick\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Healthier Cities Will Require a Strong Dose of Nature\",\n",
+ " \"link\": \"https://www.wired.com/story/healthier-cities-will-require-a-strong-dose-of-nature/\",\n",
+ " \"author\": \"Kathy Willis\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"There\\u2019s Still Time to Get Ahead of the Next Global Pandemic\",\n",
+ " \"link\": \"https://www.wired.com/story/global-pandemic-public-health-lessons-preparedness/\",\n",
+ " \"author\": \"Caitlin Rivers\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Give Your Social Health a Decent Workout\",\n",
+ " \"link\": \"https://www.wired.com/story/social-health-relationships-community/\",\n",
+ " \"author\": \"Kasley Killam\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"The World\\u2019s First Crispr Drug Gets a Slow Start\",\n",
+ " \"link\": \"https://www.wired.com/story/the-worlds-first-crispr-drug-gets-a-slow-start-sickle-cell-beta-thalassemia-vertex/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"To Improve Your Gut Microbiome, Spend More Time in Nature\",\n",
+ " \"link\": \"https://www.wired.com/story/to-improve-your-gut-microbiome-spend-more-time-in-nature-kathy-willis/\",\n",
+ " \"author\": \"Kathy Willis\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"This Tropical Virus Is Spreading Out of the Amazon to the US and Europe\",\n",
+ " \"link\": \"https://www.wired.com/story/this-tropical-virus-is-spreading-out-of-the-amazon-to-the-us-and-europe/\",\n",
+ " \"author\": \"Geraldine Castro\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"How Christmas Trees Could Become a Source of Low-Carbon Protein\",\n",
+ " \"link\": \"https://www.wired.com/story/how-christmas-trees-could-become-a-source-of-low-carbon-protein/\",\n",
+ " \"author\": \"Alexa Phillips\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"Creating a Global Package to Solve the Problem of Plastics\",\n",
+ " \"link\": \"https://www.wired.com/story/global-plastics-treaty-united-nations/\",\n",
+ " \"author\": \"Susan Solomon\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Climate\",\n",
+ " \"title\": \"December Wildfires Are Now a Thing\",\n",
+ " \"link\": \"https://www.wired.com/story/december-wildfires-are-now-a-thing/\",\n",
+ " \"author\": \"Kylie Mohr\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Climate\",\n",
+ " \"title\": \"Generative AI and Climate Change Are on a Collision Course\",\n",
+ " \"link\": \"https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/\",\n",
+ " \"author\": \"Sasha Luccioni\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Climate\",\n",
+ " \"title\": \"Climate Change Is Destroying Monarch Butterflies\\u2019 Winter Habitat\",\n",
+ " \"link\": \"https://www.wired.com/story/global-warming-threatens-the-monarch-butterfly-sanctuary-but-this-scientist-prepares-a-new-home-for-them/\",\n",
+ " \"author\": \"Andrea J. Arratibel\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Politics\",\n",
+ " \"title\": \"More Humanitarian Organizations Will Harness AI\\u2019s Potential\",\n",
+ " \"link\": \"https://www.wired.com/story/humanitarian-organizations-artificial-intelligence/\",\n",
+ " \"author\": \"David Miliband\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Energy\",\n",
+ " \"title\": \"Electric Vehicle Charging Is Going to Get Political\",\n",
+ " \"link\": \"https://www.wired.com/story/electric-vehicle-charging-is-going-to-get-political/\",\n",
+ " \"author\": \"Aarian Marshall\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Politics\",\n",
+ " \"title\": \"Big Tech Will Scour the Globe in Its Search for Cheap Energy\",\n",
+ " \"link\": \"https://www.wired.com/story/big-tech-data-centers-cheap-energy/\",\n",
+ " \"author\": \"Azeem Azhar\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"Humans Will Continue to Live in an Age of Incredible Food Waste\",\n",
+ " \"link\": \"https://www.wired.com/story/food-production-energy-waste/\",\n",
+ " \"author\": \"Vaclav Smil\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Energy\",\n",
+ " \"title\": \"A Uranium-Mining Boom Is Sweeping Through Texas\",\n",
+ " \"link\": \"https://www.wired.com/story/a-uranium-mining-boom-is-sweeping-through-texas-nuclear-energy/\",\n",
+ " \"author\": \"Dylan Baddour\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"A Spacecraft Is About to Fly Into the Sun\\u2019s Atmosphere for the First Time\",\n",
+ " \"link\": \"https://www.wired.com/story/parker-solar-probe-atmosphere/\",\n",
+ " \"author\": \"Eric Berger, Ars Technica\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"What\\u2019s the Winter Solstice, Anyway?\",\n",
+ " \"link\": \"https://www.wired.com/story/winter-solstice/\",\n",
+ " \"author\": \"Reece Rogers\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Physics and Math\",\n",
+ " \"title\": \"Viewers of Quantum Events Are Also Subject to Uncertainty\",\n",
+ " \"link\": \"https://www.wired.com/story/in-the-quantum-world-even-points-of-view-are-uncertain/\",\n",
+ " \"author\": \"Anil Ananthaswamy\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"How Does a Movie Projector Show the Color Black?\",\n",
+ " \"link\": \"https://www.wired.com/story/how-does-a-movie-projector-show-the-color-black/\",\n",
+ " \"author\": \"Rhett Allain\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"Why Can\\u2019t You Switch Seats in an Empty Airplane?\",\n",
+ " \"link\": \"https://www.wired.com/story/why-cant-you-switch-seats-in-an-empty-airplane/\",\n",
+ " \"author\": \"Rhett Allain\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Science\",\n",
+ " \"title\": \"The Simple Math Behind Public Key Cryptography\",\n",
+ " \"link\": \"https://www.wired.com/story/how-public-key-cryptography-really-works-using-only-simple-math/\",\n",
+ " \"author\": \"John Pavlus\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Biotech\",\n",
+ " \"title\": \"A Third Person Has Received a Transplant of a Genetically Engineered Pig Kidney\",\n",
+ " \"link\": \"https://www.wired.com/story/a-third-person-has-received-a-transplant-of-a-genetically-engineered-pig-kidney/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Biotech\",\n",
+ " \"title\": \"Muscle Implants Could Allow Mind-Controlled Prosthetics--No Brain Surgery Required\",\n",
+ " \"link\": \"https://www.wired.com/story/amputees-could-control-prosthetics-with-just-their-thoughts-no-brain-surgery-required-phantom-neuro/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Biotech\",\n",
+ " \"title\": \"Combining AI and Crispr Will Be Transformational\",\n",
+ " \"link\": \"https://www.wired.com/story/combining-ai-and-crispr-will-be-transformational/\",\n",
+ " \"author\": \"Jennifer Doudna\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Biotech\",\n",
+ " \"title\": \"Neuralink Plans to Test Whether Its Brain Implant Can Control a Robotic Arm\",\n",
+ " \"link\": \"https://www.wired.com/story/neuralink-robotic-arm-controlled-by-mind/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Meet the Next Generation of Doctors--and Their Surgical Robots\",\n",
+ " \"link\": \"https://www.wired.com/story/next-generation-doctors-surgical-robots/\",\n",
+ " \"author\": \"Neha Mukherjee\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"AI Is Building Highly Effective Antibodies That Humans Can\\u2019t Even Imagine\",\n",
+ " \"link\": \"https://www.wired.com/story/labgenius-antibody-factory-machine-learning/\",\n",
+ " \"author\": \"Amit Katwala\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Psychology and Neuroscience\",\n",
+ " \"title\": \"The Race to Translate Animal Sounds Into Human Language\",\n",
+ " \"link\": \"https://www.wired.com/story/artificial-intelligence-translation-animal-sounds-human-language/\",\n",
+ " \"author\": \"Arik Kershenbaum\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Psychology and Neuroscience\",\n",
+ " \"title\": \"An Uncertain Future Requires Uncertain Prediction Skills\",\n",
+ " \"link\": \"https://www.wired.com/story/embrace-uncertainty-forecasting-prediction-skills/\",\n",
+ " \"author\": \"David Spiegelhalter\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Psychology and Neuroscience\",\n",
+ " \"title\": \"These Rats Learned to Drive--and They Love It\",\n",
+ " \"link\": \"https://www.wired.com/story/these-rats-learned-to-drive-and-they-love-it/\",\n",
+ " \"author\": \"Kelly Lambert\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Psychology and Neuroscience\",\n",
+ " \"title\": \"Scientists Are Unlocking the Secrets of Your \\u2018Little Brain\\u2019\",\n",
+ " \"link\": \"https://www.wired.com/story/cerebellum-brain-movement-feelings/\",\n",
+ " \"author\": \"R Douglas Fields\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Year in Review\",\n",
+ " \"title\": \"Beyond Meat Says Being Attacked Has Just Made It Stronger\",\n",
+ " \"link\": \"https://www.wired.com/story/beyond-meat-hits-back-against-the-haters-ethan-brown/\",\n",
+ " \"author\": \"Matt Reynolds\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Mental Health\",\n",
+ " \"title\": \"How to Manage Food Anxiety Over the Holidays\",\n",
+ " \"link\": \"https://www.wired.com/story/how-to-cope-with-food-anxiety-during-the-festive-season/\",\n",
+ " \"author\": \"Alison Fixsen\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"extended trip\",\n",
+ " \"title\": \"NASA Postpones Return of Stranded Starliner Astronauts to March\",\n",
+ " \"link\": \"https://www.wired.com/story/boeing-starliner-astronauts-stranded-until-march-nasa/\",\n",
+ " \"author\": \"Fernanda Gonz\\u00e1lez\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Public Health\",\n",
+ " \"title\": \"CDC Confirms First US Case of Severe Bird Flu\",\n",
+ " \"link\": \"https://www.wired.com/story/cdc-confirms-first-us-case-of-severe-bird-flu/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Mega-Farms Are Driving the Threat of Bird Flu\",\n",
+ " \"link\": \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\",\n",
+ " \"author\": \"Georgina Gustin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"RFK Plans to Take on Big Pharma. It\\u2019s Easier Said Than Done\",\n",
+ " \"link\": \"https://www.wired.com/story/rfks-plan-to-take-on-big-pharma/\",\n",
+ " \"author\": \"Emily Mullin\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Designer Babies Are Teenagers Now--and Some of Them Need Therapy Because of It\",\n",
+ " \"link\": \"https://www.wired.com/story/your-next-job-designer-baby-therapist/\",\n",
+ " \"author\": \"Emi Nietfeld\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Economics\",\n",
+ " \"title\": \"US Meat, Milk Prices Should Spike if Donald Trump Carries Out Mass Deportation Schemes\",\n",
+ " \"link\": \"https://www.wired.com/story/us-meat-milk-prices-should-spike-if-donald-trump-carries-out-mass-deportation-schemes/\",\n",
+ " \"author\": \"Matt Reynolds\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"An Augmented Reality Program Can Help Patients Overcome Parkinson\\u2019s Symptoms\",\n",
+ " \"link\": \"https://www.wired.com/story/lining-up-tech-to-help-banish-tremors-strolll-parkinsons/\",\n",
+ " \"author\": \"Grace Browne\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"Meet the Plant Hacker Creating Flowers Never Seen (or Smelled) Before\",\n",
+ " \"link\": \"https://www.wired.com/story/meet-the-plant-hacker-creating-flowers-never-seen-or-smelled-before/\",\n",
+ " \"author\": \"Matt Reynolds\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Environment\",\n",
+ " \"title\": \"These 3 Things Are Standing in the Way of a Global Plastics Treaty\",\n",
+ " \"link\": \"https://www.wired.com/story/these-3-things-are-standing-in-the-way-of-a-global-plastics-treaty/\",\n",
+ " \"author\": \"Steve Fletcher and Samuel Winton\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Education\",\n",
+ " \"title\": \"Everyone Is Capable of Mathematical Thinking--Yes, Even You\",\n",
+ " \"link\": \"https://www.wired.com/story/everyone-is-capable-of-mathematical-thinking-yes-even-you/\",\n",
+ " \"author\": \"Kelsey Houston-Edwards\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Public Health\",\n",
+ " \"title\": \"A Mysterious Respiratory Disease Has the Democratic Republic of the Congo on High Alert\",\n",
+ " \"link\": \"https://www.wired.com/story/drc-mysterious-respiratory-disease-children-who-africa/\",\n",
+ " \"author\": \"Marta Musso\"\n",
+ " },\n",
+ " {\n",
+ " \"category\": \"Health\",\n",
+ " \"title\": \"Skip the Sea Kelp Supplements\",\n",
+ " \"link\": \"https://www.wired.com/story/pass-on-sea-kelp-supplements/\",\n",
+ " \"author\": \"Boutayna Chokrane\"\n",
+ " }\n",
+ " ]\n",
+ " },\n",
+ " \"error\": \"\"\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "import json\n",
+ "\n",
+ "print(\"Science News:\")\n",
+ "print(json.dumps(response, indent=2))"
+ ]
+ },
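+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The prompt asks for the first 10 articles, but in the run above the model returned many more. A minimal sketch for keeping downstream steps predictable is to validate `response[\"result\"]` against `ListNewsSchema` and slice the list yourself. It assumes Pydantic v2 and that `response` is a dict with a `result` key, as in the printed output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Validate the extracted result against the schema (Pydantic v2 API)\n",
+ "result = ListNewsSchema.model_validate(response[\"result\"])\n",
+ "\n",
+ "# Keep only the first 10 items, as the prompt requested\n",
+ "first_ten = result.news[:10]\n",
+ "for item in first_ten:\n",
+ "    print(f\"{item.category}: {item.title} ({item.author})\")"
+ ]
+ },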
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "source": [
+ "### 💾 Save the output to a `CSV` file"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "source": [
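+ "Load the extracted articles into a pandas DataFrame and display them as a table. Because both the page and the LLM output can change between runs, re-running the scrape may return a different set of rows than the output shown here."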
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "id": "1lS9O1KOI51y",
+ "outputId": "f3afda32-9fed-4f36-81c4-cd0b4ecf7f89"
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 40,\n \"fields\": [\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 13,\n \"samples\": [\n \"Technology\",\n \"Biotech\",\n \"Science\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 40,\n \"samples\": [\n \"The Mystery of How Supermassive Black Holes Merge\",\n \"Humans Will Continue to Live in an Age of Incredible Food Waste\",\n \"Big Tech Will Scour the Globe in Its Search for Cheap Energy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 40,\n \"samples\": [\n \"https://www.wired.com/story/how-do-merging-supermassive-black-holes-pass-the-final-parsec/\",\n \"https://www.wired.com/story/food-production-energy-waste/\",\n \"https://www.wired.com/story/big-tech-data-centers-cheap-energy/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 37,\n \"samples\": [\n \"Luca Nardi\",\n \"Eve Thomas\",\n \"Mikael Roll\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "type": "dataframe",
+ "variable_name": "df"
+ },
+ "text/plain": [
+ " category title \\\n",
+ "0 Science NASA Postpones Return of Stranded Starliner As... \n",
+ "1 Public Health CDC Confirms First US Case of Severe Bird Flu \n",
+ "2 Science The Study That Called Out Black Plastic Utensi... \n",
+ "3 Health A Third Person Has Received a Transplant of a ... \n",
+ "4 Health Antibodies Could Soon Help Slow the Aging Process \n",
+ "5 Science Good at Reading? Your Brain May Be Structured ... \n",
+ "6 Environment Mega-Farms Are Driving the Threat of Bird Flu \n",
+ "7 Environment How Christmas Trees Could Become a Source of L... \n",
+ "8 Environment Creating a Global Package to Solve the Problem... \n",
+ "9 Environment These 3 Things Are Standing in the Way of a Gl... \n",
+ "10 Environment Environmental Sensing Is Here, Tracking Everyt... \n",
+ "11 Climate Generative AI and Climate Change Are on a Coll... \n",
+ "12 Climate Climate Change Is Destroying Monarch Butterfli... \n",
+ "13 Politics More Humanitarian Organizations Will Harness A... \n",
+ "14 Environment Chocolate Has a Sustainability Problem. Scienc... \n",
+ "15 Energy Big Tech Will Scour the Globe in Its Search fo... \n",
+ "16 Environment Humans Will Continue to Live in an Age of Incr... \n",
+ "17 Energy A Uranium-Mining Boom Is Sweeping Through Texas \n",
+ "18 Space The End Is Near for NASA’s Voyager Probes \n",
+ "19 Space The Mystery of How Supermassive Black Holes Merge \n",
+ "20 Space Starship’s Next Launch Could Be Just Two Weeks... \n",
+ "21 Math The Simple Math Behind Public Key Cryptography \n",
+ "22 Math Everyone Is Capable of Mathematical Thinking--... \n",
+ "23 Math The Physics of the Macy’s Thanksgiving Day Par... \n",
+ "24 Math Mathematicians Just Debunked the ‘Bunkbed Conj... \n",
+ "25 Biotech Muscle Implants Could Allow Mind-Controlled Pr... \n",
+ "26 Biotech Combining AI and Crispr Will Be Transformational \n",
+ "27 Biotech Neuralink Plans to Test Whether Its Brain Impl... \n",
+ "28 Public Health A Mysterious Respiratory Disease Has the Democ... \n",
+ "29 Health Skip the Sea Kelp Supplements \n",
+ "30 Sports Why Soccer Players Are Training in the Dark \n",
+ "31 Environment A Parasite That Eats Cattle Alive Is Creeping ... \n",
+ "32 Technology Lasers Are Making It Easier to Find Buried Lan... \n",
+ "33 Health Mark Cuban’s War on Drug Prices: ‘How Much Fuc... \n",
+ "34 Environment Can Artificial Rain, Drones, or Satellites Cle... \n",
+ "35 Health These Stem Cell Treatments Are Worth Millions.... \n",
+ "36 Environment The $60 Billion Potential Hiding in Your Disca... \n",
+ "37 Health Tune In to the Healing Powers of a Decent Play... \n",
+ "38 Human History The Whole Story of How Humans Evolved From Gre... \n",
+ "39 Health Why an Offline Nuclear Reactor Led to Thousand... \n",
+ "\n",
+ " link \\\n",
+ "0 https://www.wired.com/story/boeing-starliner-a... \n",
+ "1 https://www.wired.com/story/cdc-confirms-first... \n",
+ "2 https://www.wired.com/story/black-plastic-uten... \n",
+ "3 https://www.wired.com/story/a-third-person-has... \n",
+ "4 https://www.wired.com/story/antibodies-could-s... \n",
+ "5 https://www.wired.com/story/good-at-reading-yo... \n",
+ "6 https://www.wired.com/story/mega-farms-are-dri... \n",
+ "7 https://www.wired.com/story/how-christmas-tree... \n",
+ "8 https://www.wired.com/story/global-plastics-tr... \n",
+ "9 https://www.wired.com/story/these-3-things-are... \n",
+ "10 https://www.wired.com/story/environmental-sens... \n",
+ "11 https://www.wired.com/story/true-cost-generati... \n",
+ "12 https://www.wired.com/story/global-warming-thr... \n",
+ "13 https://www.wired.com/story/humanitarian-organ... \n",
+ "14 https://www.wired.com/story/chocolate-has-a-su... \n",
+ "15 https://www.wired.com/story/big-tech-data-cent... \n",
+ "16 https://www.wired.com/story/food-production-en... \n",
+ "17 https://www.wired.com/story/a-uranium-mining-b... \n",
+ "18 https://www.wired.com/story/the-end-is-near-fo... \n",
+ "19 https://www.wired.com/story/how-do-merging-sup... \n",
+ "20 https://www.wired.com/story/starships-next-lau... \n",
+ "21 https://www.wired.com/story/how-public-key-cry... \n",
+ "22 https://www.wired.com/story/everyone-is-capabl... \n",
+ "23 https://www.wired.com/story/the-physics-of-the... \n",
+ "24 https://www.wired.com/story/maths-bunkbed-conj... \n",
+ "25 https://www.wired.com/story/amputees-could-con... \n",
+ "26 https://www.wired.com/story/combining-ai-and-c... \n",
+ "27 https://www.wired.com/story/neuralink-robotic-... \n",
+ "28 https://www.wired.com/story/drc-mysterious-res... \n",
+ "29 https://www.wired.com/story/pass-on-sea-kelp-s... \n",
+ "30 https://www.wired.com/story/why-soccer-players... \n",
+ "31 https://www.wired.com/story/a-parasite-that-ea... \n",
+ "32 https://www.wired.com/story/this-laser-system-... \n",
+ "33 https://www.wired.com/story/big-interview-mark... \n",
+ "34 https://www.wired.com/story/artificial-rain-dr... \n",
+ "35 https://www.wired.com/story/stem-cells-cost-ri... \n",
+ "36 https://www.wired.com/story/a-dollar60-billion... \n",
+ "37 https://www.wired.com/story/music-therapy-heal... \n",
+ "38 https://www.wired.com/story/the-whole-story-of... \n",
+ "39 https://www.wired.com/story/why-an-offline-nuc... \n",
+ "\n",
+ " author \n",
+ "0 Fernanda González \n",
+ "1 Emily Mullin \n",
+ "2 Beth Mole, Ars Technica \n",
+ "3 Emily Mullin \n",
+ "4 Andrew Steele \n",
+ "5 Mikael Roll \n",
+ "6 Georgina Gustin \n",
+ "7 Alexa Phillips \n",
+ "8 Susan Solomon \n",
+ "9 Steve Fletcher and Samuel Winton \n",
+ "10 Sabrina Weiss \n",
+ "11 Sasha Luccioni \n",
+ "12 Andrea J. Arratibel \n",
+ "13 David Miliband \n",
+ "14 Eve Thomas \n",
+ "15 Azeem Azhar \n",
+ "16 Vaclav Smil \n",
+ "17 Dylan Baddour \n",
+ "18 Luca Nardi \n",
+ "19 Jonathan O’Callaghan \n",
+ "20 Eric Berger, Ars Technica \n",
+ "21 John Pavlus \n",
+ "22 Kelsey Houston-Edwards \n",
+ "23 Rhett Allain \n",
+ "24 Joseph Howlett \n",
+ "25 Emily Mullin \n",
+ "26 Jennifer Doudna \n",
+ "27 Emily Mullin \n",
+ "28 Marta Musso \n",
+ "29 Boutayna Chokrane \n",
+ "30 RM Clark \n",
+ "31 Geraldine Castro \n",
+ "32 Ritsuko Kawai \n",
+ "33 Marah Eakin \n",
+ "34 Arunima Kar \n",
+ "35 Matt Reynolds \n",
+ "36 Vince Beiser \n",
+ "37 Daniel Levitin \n",
+ "38 John Gowlett \n",
+ "39 Chris Baraniuk "
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Convert the list of extracted news items (one dict per article) to a DataFrame\n",
+ "df = pd.DataFrame(response[\"result\"][\"news\"])\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "source": [
+ "Save it to CSV"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "BtEbB9pmQGhO",
+ "outputId": "09838ab0-9dd3-4386-e14f-5ce1c1f1a871"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Data saved to wired_news.csv\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Save the DataFrame to a CSV file\n",
+ "csv_file = \"wired_news.csv\"\n",
+ "df.to_csv(csv_file, index=False)\n",
+ "print(f\"Data saved to {csv_file}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "source": [
+ "## 🔗 Resources"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
+ "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
+ "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
+ "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
+ "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
+ "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
+ "\n",
+ "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/wired-news/scrapegraph_sdk.ipynb b/cookbook/wired-news/scrapegraph_sdk.ipynb
deleted file mode 100644
index 7a77ec1..0000000
--- a/cookbook/wired-news/scrapegraph_sdk.ipynb
+++ /dev/null
@@ -1 +0,0 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["IzsyDXEWwPVt"],"authorship_tag":"ABX9TyOuIM2TXW/T6WG0o+zLNRh+"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["\n","
\n",""],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️ Extract Wired Science News with Official Scrapegraph SDK"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"6Rgz5T4eHBVz"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734531168564,"user_tz":-60,"elapsed":17535,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"bcdd9ae1-151e-41d6-df36-8835c660460f"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for a single news item\n","class NewsItemSchema(BaseModel):\n"," category: str = Field(description=\"Category of the news (e.g., 'Health', 'Environment')\")\n"," title: str = Field(description=\"Title of the news article\")\n"," link: str = Field(description=\"URL to the news article\")\n"," author: str = Field(description=\"Author of the news article\")\n","\n","# Schema that contains a list of news items\n","class ListNewsSchema(BaseModel):\n"," news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `SGAI Client` and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Initialize the client for scraping (there's also an async version [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"],"metadata":{"id":"4SLJgXgcob6L"}},{"cell_type":"code","source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = 
Client(api_key=sgai_api_key)"],"metadata":{"id":"PQI25GZvoCSk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://www.wired.com/category/science/\",\n"," user_prompt=\"Extract the first 10 news in the page\",\n"," output_schema=ListNewsSchema,\n",")"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(\"Science News:\")\n","print(json.dumps(result, indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734531480725,"user_tz":-60,"elapsed":207,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"3085c572-15a8-4e69-b889-0eaab6ddeb33"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Request ID: 6bf82e33-44af-4064-83c6-b447192d68da\n","Science News:\n","{\n"," \"news\": [\n"," {\n"," \"category\": \"Science\",\n"," \"title\": \"The Study That Called Out Black Plastic Utensils Had a Major Math Error\",\n"," \"link\": \"https://www.wired.com/story/black-plastic-utensils-study-math-error-correction/\",\n"," \"author\": \"Beth Mole, Ars Technica\"\n"," },\n"," {\n"," \"category\": \"Environment\",\n"," \"title\": \"Generative AI and Climate Change Are on a Collision Course\",\n"," \"link\": 
\"https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/\",\n"," \"author\": \"Sasha Luccioni\"\n"," },\n"," {\n"," \"category\": \"Xenotransplantation\",\n"," \"title\": \"A Third Person Has Received a Transplant of a Genetically Engineered Pig Kidney\",\n"," \"link\": \"https://www.wired.com/story/a-third-person-has-received-a-transplant-of-a-genetically-engineered-pig-kidney/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"category\": \"Health\",\n"," \"title\": \"Antibodies Could Soon Help Slow the Aging Process\",\n"," \"link\": \"https://www.wired.com/story/antibodies-could-soon-help-slow-the-aging-process/\",\n"," \"author\": \"Andrew Steele\"\n"," },\n"," {\n"," \"category\": \"Science\",\n"," \"title\": \"Good at Reading? Your Brain May Be Structured Differently\",\n"," \"link\": \"https://www.wired.com/story/good-at-reading-your-brain-may-be-structured-differently/\",\n"," \"author\": \"Mikael Roll\"\n"," },\n"," {\n"," \"category\": \"Health\",\n"," \"title\": \"Mega-Farms Are Driving the Threat of Bird Flu\",\n"," \"link\": \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\",\n"," \"author\": \"Georgina Gustin\"\n"," },\n"," {\n"," \"category\": \"Health\",\n"," \"title\": \"RFK Plans to Take on Big Pharma. 
It\\u2019s Easier Said Than Done\",\n"," \"link\": \"https://www.wired.com/story/rfks-plan-to-take-on-big-pharma/\",\n"," \"author\": \"Emily Mullin\"\n"," },\n"," {\n"," \"category\": \"Environment\",\n"," \"title\": \"How Christmas Trees Could Become a Source of Low-Carbon Protein\",\n"," \"link\": \"https://www.wired.com/story/how-christmas-trees-could-become-a-source-of-low-carbon-protein/\",\n"," \"author\": \"Alexa Phillips\"\n"," },\n"," {\n"," \"category\": \"Environment\",\n"," \"title\": \"Creating a Global Package to Solve the Problem of Plastics\",\n"," \"link\": \"https://www.wired.com/story/global-plastics-treaty-united-nations/\",\n"," \"author\": \"Susan Solomon\"\n"," },\n"," {\n"," \"category\": \"Environment\",\n"," \"title\": \"These 3 Things Are Standing in the Way of a Global Plastics Treaty\",\n"," \"link\": \"https://www.wired.com/story/these-3-things-are-standing-in-the-way-of-a-global-plastics-treaty/\",\n"," \"author\": \"Steve Fletcher and Samuel Winton\"\n"," }\n"," ]\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the table with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"news\"])\n","df"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"1lS9O1KOI51y","executionInfo":{"status":"ok","timestamp":1734531552354,"user_tz":-60,"elapsed":232,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"b4eb35de-4b8c-4ede-bdaf-db95367d8ca6"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" category title \\\n","0 Science The Study That Called Out Black Plastic Utensi... \n","1 Environment Generative AI and Climate Change Are on a Coll... 
\n","2 Xenotransplantation A Third Person Has Received a Transplant of a ... \n","3 Health Antibodies Could Soon Help Slow the Aging Process \n","4 Science Good at Reading? Your Brain May Be Structured ... \n","5 Health Mega-Farms Are Driving the Threat of Bird Flu \n","6 Health RFK Plans to Take on Big Pharma. It’s Easier S... \n","7 Environment How Christmas Trees Could Become a Source of L... \n","8 Environment Creating a Global Package to Solve the Problem... \n","9 Environment These 3 Things Are Standing in the Way of a Gl... \n","\n"," link \\\n","0 https://www.wired.com/story/black-plastic-uten... \n","1 https://www.wired.com/story/true-cost-generati... \n","2 https://www.wired.com/story/a-third-person-has... \n","3 https://www.wired.com/story/antibodies-could-s... \n","4 https://www.wired.com/story/good-at-reading-yo... \n","5 https://www.wired.com/story/mega-farms-are-dri... \n","6 https://www.wired.com/story/rfks-plan-to-take-... \n","7 https://www.wired.com/story/how-christmas-tree... \n","8 https://www.wired.com/story/global-plastics-tr... \n","9 https://www.wired.com/story/these-3-things-are... \n","\n"," author \n","0 Beth Mole, Ars Technica \n","1 Sasha Luccioni \n","2 Emily Mullin \n","3 Andrew Steele \n","4 Mikael Roll \n","5 Georgina Gustin \n","6 Emily Mullin \n","7 Alexa Phillips \n","8 Susan Solomon \n","9 Steve Fletcher and Samuel Winton "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," category | \n"," title | \n"," link | \n"," author | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Science | \n"," The Study That Called Out Black Plastic Utensi... | \n"," https://www.wired.com/story/black-plastic-uten... | \n"," Beth Mole, Ars Technica | \n","
\n"," \n"," 1 | \n"," Environment | \n"," Generative AI and Climate Change Are on a Coll... | \n"," https://www.wired.com/story/true-cost-generati... | \n"," Sasha Luccioni | \n","
\n"," \n"," 2 | \n"," Xenotransplantation | \n"," A Third Person Has Received a Transplant of a ... | \n"," https://www.wired.com/story/a-third-person-has... | \n"," Emily Mullin | \n","
\n"," \n"," 3 | \n"," Health | \n"," Antibodies Could Soon Help Slow the Aging Process | \n"," https://www.wired.com/story/antibodies-could-s... | \n"," Andrew Steele | \n","
\n"," \n"," 4 | \n"," Science | \n"," Good at Reading? Your Brain May Be Structured ... | \n"," https://www.wired.com/story/good-at-reading-yo... | \n"," Mikael Roll | \n","
\n"," \n"," 5 | \n"," Health | \n"," Mega-Farms Are Driving the Threat of Bird Flu | \n"," https://www.wired.com/story/mega-farms-are-dri... | \n"," Georgina Gustin | \n","
\n"," \n"," 6 | \n"," Health | \n"," RFK Plans to Take on Big Pharma. It’s Easier S... | \n"," https://www.wired.com/story/rfks-plan-to-take-... | \n"," Emily Mullin | \n","
\n"," \n"," 7 | \n"," Environment | \n"," How Christmas Trees Could Become a Source of L... | \n"," https://www.wired.com/story/how-christmas-tree... | \n"," Alexa Phillips | \n","
\n"," \n"," 8 | \n"," Environment | \n"," Creating a Global Package to Solve the Problem... | \n"," https://www.wired.com/story/global-plastics-tr... | \n"," Susan Solomon | \n","
\n"," \n"," 9 | \n"," Environment | \n"," These 3 Things Are Standing in the Way of a Gl... | \n"," https://www.wired.com/story/these-3-things-are... | \n"," Steve Fletcher and Samuel Winton | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Environment\",\n \"Health\",\n \"Science\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Creating a Global Package to Solve the Problem of Plastics\",\n \"Generative AI and Climate Change Are on a Collision Course\",\n \"Mega-Farms Are Driving the Threat of Bird Flu\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"https://www.wired.com/story/global-plastics-treaty-united-nations/\",\n \"https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/\",\n \"https://www.wired.com/story/mega-farms-are-driving-the-threat-of-bird-flu/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9,\n \"samples\": [\n \"Susan Solomon\",\n \"Sasha Luccioni\",\n \"Georgina Gustin\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":8}]},{"cell_type":"markdown","source":["Save it to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrame to a CSV file\n","csv_file = \"wired_news.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to 
{csv_file}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734531564909,"user_tz":-60,"elapsed":215,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"9fedc5ee-3009-45a1-ad44-6e77bee574ea"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to wired_news.csv\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
\ No newline at end of file
diff --git a/scrapegraph-py/.gitignore b/scrapegraph-py/.gitignore
index ff84231..aca520b 100644
--- a/scrapegraph-py/.gitignore
+++ b/scrapegraph-py/.gitignore
@@ -145,5 +145,6 @@ cython_debug/
# macOS
.DS_Store
+**/.DS_Store
dev.ipynb
\ No newline at end of file
diff --git a/scrapegraph-py/cookbook/README.md b/scrapegraph-py/cookbook/README.md
deleted file mode 100644
index fe1cb2c..0000000
--- a/scrapegraph-py/cookbook/README.md
+++ /dev/null
@@ -1,9 +0,0 @@
-## 📚 Official Cookbook
-
-Looking for examples and guides? Then head over to the official ScrapeGraph SDK [Cookbook](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/cookbook)!
-
-The cookbook provides step-by-step instructions, practical examples, and tips to help you get started and make the most out of ScrapeGraph SDK.
-
-You will find some colab notebooks with our partners as well, such as Langchain 🦜 and LlamaIndex 🦙
-
-Happy scraping! 🚀