ScrapeSmart is a MERN-based Amazon Smart TV scraper that extracts product details, pricing, offers, and AI-generated review summaries. Built with Node.js, Puppeteer, Express, React.js, and MongoDB, it provides an interactive UI for seamless product data retrieval and storage.

RyomenDev/ScrapeSmart

ScrapeSmart

Description

1. Backend (Node.js + Express.js)

  • Use Puppeteer (for JavaScript-rendered pages) to scrape Amazon product details.
  • Store the extracted data in MongoDB.
  • Implement an API endpoint (/scrape) to trigger the scraper.
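A minimal sketch of how the /scrape endpoint could be wired up. The factory name `createScrapeHandler`, the injected `scrapeProduct` and `saveProduct` functions, the use of POST, and the `{ url }` request body are all assumptions for illustration; the repository's actual handler may differ.

```javascript
// Hypothetical factory: builds an Express-style (req, res) handler for /scrape.
// `scrapeProduct` and `saveProduct` are injected placeholders, not the
// project's real functions.
function createScrapeHandler({ scrapeProduct, saveProduct }) {
  return async function handleScrape(req, res) {
    const url = req.body && req.body.url;

    // Reject requests that do not carry an http(s) product link.
    if (typeof url !== "string" || !/^https?:\/\//i.test(url)) {
      return res.status(400).json({ error: "A product URL is required." });
    }

    try {
      const product = await scrapeProduct(url); // e.g. a Puppeteer-based scraper
      const saved = await saveProduct(product); // e.g. a MongoDB insert
      return res.status(200).json(saved);
    } catch (err) {
      return res.status(500).json({ error: "Scraping failed.", detail: err.message });
    }
  };
}

// With Express this could be mounted as (assumed route and method):
//   app.use(express.json());
//   app.post("/scrape", createScrapeHandler({ scrapeProduct, saveProduct }));
```

Injecting the scraper and the persistence step keeps the handler testable without launching a browser or a database.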

2. Frontend (React.js)

  • Create a UI to input the Amazon product link.
  • Display scraped product details (name, price, images, offers, etc.).
  • Show an AI-generated summary of customer reviews, produced with the Gemini API rather than OpenAI because OpenAI does not offer a free tier.
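As a rough illustration, the UI could call the backend like this. The helper name `requestScrape`, the POST method, and the `{ url }` body are assumptions; in a Vite client the server URL would typically come from `import.meta.env.VITE_SERVER_URL`.

```javascript
// Hypothetical client helper: asks the backend to scrape one product link.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function requestScrape(serverUrl, productUrl, fetchImpl = fetch) {
  const response = await fetchImpl(`${serverUrl}/scrape`, {
    method: "POST", // assumed; the real endpoint may use another method
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: productUrl }),
  });

  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  // Expected to resolve to the scraped product (name, price, images, offers, ...).
  return response.json();
}
```

A React component could call `requestScrape` from its submit handler and store the result in state for rendering.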

3. Database (MongoDB)

  • Store scraped product data for future reference.

Tech Stack:

  • Backend: Node.js, Express.js, Puppeteer
  • Frontend: React.js, Tailwind CSS
  • Database: MongoDB (to store scraped data)
  • AI Summary: Gemini API for review summarization
  • API Documentation: Swagger (OpenAPI)

🚀 Features

✅ Extracts product name, price, ratings, and discount information
✅ Fetches bank offers, "About This Item" section, and product specifications
✅ Scrapes product images and manufacturer details
✅ AI-generated customer review summary (Gemini API)
✅ Interactive UI for entering and displaying scraped product details
✅ Stores scraped data in MongoDB for easy retrieval

🔍 How It Works

  • Enter an Amazon Smart TV product link in the UI.
  • Click Scrape to fetch product details.
  • View structured product data with AI-generated review insights.
  • Data is stored in MongoDB for future reference.

Implementation

1. Scraper Utility

  • The scraper function extracts product details from a given URL.
  • Extracted data includes:
    • Name
    • Rating & Number of Ratings
    • Price & Discount
    • Bank Offers
    • About Information
    • Product Specifications
    • Images & Manufacturer Images
    • Customer Reviews
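Scraped fields usually arrive as display strings, so a small normalization step is often useful before storage. The sketch below assumes raw values such as "₹32,999" or "4.3 out of 5 stars"; the helper names and the `raw` field names are hypothetical and not taken from the repository.

```javascript
// Extracts the first number from a display string, ignoring thousands separators.
function parseNumber(text) {
  const match = String(text).replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Computes a rounded discount percentage from list and sale prices.
function discountPercent(listPrice, salePrice) {
  if (!(listPrice > 0) || salePrice == null) return null;
  return Math.round(((listPrice - salePrice) / listPrice) * 100);
}

// Hypothetical normalizer: the input field names are assumptions.
function normalizeProduct(raw) {
  const price = parseNumber(raw.price);
  const listPrice = parseNumber(raw.listPrice);
  return {
    name: String(raw.name || "").trim(),
    rating: parseNumber(raw.rating),
    ratingCount: parseNumber(raw.ratingCount),
    price,
    discountPercent: discountPercent(listPrice, price),
  };
}
```

Storing numbers instead of formatted strings makes later sorting and filtering in MongoDB straightforward.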

2. Generating Review Summary with Gemini

  • The extracted reviews are sent to an LLM API to generate a concise review summary.
  • The project uses Gemini (generateReviewSummaryGemini) instead of OpenAI, as OpenAI does not offer a free tier.
  • The generated summary gives a quick insight into customer opinions.
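The summarization step can be sketched as a prompt builder plus a call to a text-generation function. Here `buildReviewPrompt` and `summarizeReviews` are hypothetical names, the prompt wording and the character budget are assumptions, and `generateText` stands in for whatever Gemini client call the project makes inside generateReviewSummaryGemini.

```javascript
// Builds a single prompt from review texts, stopping once a character budget is used.
function buildReviewPrompt(reviews, maxChars = 4000) {
  const lines = [];
  let used = 0;
  for (const review of reviews) {
    const text = String(review).trim();
    if (!text) continue; // skip empty reviews
    if (used + text.length > maxChars) break; // keep the prompt bounded
    lines.push(`- ${text}`);
    used += text.length;
  }
  return (
    "Summarize the following customer reviews in 3 to 4 sentences, " +
    "covering common praise and complaints:\n" +
    lines.join("\n")
  );
}

// `generateText` is an injected async function (prompt) => summary string.
async function summarizeReviews(reviews, generateText) {
  if (!Array.isArray(reviews) || reviews.length === 0) {
    return "No reviews available.";
  }
  return generateText(buildReviewPrompt(reviews));
}
```

Capping the prompt size keeps requests within model input limits and keeps free-tier usage predictable.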

3. Saving Product Data

  • After scraping and processing the reviews, the product details (including the generated summary) are stored in the database using MongoDB.
  • A new product instance is created and saved asynchronously.
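The save step described above could look roughly like this. `saveScrapedProduct` is a hypothetical helper, `ProductModel` is assumed to be a Mongoose-style model (a constructor whose instances expose an async `save()`), and the `reviewSummary` and `scrapedAt` field names are illustrative.

```javascript
// Creates a new product instance from the scraped data plus the generated
// summary, then saves it asynchronously.
async function saveScrapedProduct(ProductModel, scraped, reviewSummary) {
  const product = new ProductModel({
    ...scraped,
    reviewSummary, // assumed field name for the AI-generated summary
    scrapedAt: new Date(), // assumed timestamp field
  });
  return product.save();
}
```

Passing the model in as a parameter lets the function be exercised with a stub class, without a running MongoDB instance.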

HOW TO RUN

1️⃣ Create .env File

Define your environment variables:

♣️ A. Client

VITE_SERVER_URL="http://localhost:5000"

♣️ B. Server

VITE_SERVER_URL=http://localhost:5000
MONGO_URI=mongodb://mongodb_container:27017/scrapesmart # if using the container image
# MONGO_URI=mongodb+srv://yourUsername:yourPassword@yourClusterHost/yourDatabase # if using a cloud DB
# MONGO_URI="mongodb://localhost:27017/" # if using a locally set up DB
PORT=5000
SERVER_URL="http://localhost:5000"
OPENAI_API_KEY="sk-proj-"
GEMINI_API_KEY="AIz.." # get from Google AI for Developers (https://ai.google.dev/)
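A small startup check can catch a missing variable before the server starts. This is a sketch only: `loadConfig` is a hypothetical helper, and treating MONGO_URI and GEMINI_API_KEY as the required variables is an assumption based on the features described above.

```javascript
// Reads the server settings from an env-like object and fails fast when a
// required value is absent.
function loadConfig(env) {
  const required = ["MONGO_URI", "GEMINI_API_KEY"]; // assumed to be mandatory
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number(env.PORT) || 5000; // falls back to the port used in this README
  return {
    port,
    mongoUri: env.MONGO_URI,
    serverUrl: env.SERVER_URL || `http://localhost:${port}`,
    geminiApiKey: env.GEMINI_API_KEY,
  };
}

// Typical usage at server startup: const config = loadConfig(process.env);
```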

2️⃣ Run Everything

docker-compose up --build
docker-compose down # stops and removes the containers and networks defined in docker-compose.yml (add -v to also remove volumes)
