Amazon Best Sellers Web Scraper

Overview

This Python script uses Selenium to scrape product information from Amazon's Best Sellers section. It targets products discounted by more than 50% across 10 categories and saves the results as CSV or JSON. The script logs in with valid Amazon credentials and extracts key product details from each category.

Features

Authentication: Logs in to Amazon using provided credentials.
Data Collection: Scrapes the following details for up to 1500 best-selling products in each category:
- Product Name
- Product Price
- Sale Discount
- Best Seller Rating
- Ship From
- Sold By
- Rating
- Product Description
- Number Bought in the Past Month (if available)
- Category Name
- All Available Images
Error Handling: Robust handling of missing elements, timeouts, and page load issues.
Data Storage: Saves scraped data into a CSV or JSON file for analysis.
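
As a rough illustration of the fields listed above and the 50% discount threshold, a record type and filter might look like the sketch below. The class name `Product`, its field names, and the helpers `parse_discount` and `is_deep_discount` are hypothetical and are not taken from the script. The assumption that the discount appears as text such as "-55%" is also unverified.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Product:
    """One scraped best-seller entry (field names are illustrative)."""
    name: str
    price: str
    discount_text: str                        # raw text, assumed to look like "-55%"
    category: str
    best_seller_rating: Optional[str] = None
    ship_from: Optional[str] = None
    sold_by: Optional[str] = None
    rating: Optional[str] = None
    description: Optional[str] = None
    bought_past_month: Optional[str] = None   # only present on some listings
    images: List[str] = field(default_factory=list)


def parse_discount(text: str) -> Optional[float]:
    """Return the first percentage found in `text`, or None if there is none."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None


def is_deep_discount(product: Product, threshold: float = 50.0) -> bool:
    """True when the discount is strictly greater than `threshold` percent."""
    discount = parse_discount(product.discount_text)
    return discount is not None and discount > threshold
```

Keeping the raw discount text on the record and parsing it separately means a product with an unreadable discount is simply excluded by the filter instead of raising an error.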

Prerequisites

Python: Install Python 3.7 or later.
Libraries:
- Selenium: Install using pip install selenium.
WebDriver:
- Download the appropriate WebDriver (e.g., ChromeDriver) and ensure it's in your system PATH.
Amazon Account: Provide valid Amazon credentials for authentication.
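
Before running the scraper, it can help to confirm that these prerequisites are in place. The following is a minimal sketch using only the standard library. The function name `check_prerequisites` is hypothetical, and the check assumes the driver executable is named `chromedriver`.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(driver_name: str = "chromedriver") -> dict:
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        "python_3_7_plus": sys.version_info >= (3, 7),
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        "webdriver_on_path": shutil.which(driver_name) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```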

Setup Instructions

Clone the Repository:

```
git clone <repository_url>
cd amazon-scraper
```

Install Dependencies:
```
pip install selenium
```
Download WebDriver:
- Download the ChromeDriver release that matches your installed Chrome version.
- Place it in your system PATH or the script directory.
Update Credentials:
- Replace the placeholder email address and your_password in the script with your Amazon login credentials.
Run the Script:
```
python amazon_scraper.py
```
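
Hard-coding credentials in the script makes them easy to commit by accident. One common alternative, sketched here, reads them from environment variables. The variable names `AMAZON_EMAIL` and `AMAZON_PASSWORD` and the function `load_credentials` are assumptions for illustration; the script as described expects the values to be edited in place.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (email, password) from the environment, or raise if either is unset."""
    email = env.get("AMAZON_EMAIL", "").strip()
    password = env.get("AMAZON_PASSWORD", "")
    if not email or not password:
        raise RuntimeError(
            "Set AMAZON_EMAIL and AMAZON_PASSWORD before running the scraper."
        )
    return email, password
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of the real environment.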

How It Works

Authentication:
- The script navigates to the Amazon login page and authenticates using the provided email and password.
Category Navigation:
- Visits the URLs of the 10 specified Best Seller categories.
Data Extraction:
- Collects product details, including the name, price, rating, and more.
- Skips products with missing or inaccessible data.
Data Storage:
- Saves the scraped data as amazon_best_sellers.csv or amazon_best_sellers.json in the script's directory.
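
The last two steps above (skipping incomplete products and writing the output file) can be sketched with the standard library alone. The helper names `keep_complete` and `save_rows`, and the choice of `Name`, `Price`, and `Discount` as required fields, are assumptions; only the output file names come from this README.

```python
import csv
import json
from pathlib import Path
from typing import Dict, Iterable, List

REQUIRED_FIELDS = ("Name", "Price", "Discount")  # assumed minimum for a usable row


def keep_complete(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop rows where any required field is missing or empty."""
    return [r for r in rows if all(r.get(f) for f in REQUIRED_FIELDS)]


def save_rows(rows: List[Dict[str, str]], directory: Path, fmt: str = "csv") -> Path:
    """Write rows to amazon_best_sellers.csv or .json and return the file path."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    path = directory / f"amazon_best_sellers.{fmt}"
    if fmt == "json":
        with path.open("w", encoding="utf-8") as fh:
            json.dump(rows, fh, ensure_ascii=False, indent=2)
    else:
        # Use the union of keys so optional columns are not lost.
        columns = list(dict.fromkeys(k for r in rows for k in r))
        with path.open("w", encoding="utf-8", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=columns, restval="")
            writer.writeheader()
            writer.writerows(rows)
    return path
```

Rows that lack an optional column, such as the number bought in the past month, are written with an empty cell in the CSV.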

Output Format

CSV File:
- Columns include Name, Price, Discount, Rating, Ship From, Sold By, etc.
JSON File:
- Structured JSON with the same details.

Example URLs

Best Seller Section:
- Best Sellers
Sample Categories:
- Kitchen
- Shoes
- Computers
- Electronics

Notes

Scraping Amazon may violate their Terms of Service. Ensure you comply with their policies.
If the page structure changes, you may need to update the script's XPath or CSS selectors.
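
One way to soften the impact of page-structure changes is to keep selectors in a single table and try several candidates per field. The sketch below is library-agnostic: `find` stands for any lookup callable that returns `None` when nothing matches (for example, a small wrapper around Selenium's `driver.find_element` that catches `NoSuchElementException`). The selector strings are invented placeholders, not selectors from the script or from Amazon's markup.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical selectors: replace with the ones the script actually uses.
SELECTORS: Dict[str, Sequence[str]] = {
    "name": ("#title-v2", ".product-name"),
    "price": ("#price-v2", ".product-price"),
}


def first_match(
    find: Callable[[str], Optional[str]], candidates: Sequence[str]
) -> Optional[str]:
    """Return the first non-None result of `find` over the candidates, else None."""
    for selector in candidates:
        value = find(selector)
        if value is not None:
            return value
    return None
```

With this layout, adapting to a redesigned page means adding a new candidate to `SELECTORS` and leaving the extraction logic unchanged.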

Troubleshooting

Login Issues:
- Ensure your credentials are correct.
- Check for CAPTCHA prompts during login.
Missing WebDriver:
- Verify that ChromeDriver is installed and in your PATH.
Slow Page Load:
- Increase wait times using Selenium's WebDriverWait.
Blocked Requests:
- Reduce the scraping speed to avoid being flagged by Amazon.
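
The last two suggestions can be sketched as follows. `polite_pause` and `wait_for_css` are hypothetical helper names, and the default delays and timeout are arbitrary starting points, not values from the script. `WebDriverWait` and `expected_conditions` are part of Selenium's support package.

```python
import random
import time
from typing import Callable


def polite_pause(min_s: float = 2.0, max_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random interval in [min_s, max_s] and return the delay used."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def wait_for_css(driver, css: str, timeout: float = 20.0):
    """Block until an element matching `css` is present, or raise TimeoutException."""
    # Imported lazily so polite_pause can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css))
    )
```

Calling `polite_pause()` between page loads spaces out requests, and raising the `timeout` argument of `wait_for_css` gives slow pages more time before the wait fails.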

License

This project is for educational purposes only. Use responsibly and adhere to Amazon's terms of service.

Contact

For questions or suggestions, reach out to: [email protected].
