Disclaimer: This is a personal project and is not affiliated with, endorsed by, or connected to my employer (AWS) or Amazon.com, Inc. in any way.
A web scraper that monitors Whole Foods organic egg availability on Amazon and sends email notifications when stock is found. Built for reliability, with anti-bot-detection measures and proxy rotation.
Stealth Browser Automation
- Headless Chrome driven by Playwright, with measures to evade bot detection
- Randomized user agents and delays to mimic human behavior
- Proxy rotation to avoid IP blocks
- Automatic CAPTCHA detection and retry logic
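The randomized user agents and delays mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `USER_AGENTS` pool, the helper names, and the default delay range are all assumptions made for illustration.

```typescript
// Hypothetical pool of user-agent strings; a real pool would need to be kept current.
const USER_AGENTS: readonly string[] = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];

// Pick one element uniformly at random from a non-empty list.
function pickRandom<T>(items: readonly T[]): T {
  if (items.length === 0) throw new Error("pickRandom: empty list");
  return items[Math.floor(Math.random() * items.length)] as T;
}

// Random integer delay in [minMs, maxMs], both ends inclusive.
function randomDelayMs(minMs: number, maxMs: number): number {
  if (minMs > maxMs) throw new Error("randomDelayMs: minMs > maxMs");
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Pause for a random interval, e.g. between two page interactions.
// The default range is an assumption, not a tuned value.
function humanPause(minMs = 500, maxMs = 2000): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

With Playwright, the chosen string would typically be passed as the `userAgent` option of `browser.newContext()`, and `await humanPause()` would be placed between interactions.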
Robust Location Handling
- Automatically updates delivery location
- Multiple selector fallbacks for UI changes
- Keyboard navigation backup for click failures
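One way to combine selector fallbacks with a keyboard backup is sketched below. The `ClickablePage` interface is a hand-written structural subset of Playwright's `Page` (`click`, `focus`, `keyboard.press`), so a real page should fit it. The function name, the timeout, and the return format are assumptions, not the project's implementation.

```typescript
// Minimal structural subset of Playwright's Page, declared locally so the
// helper can also be exercised with a fake object.
interface ClickablePage {
  click(selector: string, options?: { timeout?: number }): Promise<void>;
  focus(selector: string, options?: { timeout?: number }): Promise<void>;
  keyboard: { press(key: string): Promise<void> };
}

// Try each selector with a normal click; if all fail, focus each selector
// and press Enter instead. Returns a label describing what worked.
async function clickWithFallbacks(
  page: ClickablePage,
  selectors: readonly string[],
  timeoutMs = 3000,
): Promise<string> {
  // First pass: plain clicks, in order of preference.
  for (const selector of selectors) {
    try {
      await page.click(selector, { timeout: timeoutMs });
      return `click:${selector}`;
    } catch {
      // Selector missing or click intercepted; try the next one.
    }
  }
  // Second pass: keyboard activation as a backup.
  for (const selector of selectors) {
    try {
      await page.focus(selector, { timeout: timeoutMs });
      await page.keyboard.press("Enter");
      return `keyboard:${selector}`;
    } catch {
      // Element could not be focused; try the next one.
    }
  }
  throw new Error(`No selector could be activated: ${selectors.join(", ")}`);
}
```

A call might look like `await clickWithFallbacks(page, ["#primary-selector", "[data-fallback]"])`, where both selectors are placeholders rather than real Amazon selectors.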
Smart Stock Detection
- Parses product listings for organic eggs
- Detects "Out of Stock" status across different UI patterns
- Screenshot capture for debugging and verification
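The detection step can be thought of as a pure function over text already extracted from the page. The sketch below assumes a hypothetical `Listing` shape and a hypothetical set of out-of-stock phrases; the wording Amazon actually shows may differ and change over time.

```typescript
// Hypothetical phrases; extend as new UI patterns are observed.
const OUT_OF_STOCK_PATTERNS: readonly RegExp[] = [
  /out of stock/i,
  /currently unavailable/i,
  /temporarily unavailable/i,
];

// Assumed shape of one scraped product tile.
interface Listing {
  title: string;
  statusText: string;
}

// True when the title mentions both "organic" and "egg(s)" as whole words.
function isOrganicEggs(title: string): boolean {
  return /\borganic\b/i.test(title) && /\beggs?\b/i.test(title);
}

// Collapse whitespace first so phrases split across lines still match.
function isOutOfStock(statusText: string): boolean {
  const normalized = statusText.replace(/\s+/g, " ").trim();
  return OUT_OF_STOCK_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Keep only organic egg listings that do not look out of stock.
function findAvailableEggs(listings: readonly Listing[]): Listing[] {
  return listings.filter(
    (listing) => isOrganicEggs(listing.title) && !isOutOfStock(listing.statusText),
  );
}
```

Keeping this logic separate from the browser code means it can be tested against saved page text without launching a browser.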
Notification System
- Email notifications via Gmail SMTP
- Detailed product availability reports
- Error alerts with diagnostic information
- TypeScript - Type safety and modern JavaScript features
- Playwright - Modern browser automation
- Node.js - Runtime environment
- GitHub Actions - CI/CD and scheduled runs
- Nodemailer - Email notifications
Anti-Bot Measures
- Implemented sophisticated browser fingerprint randomization
- Built proxy rotation system for IP address diversity
- Added human-like behavior patterns and delays
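As an illustration of proxy rotation, here is a simple round-robin rotator. Note the assumption: the setup instructions list a single `PROXY_SERVER`, so rotation may well happen on the provider side. This sketch instead assumes a local list of proxies, and `ProxyRotator` is a hypothetical name. The `ProxyConfig` fields mirror the `server`, `username`, and `password` fields of Playwright's `proxy` launch option.

```typescript
// Same field names as Playwright's `proxy` launch option.
interface ProxyConfig {
  server: string;
  username?: string;
  password?: string;
}

// Hands out proxies in round-robin order.
class ProxyRotator {
  private index = 0;
  private readonly proxies: readonly ProxyConfig[];

  constructor(proxies: readonly ProxyConfig[]) {
    if (proxies.length === 0) throw new Error("ProxyRotator: no proxies configured");
    this.proxies = proxies;
  }

  // Return the next proxy and advance, wrapping around at the end.
  next(): ProxyConfig {
    const proxy = this.proxies[this.index] as ProxyConfig;
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}
```

Each run (or each retry after a block) would then launch the browser with `chromium.launch({ proxy: rotator.next() })`.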
Resilient Automation
- Developed multi-stage fallback mechanisms for UI interactions
- Created comprehensive error handling and retry logic
- Built intelligent element selection strategies
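Retry logic of the kind described above is often written as a small wrapper with exponential backoff. The following is a generic sketch under that assumption; `withRetry` and its defaults are illustrative and not taken from the project.

```typescript
// Run `task` up to `maxAttempts` times, waiting baseDelayMs, 2x, 4x, ...
// between failed attempts. Rethrows the last error if every attempt fails.
async function withRetry<T>(
  task: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  if (maxAttempts < 1) throw new Error("withRetry: maxAttempts must be >= 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A stock check could then be wrapped as `await withRetry(() => checkStock(page))`, where `checkStock` stands in for whatever function performs a single attempt.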
Production-Grade Architecture
- Structured logging with rotation and error tracking
- Environment-based configuration management
- Modular code design for maintainability
Web Scraping Resilience
- Amazon's dynamic UI requires flexible selector strategies
- Bot detection systems need constant adaptation
- Proxy quality significantly impacts reliability
Browser Automation
- Playwright proved more stable than Puppeteer for this workload
- Headless browsers require careful resource management
- Error handling needs to account for network/UI/timing issues
Infrastructure
- GitHub Actions provides reliable scheduling
- Proxy servers are crucial for production scraping
- Logging is essential for debugging production issues
- Clone the repository
- Install dependencies:
npm install
- Create a .env file based on .env.example. Required environment variables:
  - PROXY_SERVER - Proxy server URL
  - PROXY_USERNAME - Proxy authentication username
  - PROXY_PASSWORD - Proxy authentication password
  - EMAIL_USER - Gmail address
  - EMAIL_APP_PASSWORD - Gmail app-specific password
  - NOTIFICATION_EMAIL - Recipient email address
- Run the script:
npm start
Run the script with email testing:
npm start -- --test-email
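Because the script depends on all six environment variables from the setup steps above, it can help to fail fast when one is missing. The sketch below shows one possible startup check. `loadConfig` and `AppConfig` are hypothetical names, and the function takes the environment as a parameter so it does not depend on how the `.env` file is loaded.

```typescript
// The variable names come from the setup instructions above.
const REQUIRED_VARS = [
  "PROXY_SERVER",
  "PROXY_USERNAME",
  "PROXY_PASSWORD",
  "EMAIL_USER",
  "EMAIL_APP_PASSWORD",
  "NOTIFICATION_EMAIL",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string>;

// Validate that every required variable is present and non-blank,
// reporting all missing names in a single error.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const name of REQUIRED_VARS) {
    config[name] = env[name] as string;
  }
  return config;
}
```

In the script this would be called once at startup as `loadConfig(process.env)`, after the `.env` file has been loaded.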
This project is designed to run on GitHub Actions:
- Fork this repository
- Configure repository secrets for all environment variables listed above
- GitHub Actions will then:
  - Run every 3 hours automatically, or on demand via workflow_dispatch
  - Install dependencies and Playwright
  - Execute the stock check
  - Send email notifications if stock is found
The script includes comprehensive debugging features:
- Screenshots saved on each check
- Detailed logging with Winston
- Visual browser mode for local debugging
- Error screenshots for failed location updates