This project is a web scraping and caching application built with Flask and Scrapy. It fetches product information from two different websites, caches the data, and serves it through a web interface. The cache is updated periodically to ensure the data remains fresh.
- Scrapes product data (images and titles) from two websites.
- Caches scraped data locally to reduce redundant network requests.
- Background thread for periodic cache updates.
- Flask-based web interface to display the data.
- Organized and modular codebase for maintainability.
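The periodic refresh described above can be sketched with the standard library alone. This is a minimal illustration, not the code in cache_manager.py: the `ProductCache` class, the `fake_fetch` function, and the sample data are all hypothetical stand-ins, and the real scraping step is not shown.

```python
import threading


class ProductCache:
    """In-memory cache refreshed by a background thread (illustrative only)."""

    def __init__(self, fetch, interval_seconds):
        self._fetch = fetch                # callable that returns fresh data
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._data = {}
        self._stop = threading.Event()

    def refresh(self):
        data = self._fetch()               # do the slow work outside the lock
        with self._lock:
            self._data = data

    def get(self):
        with self._lock:
            return dict(self._data)        # shallow copy for the caller

    def start(self):
        self.refresh()                     # populate once before serving
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh()


def fake_fetch():
    # Hypothetical stand-in for the real scraping step.
    return {"glas": ["Product A"], "fliese": ["Product B"]}


cache = ProductCache(fake_fetch, interval_seconds=15 * 60)
cache.start()
print(cache.get())
cache.stop()
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets the thread stop promptly when the application shuts down.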
- Python 3.7+
- pip (Python package manager)
- Clone the repository:
  git clone https://github.com/aryala7/BigBossScraper.git
  cd BigBossScraper
- Set up a virtual environment (optional but recommended):
  python3 -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Start the application:
  python app.py
- Open your browser and navigate to http://127.0.0.1:5000/ to view the application.
.
├── app.py # Main Flask application
├── cache_manager.py # Handles data scraping and caching logic
├── templates/ # HTML templates for the web interface
├── static/ # Static assets (CSS, JS, images)
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Displays a random sample of 20 products from each category.
- Displays all products for a specified category (glas or fliese).
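The selection logic behind these two views might look like the sketch below. It uses only the standard library and leaves the Flask routing out. The `PRODUCTS` data, the image file names, and both function names are assumptions made for illustration, not identifiers from app.py.

```python
import random

# Hypothetical cache contents: category name -> list of product dicts.
PRODUCTS = {
    "glas": [{"title": f"Glas {i}", "image": f"glas-{i}.jpg"} for i in range(50)],
    "fliese": [{"title": f"Fliese {i}", "image": f"fliese-{i}.jpg"} for i in range(8)],
}

SAMPLE_SIZE = 20


def sample_products(products, size=SAMPLE_SIZE):
    """Return up to `size` randomly chosen products for each category."""
    return {
        # min() avoids the ValueError random.sample raises for short lists.
        category: random.sample(items, min(size, len(items)))
        for category, items in products.items()
    }


def products_for_category(products, category):
    """Return every product in `category`, or None if it is unknown."""
    return products.get(category)
```

A Flask view could call `sample_products` for the overview page and `products_for_category` for the per-category page, answering with a 404 when the latter returns None.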
- Cache expiration: The cache is updated every 15 minutes by default. You can modify CACHE_EXPIRATION in cache_manager.py to change this interval.
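As a rough sketch of how such a setting can drive a freshness check: the example below assumes CACHE_EXPIRATION is expressed in seconds, which should be confirmed against cache_manager.py, and `is_expired` is a hypothetical helper rather than a function from the project.

```python
import time

# Assumption: the value is in seconds (15 minutes). Check cache_manager.py.
CACHE_EXPIRATION = 15 * 60


def is_expired(last_updated, now=None, max_age=CACHE_EXPIRATION):
    """Return True when the cache is at least `max_age` seconds old."""
    if now is None:
        now = time.time()
    return now - last_updated >= max_age
```

Passing `now` explicitly keeps the check easy to test without waiting for real time to pass.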
- Create a new branch:
git checkout -b feature-name
- Make your changes and commit them:
git commit -am "Add new feature"
- Push the branch and create a pull request:
git push origin feature-name
Currently, there are no automated tests included. You can add tests using frameworks like pytest or unittest.
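A first test file could start as small as the sketch below. `normalize_title` is a hypothetical helper defined inline so the example is self-contained. In the real project the tests would import functions from app.py or cache_manager.py instead.

```python
def normalize_title(raw):
    """Hypothetical helper: collapse whitespace in a scraped product title."""
    return " ".join(raw.split())


def test_collapses_inner_whitespace():
    assert normalize_title("Glas \n  Vase") == "Glas Vase"


def test_strips_surrounding_whitespace():
    assert normalize_title("  Fliese 30x30 ") == "Fliese 30x30"
```

Saved under a name such as test_titles.py (an assumed file name), these functions are collected automatically when pytest is run from the project root.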
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Feel free to open issues or submit pull requests with improvements or fixes.
- Flask: For providing a lightweight web framework.
- Scrapy: For its powerful web scraping capabilities.
- Requests: For handling HTTP requests.