A lightweight and modular web scraping library built with Python. This project provides a simple interface for fetching and parsing web content.
- Clean and modular architecture
- HTML content parsing
- Mock fetcher for testing and development
- Easy-to-use API
├── demo.py # Demo script showing usage
├── scraper/ # Main package directory
│ ├── __init__.py # Package initialization
│ ├── main.py # Main scraping coordinator
│ ├── parser.py # HTML content parser
│ └── fetcher.py # Web content fetcher
Clone the repository:
git clone https://github.com/dimikarl2022/python-web-scraper.git
cd python-web-scraper
Basic usage example:
from scraper import scrape_website
# Scrape a website
url = "https://example.com"
content = scrape_website(url)
# Print extracted content
for item in content:
    print(item)
Run the demo script:
python demo.py
- WebFetcher: Handles webpage content retrieval
- ContentParser: Parses HTML content and extracts text
- Main Scraper: Coordinates fetching and parsing operations
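The way these three components fit together can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `MockFetcher`, `TextExtractor`, `parse_text`, and `scrape`, along with their signatures, are assumptions made for this example, and the parser uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class MockFetcher:
    """Hypothetical stand-in for WebFetcher: serves canned HTML, no network."""

    def __init__(self, pages):
        self.pages = pages  # maps URL -> HTML string

    def fetch(self, url):
        # Unknown URLs yield an empty document instead of raising.
        return self.pages.get(url, "")


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.items.append(text)


def parse_text(html):
    """Hypothetical parser step: HTML string in, list of text items out."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.items


def scrape(url, fetcher):
    """Hypothetical coordinator: fetch the page, then parse it."""
    return parse_text(fetcher.fetch(url))


pages = {
    "https://example.com": (
        "<html><body><h1>Title</h1>"
        "<script>var x = 1;</script><p>Hello</p></body></html>"
    )
}
result = scrape("https://example.com", MockFetcher(pages))
print(result)
```

Passing the fetcher in as an argument is one possible design choice: it lets a mock be swapped for a real network fetcher without touching the parsing or coordination code.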
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.