Welcome to the Philosophy Web Scraper project! This repository contains Python scripts and resources for efficiently scraping data about various philosophers and their philosophies from Wikipedia.
- source.py: extracts the page source.
- Extraction.ipynb & site_extraction.ipynb: extract text from the page source.
- site_scraper/: a Scrapy project folder for organized web scraping.
Beautiful Soup:
- Parsed HTML files to extract relevant paragraph information.
- Utilized the requests module for direct scraping from the Wikipedia website.
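As a minimal sketch of the Beautiful Soup workflow (the sample HTML and the choice to extract all <p> tags are illustrative assumptions, not the project's exact logic):

```python
import requests  # used for live fetches; the demo below parses a local snippet
from bs4 import BeautifulSoup

def extract_paragraphs(html: str) -> list[str]:
    """Parse page source and return the visible text of each <p> tag."""
    # "html.parser" ships with Python; swap in "lxml" for speed once installed.
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

# For a live page you would fetch the source first, e.g.:
#   html = requests.get("https://en.wikipedia.org/wiki/Socrates", timeout=10).text
sample = "<html><body><p>Socrates was a Greek philosopher.</p><p>He taught Plato.</p></body></html>"
print(extract_paragraphs(sample))
```

The same `extract_paragraphs` helper works unchanged on a fetched Wikipedia page, since it only depends on the raw HTML string.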
Scrapy:
- Employed for structured scraping of additional philosopher data, including images and captions.
- To run a Scrapy crawler, navigate to the site_scraper/spiders directory and execute:
scrapy crawl <crawler_name> -o output.json
- Note: each crawler's name is defined in its spider file. You can output the data in various formats, such as JSON or CSV.
Make sure you have the following Python packages installed on your system:
beautifulsoup4
requests
lxml
scrapy
You can install these packages using pip:
pip install beautifulsoup4 requests lxml scrapy