Skip to content

An AI webscraper platform using Python, selenium, ollama

Notifications You must be signed in to change notification settings

CipherEnigma/PyCrawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Web Scraping Project

Overview

This project is a web scraping tool designed to extract data from websites using Selenium and Python. It includes scripts to parse, scrape, and manage the extracted data efficiently.

Project Files

  • main.py: The entry point of the project. This script coordinates the scraping and parsing processes.
  • scrape.py: Contains functions and logic for performing web scraping tasks using Selenium.
  • parse.py: Handles the parsing of scraped data into structured formats (e.g., JSON, CSV).
  • requirements.txt: Lists all the Python dependencies required for the project.
  • chromedriver.exe: The ChromeDriver executable required by Selenium for browser automation.

Prerequisites

  1. Python: Ensure you have Python 3.x installed on your system.
  2. Google Chrome: The version of Chrome must match the version of chromedriver.exe.
  3. Selenium: Install Selenium via pip (or ensure it’s listed in requirements.txt).

Installation

  1. Clone the repository:

    git clone https://github.com/your-repo/web-scraping-tool.git
    cd web-scraping-tool
  2. Install dependencies:

    pip install -r requirements.txt
  3. Ensure chromedriver.exe is in the project root or accessible via your system's PATH.

Usage

  1. Update the configuration in main.py or scrape.py to specify the target website and scraping parameters.

  2. Run the main script:

    python main.py
  3. The parsed data will be saved to the specified output file or displayed on the console, depending on the implementation in parse.py.

Notes

  • Ensure compliance with the website's terms of service before scraping.
  • You may need to update chromedriver.exe if Chrome gets updated.

License

This project is open-source. You can modify and distribute it as per the terms of the MIT License.


Let me know if you need further customization!

About

An AI webscraper platform using Python, selenium, ollama

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages