news-scraper

A web scraper for BBC News articles, built with Scrapy. This is a small personal scraping project; please refrain from using it for commercial purposes.
NOTE: Requires virtualenv and virtualenvwrapper.

Installation

  • Clone this repository: $ git clone https://github.com/Ocre42/news-scraper.git
  • $ cd news-scraper/Scraper
  • $ pip install Scrapy
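Because the NOTE above asks for a virtualenv, it can help to confirm that the interpreter you are using is the virtualenv's and that `pip install Scrapy` landed there. The snippet below is a small sketch using only the standard library (`sys` and `importlib.metadata`, Python 3.8+); the helper names `in_virtualenv` and `scrapy_version` are hypothetical and not part of this project.

```python
import sys
from importlib import metadata


def in_virtualenv():
    # venv and virtualenv >= 20 make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def scrapy_version():
    # Return the installed Scrapy version string, or None if it is missing.
    try:
        return metadata.version("Scrapy")
    except metadata.PackageNotFoundError:
        return None


print("virtualenv active:", in_virtualenv())
print("Scrapy version:", scrapy_version() or "not installed")
```

If `virtualenv active` prints `False`, activate the environment (for example with virtualenvwrapper's `workon`) before installing or running the spider.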

settings.py

  • Identify yourself with USER_AGENT
  • Make sure ROBOTSTXT_OBEY is True
  • You can adjust DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED; the default delay should be 1 second between downloads
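The settings listed above can be sketched as the following excerpt of the project's `settings.py`. This is a minimal illustration, not the repository's actual file: the `USER_AGENT` string is a hypothetical placeholder you should replace with your own identification, and enabling AutoThrottle is shown as an optional choice.

```python
# settings.py (excerpt): a sketch of the settings mentioned above.

# Identify your crawler; this value is a placeholder, replace it with your own.
USER_AGENT = "news-scraper (+https://example.com/contact)"

# Obey the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Base delay of about 1 second between requests to the same site.
DOWNLOAD_DELAY = 1

# Optional: let AutoThrottle adapt the delay to server latency.
# It does not lower the delay below DOWNLOAD_DELAY.
AUTOTHROTTLE_ENABLED = True
```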

Usage

  • $ cd Scraper
  • $ scrapy crawl news
  • Alternatively, save the results to a file, for example as JSON:
  • $ scrapy crawl news -o results.json
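A JSON feed export writes the scraped items as a single JSON array, so the file can be inspected with the standard library. The sketch below loads `results.json` and counts which fields appear, without assuming any particular field names; the helpers `parse_items`, `load_items` and `field_counts` are hypothetical. Note that in Scrapy 2.1 and later `-o` appends to an existing file, which yields invalid JSON on a second run; `-O` overwrites it instead.

```python
import json
from collections import Counter
from pathlib import Path


def parse_items(text):
    """Parse the JSON array written by a Scrapy JSON feed export."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of scraped items")
    return items


def load_items(path):
    """Read and parse a feed file such as results.json."""
    return parse_items(Path(path).read_text(encoding="utf-8"))


def field_counts(items):
    """Count how many items contain each field; no field names are assumed."""
    counts = Counter()
    for item in items:
        counts.update(item.keys())
    return counts
```

From the directory containing `results.json`, `items = load_items("results.json")` followed by `print(len(items), field_counts(items).most_common())` gives a quick summary of the crawl.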

Enjoy and crawl responsibly!