Web-scraper for news articles from BBC News, a small personal project for scraping.

Ocre42/news-scraper

news-scraper

Web-scraper for news articles from BBC News using Scrapy. This is a small personal project for scraping; please refrain from using it for commercial purposes.
NOTE: Requires virtualenv and virtualenvwrapper.

Installation

  • Clone this repository (or your fork of it): $ git clone https://github.com/Ocre42/news-scraper.git
  • $ cd news-scraper/Scraper
  • $ pip install Scrapy
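Since the project expects to run inside a virtualenv (for example one created with virtualenvwrapper's `mkvirtualenv`), a quick standard-library check like the one below can confirm the environment before crawling. It is a sketch, not part of the repository, and it assumes an active virtualenv shows up either as `sys.real_prefix` (virtualenv before version 20) or as `sys.prefix` differing from `sys.base_prefix` (virtualenv 20+ and `venv`).

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    # virtualenv < 20 sets sys.real_prefix; venv and virtualenv >= 20
    # make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def scrapy_installed() -> bool:
    # find_spec returns None when the package cannot be imported here.
    return importlib.util.find_spec("scrapy") is not None


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("Scrapy installed:", scrapy_installed())
```

If either check prints `False`, activate the environment (virtualenvwrapper's `workon`) and rerun `pip install Scrapy` inside it.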

settings.py

  • Identify yourself with USER_AGENT
  • Make sure ROBOTSTXT_OBEY is True
  • You can modify DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED; the default delay should be 1 second between downloads
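A minimal sketch of how these lines might look in `settings.py`, using the standard Scrapy setting names listed above. The `USER_AGENT` string and contact address are hypothetical placeholders, not the project's actual values.

```python
# settings.py (sketch): polite-crawling options for the news spider.
# The USER_AGENT value is a hypothetical placeholder; replace it with
# your own project name and contact details.
USER_AGENT = "news-scraper (personal project; contact: you@example.com)"

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait about 1 second between requests to the same site
# (Scrapy randomizes the actual delay around this value by default).
DOWNLOAD_DELAY = 1

# Let AutoThrottle adapt the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
```

With AutoThrottle enabled, Scrapy never sets the download delay below `DOWNLOAD_DELAY`, so the 1-second value still acts as a minimum.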

Usage

  • $ cd Scraper
  • $ scrapy crawl news
  • Alternatively, save the results to a file, for example as JSON:
  • $ scrapy crawl news -o results.json
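After exporting, the results can be inspected with Python's standard library. This sketch assumes only that the file uses Scrapy's JSON feed format (a single array with one object per item); the item fields depend on what the `news` spider yields, which is not documented here.

```python
import json
from pathlib import Path


def parse_results(text: str) -> list:
    """Parse the contents of a JSON feed written by `scrapy crawl news -o results.json`.

    Scrapy's JSON feed exporter writes a single JSON array, one object per item.
    """
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of scraped items")
    return items


def load_results(path: str = "results.json") -> list:
    """Read and parse an exported results file."""
    return parse_results(Path(path).read_text(encoding="utf-8"))
```

For example, `for item in load_results(): print(item.get("title"))`, where `title` is a hypothetical field name. Note that `-o` appends to an existing file, so crawling twice into the same `.json` file produces invalid JSON; delete the old file first or, on Scrapy 2.1 or later, use `-O` to overwrite it.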

Enjoy and crawl responsibly!
