news-scraper

A web scraper for BBC News articles, built with Scrapy. This is a small personal project; please refrain from using it for commercial purposes.
NOTE: Requires virtualenv and virtualenvwrapper.

Installation

Clone this repository:
$ git clone https://github.com/Ocre42/news-scraper.git
$ cd news-scraper/Scraper
$ pip install Scrapy

Settings.py

Identify yourself with USER_AGENT.
Make sure ROBOTSTXT_OBEY is True.
You can adjust DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED; the default delay should be 1 second per download.
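
As a rough illustration, the relevant part of settings.py might look like the sketch below. The setting names are standard Scrapy settings, but the USER_AGENT string and contact URL are placeholders, and the AUTOTHROTTLE_ENABLED value is an assumption rather than this project's actual configuration.

```python
# Sketch of the politeness-related settings in settings.py.
# The values below are illustrative assumptions, not the project's shipped config.

# Placeholder identity: replace with your own name and a way to contact you.
USER_AGENT = "news-scraper (+https://example.com/contact)"

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait 1 second between requests to the same site.
DOWNLOAD_DELAY = 1

# Let Scrapy adapt the delay to server load (assumed value; may be off by default).
AUTOTHROTTLE_ENABLED = True
```

Raising DOWNLOAD_DELAY makes the crawl slower but gentler on the server.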

Usage

$ cd Scraper
$ scrapy crawl news
Or save the results to a file, for example JSON:
$ scrapy crawl news -o results.json
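
Once results.json exists, it can be read back with the standard library. The snippet below is a minimal sketch: load_items is a hypothetical helper name, and it only assumes that the file holds a JSON array of items (which is how Scrapy's JSON export is laid out); the fields inside each item depend on the spider and are not assumed here.

```python
import json
from pathlib import Path


def load_items(path="results.json"):
    """Return the list of scraped items stored in a Scrapy JSON export.

    `load_items` is a hypothetical helper, not part of this repository.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, len(load_items()) gives the number of articles collected by the last crawl.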

Enjoy and crawl responsibly!
