Scraper

A generalized scraper for BIDS institutional ecosystem mapping. See below for the types of spiders available.

Installation

Clone the repository.

git clone git@github.com:BIDS-projects/scraper

Set up your virtual environment. The following will create a new environment called scraper.

conda create -n scraper python=2.7

Activate your virtual environment, and install all dependencies from requirements.txt.

source activate scraper
pip install -r requirements.txt
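
The spiders appear to be built on Scrapy (suggested by the spiders/ directory layout and the ScrapingHub deployment target below); if that assumption holds, you can sanity-check the install with:

scrapy version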

Installation complete. See "How to Use" to get started.

How to Use

Make sure your virtual environment is active, if it isn't already. (When the environment is active, your prompt will be prefixed with (scraper).)

source activate scraper

To run a spider, use the following, where [project] is the directory containing your project and [spider] is the name of the spider.

make crawl project=[project] spider=[spider]
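
The exact behavior depends on the project's Makefile, which isn't shown here; a plausible equivalent under a standard Scrapy layout (an assumption, not confirmed by this README) would be:

cd [project] && scrapy crawl [spider]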

See below for more information about each spider and specific instructions on how to use each.

Web Labs Raw

The raw spider saves each page's raw HTML together with a many-to-many mapping that connects webpages to the pages they link to. To launch labs/labs/spiders/dlab.py, use

make crawl project=labs/labs spider=weblabs
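
For orientation, a spider of this kind typically yields one record per page for the raw HTML and one (source, target) record per outgoing link. The sketch below illustrates that pattern in Scrapy; it is not the actual weblabs spider, and the class name, seed URL, and record fields are all assumptions.

import scrapy


class RawSpiderSketch(scrapy.Spider):
    # Hypothetical illustration of a "raw" spider; not the real weblabs code.
    name = "weblabs_sketch"  # assumed name, for illustration only
    start_urls = ["http://example.berkeley.edu/"]  # placeholder seed URL

    def parse(self, response):
        # One record per page: its URL and raw HTML body.
        yield {"type": "page", "url": response.url, "body": response.text}
        # One record per outgoing link: the many-to-many (source, target) pair.
        for href in response.css("a::attr(href)").extract():
            target = response.urljoin(href)
            yield {"type": "link", "source": response.url, "target": target}
            # Follow the link so the target page is scraped as well.
            yield scrapy.Request(target, callback=self.parse)

A standalone sketch like this can be run with scrapy runspider, or dropped into a project's spiders/ directory and launched by name.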

Deployment

The following sections are specific to BIDS IEM team members.

ScrapingHub

Deploy using make deploy path=[target], where [target] is the path to the directory containing your spider.
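
For example, to deploy the weblabs spider described above (assuming labs/labs is the directory containing it):

make deploy path=labs/labs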

Production

You must have an account on Mercury, set up through BIDS IEM. SSH onto the server.

ssh [username]@mercury.dlab.berkeley.edu

[More instructions coming soon]
