Parser for open government data at http://rnp.fas.gov.ru/
Parsed data is available in scraped_items.csv
The data was parsed on 26.08.2010 (the data on the site has changed since then)
The program runs on the Scrapy framework (scrapy.org), versions 0.8 and 0.9.
To run the parser, type ./scrapy-ctl.py runspider fas/spiders/rnp.py
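The scraped_items.csv file mentioned above can be inspected with a few lines of Python. The following is only a minimal sketch and not part of the parser: the function name summarize_csv and the sample columns are hypothetical, the layout of the real file is not described here, and the code targets a current Python 3 rather than the Python 2.6 used by the parser.

```python
import csv
import io


def summarize_csv(stream):
    """Return (first_row, number_of_remaining_rows) for a CSV text stream."""
    reader = csv.reader(stream)
    first_row = next(reader, None)  # None when the stream is empty
    remaining = sum(1 for _ in reader)
    return first_row, remaining


# Made-up sample data; the real column names in scraped_items.csv are unknown.
SAMPLE = u"column_a,column_b\nvalue_1,value_2\n"

first_row, remaining = summarize_csv(io.StringIO(SAMPLE))
print("first row: %s" % (first_row,))
print("remaining rows: %d" % remaining)
```

To apply it to the real file, pass a handle opened with open("scraped_items.csv", newline="") instead of the in-memory sample. The file's encoding is not stated here, so an explicit encoding argument may be needed.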
------------------------------------------------
The program is cross-platform; the developer has tested it with the following configurations:
- Linux Ubuntu 10.04 (Lucid Lynx) + Python 2.6, both Scrapy 0.8 and 0.9, Twisted 10.0.0
- Windows Vista + Python 2.6, both Scrapy 0.8 and 0.9, Twisted 10.1.0
Note that it is not claimed to work with Scrapy 0.10!
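The version constraint above can be expressed as a small check. This is a hedged sketch only: the helper is_tested_scrapy_version is hypothetical, and how the installed Scrapy version string is obtained is left to the reader. Comparing numeric components instead of raw strings keeps "0.10" from being confused with "0.1".

```python
# (major, minor) pairs of the Scrapy releases listed as tested above.
TESTED_SCRAPY_VERSIONS = set([(0, 8), (0, 9)])


def is_tested_scrapy_version(version_string):
    """Return True if version_string (e.g. "0.9") names a tested release."""
    parts = version_string.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Strings that cannot be parsed are treated as untested.
        return False
    return major_minor in TESTED_SCRAPY_VERSIONS


for candidate in ("0.8", "0.9", "0.10"):
    print("%s tested: %s" % (candidate, is_tested_scrapy_version(candidate)))
```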
------------------------------------------------
Windows users should follow these instructions:
- Install Python 2.6 (http://www.python.org/ftp/python/2.6/python-2.6.msi)
- Install Twisted-10.1.0 (http://pypi.python.org/packages/2.6/T/Twisted/Twisted-10.1.0.winxp32-py2.6.msi#md5=bab4611abfdd63fc92424f4afa1e2865)
- Install zope.interface, using either the official 3.6.1 EGG distribution (http://pypi.python.org/packages/2.6/z/zope.interface/zope.interface-3.6.1-py2.6-win32.egg#md5=0d09224cea973b75feec65f84660742b) or the unofficial 3.5.0 installer at http://feisley.com/python/zope.interface-3.5.0.win32-py2.6.exe
- Install libxml2: http://users.skynet.be/sbi/libxml-python/binaries/libxml2-python-2.7.4.win32-py2.6.exe
- Install Scrapy 0.9 (http://pypi.python.org/packages/any/S/Scrapy/Scrapy-0.9.win32.exe#md5=35c412031e87609a17560d3e50d7edde)
Note that the direct installer links are given just for convenience.
The whole setup takes about 5 minutes.
- Download and unzip the source code from http://github.com/AmbientLighter/rpn-fas/zipball/master into (for example) the rpn-fas folder.
- Open a console in the rpn-fas folder and type C:\Python26\python.exe scrapy-ctl.py runspider fas\spiders\rnp.py
Parsing takes about 40 minutes, so please be patient.
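Before starting a 40-minute run, it may be worth confirming that the packages from the install steps can be imported. The sketch below is an illustration, not part of the parser: the helper missing_modules is hypothetical, and the import names in REQUIRED_MODULES are assumed to match the installed packages. It uses the built-in __import__ so that it should also run on Python 2.6.

```python
# Assumed import names of the packages listed in the install steps.
REQUIRED_MODULES = ["twisted", "zope.interface", "libxml2", "scrapy"]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


absent = missing_modules(REQUIRED_MODULES)
if absent:
    print("Missing: %s" % ", ".join(absent))
else:
    print("All listed dependencies can be imported.")
```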
------------------------------------------------
Author: Viktor Mireyev
License: Creative Commons Attribution 3.0 Unported License