Commit af5c2b0

Merge pull request #98 from openeventdata/mordecai
Add Mordecai docs and test (closes #93)
2 parents 6306288 + 4195185

File tree

2 files changed: +46 -7 lines changed

README.md

+33-7
@@ -6,17 +6,43 @@ phoenix_pipeline

 Turning news into events since 2014.

-This system links a series of Python programs to convert the files which have been
-downloaded by a [web scraper](https://github.com/openeventdata/scraper) to coded event data which is uploaded to a web site
-designated in the config file. The system processes a single day of information, but this
-can be derived from multiple text files. The pipeline also implements a filter for
-source URLs as defined by the keys in the `source_keys.txt` file. These keys
-correspond to the `source` field in the MongoDB instance.
+This system links a series of Python programs to convert the files which have
+been downloaded by a [web scraper](https://github.com/openeventdata/scraper) to
+coded event data which is uploaded to a web site designated in the config file.
+The system processes a single day of information, but this can be derived from
+multiple text files. The pipeline also implements a filter for source URLs as
+defined by the keys in the `source_keys.txt` file. These keys correspond to the
+`source` field in the MongoDB instance.

 For more information please visit the [documentation](http://phoenix-pipeline.readthedocs.org/en/latest/).
+## Requirements
+
+The pipeline requires either
+[Petrarch](https://github.com/openeventdata/petrarch) or
+[Petrarch2](https://github.com/openeventdata/petrarch2) to be installed. Both
+are Python programs and can be installed from GitHub using pip.
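(As a concrete illustration, not part of the committed README: pip can install straight from the repositories linked above, e.g. `pip install git+https://github.com/openeventdata/petrarch2.git`, and likewise for Petrarch.)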
+
+The pipeline assumes that stories are stored in a MongoDB in a particular
+format. This format is the one used by the OEDA news RSS scraper. See [the
+code](https://github.com/openeventdata/scraper/blob/master/mongo_connection.py)
+for details on how it structures stories in Mongo. Using this pipeline with
+differently formatted databases will require changing field names throughout
+the code. The pipeline also requires that stories have been parsed with
+Stanford CoreNLP. See the [simple and
+stable](https://github.com/openeventdata/stanford_pipeline) way to do this, or
+the [experimental distributed](https://github.com/oudalab/biryani) approach.
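For orientation, a minimal pymongo sketch of reading such stories follows. Only the `source` field is documented above; the database and collection names here are assumptions, so check the scraper's `mongo_connection.py` for the real ones.

```python
# Illustrative sketch only -- the database and collection names are
# assumptions, not taken from this repo.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
stories = client['event_scrape']['stories']  # hypothetical names

# The pipeline's source-filter keys correspond to this `source` field.
for story in stories.find({'source': 'bbc_news'}).limit(5):
    print(story['_id'], story.get('source'))
```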
+
+The pipeline requires one of two geocoding systems to be running: CLIFF-CLAVIN
+or Mordecai. For CLIFF, see a VM version
+[here](https://github.com/ahalterman/CLIFF-up) or a Docker container version
+[here](https://github.com/openeventdata/cliff_container). For Mordecai, see the
+setup instructions [here](https://github.com/openeventdata/mordecai). The
+version of the pipeline deployed in production currently uses CLIFF/CLAVIN, but
+future development will focus on improvements to Mordecai.
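As a hypothetical illustration of what "running" means here, a call to a local Mordecai service might look like the sketch below. The `/places` endpoint, port, and payload shape are assumptions, not taken from this commit; defer to the setup instructions linked above for the actual interface.

```python
# Hypothetical call to a locally running Mordecai service. Endpoint path,
# port, and JSON payload are assumptions -- see the linked setup docs.
import json
import requests

payload = json.dumps({'text': 'Protesters gathered in Cairo on Tuesday.'})
resp = requests.post('http://localhost:5000/places', data=payload,
                     headers={'Content-Type': 'application/json'})
print(resp.json())  # resolved place names with coordinates, if available
```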
+
 ## Running

 To run the program:

-python pipeline.py
+`python pipeline.py`

tests/test_geolocation.py

+13
@@ -0,0 +1,13 @@
+from bson.objectid import ObjectId
+import datetime
+import sys
+import os
+sys.path.append(os.path.dirname(os.path.realpath(__file__)) + "/../")
+import geolocation
+import utilities
+
+def test_geo_config():
+    server_details, geo_details, file_details, petrarch_version = utilities.parse_config('PHOX_config.ini')
+    geo_keys = geo_details._asdict().keys()
+    assert geo_keys == ['geo_service', 'cliff_host', 'cliff_port', 'mordecai_host', 'mordecai_port']
