# phoenix_pipeline
Turning news into events since 2014.

This system links a series of Python programs to convert the files that have been downloaded by a [web scraper](https://github.com/openeventdata/scraper) into coded event data, which is then uploaded to a web site designated in the config file. The system processes a single day of information, but this can be derived from multiple text files. The pipeline also implements a filter for source URLs as defined by the keys in the `source_keys.txt` file; these keys correspond to the `source` field in the MongoDB instance.
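
The details live in the pipeline code, but as a rough sketch of what that filter amounts to (the database and collection names below, the one-key-per-line file format, and the whitelist behaviour are assumptions for illustration; only the `source` field comes from the description above):

```python
# Illustrative sketch only: select stories whose source key appears in
# source_keys.txt (assumed to hold one key per line). "event_scrape" and
# "stories" are assumed names, and whitelist behaviour is an assumption,
# not a description of what pipeline.py actually does.
from pymongo import MongoClient

with open("source_keys.txt") as f:
    keys = [line.strip() for line in f if line.strip()]

stories = MongoClient("localhost", 27017)["event_scrape"]["stories"]
for story in stories.find({"source": {"$in": keys}}):
    pass  # these would be the stories kept for coding

```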
For more information please visit the [documentation](http://phoenix-pipeline.readthedocs.org/en/latest/).

## Requirements

The pipeline requires either [Petrarch](https://github.com/openeventdata/petrarch) or [Petrarch2](https://github.com/openeventdata/petrarch2) to be installed. Both are Python programs and can be installed from GitHub using pip.
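
For example, Petrarch2 can typically be installed straight from GitHub with `pip install git+https://github.com/openeventdata/petrarch2.git`, assuming pip and Git are available locally.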

The pipeline assumes that stories are stored in a MongoDB in a particular format, namely the one used by the OEDA news RSS scraper. See [the code](https://github.com/openeventdata/scraper/blob/master/mongo_connection.py) for details on how it structures stories in Mongo. Using this pipeline with differently formatted databases will require changing field names throughout the code. The pipeline also requires that stories have been parsed with Stanford CoreNLP. See the [simple and stable](https://github.com/openeventdata/stanford_pipeline) way to do this, or the [experimental distributed](https://github.com/oudalab/biryani) approach.
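
If you want to see what that format looks like in your own database, a quick sketch such as the following will print the top-level fields of one stored story (the database and collection names here are assumptions; substitute whatever your scraper writes to):

```python
# Illustrative sketch only: peek at one story to see which fields the
# scraper stored. "event_scrape" and "stories" are assumed names.
from pymongo import MongoClient

story = MongoClient("localhost", 27017)["event_scrape"]["stories"].find_one()
print(sorted(story.keys()) if story else "no stories found")
```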

The pipeline requires one of two geocoding systems to be running: CLIFF-CLAVIN or Mordecai. For CLIFF, see a VM version [here](https://github.com/ahalterman/CLIFF-up) or a Docker container version [here](https://github.com/openeventdata/cliff_container). For Mordecai, see the setup instructions [here](https://github.com/openeventdata/mordecai). The version of the pipeline deployed in production currently uses CLIFF-CLAVIN, but future development will focus on improvements to Mordecai.
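
Before a run it can be worth confirming that whichever geocoder you chose is actually reachable. A minimal check might look like the sketch below; the URL is a placeholder, so use whatever host, port, and path your CLIFF or Mordecai instance listens on per their setup instructions:

```python
# Illustrative sketch only: confirm a geocoding service is listening.
# The URL is a placeholder; the real host/port/path depend on how CLIFF
# or Mordecai was deployed (see their setup instructions).
import requests

GEOCODER_URL = "http://localhost:8080/"  # placeholder

try:
    resp = requests.get(GEOCODER_URL, timeout=5)
    print("geocoder reachable, HTTP", resp.status_code)
except requests.exceptions.RequestException as exc:
    print("geocoder not reachable:", exc)
```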

## Running

To run the program:

`python pipeline.py`