# phoenix_pipeline
Turning news into events since 2014.

This system links a series of Python programs to convert the files that have been downloaded by a [web scraper](https://github.com/openeventdata/scraper) into coded event data, which is then uploaded to a web site designated in the config file. The system processes a single day of information, but this can be derived from multiple text files. The pipeline also implements a filter for source URLs as defined by the keys in the `source_keys.txt` file; these keys correspond to the `source` field in the MongoDB instance.
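
The details live in the pipeline code, but as a rough sketch of what that filter amounts to (the database and collection names below, the one-key-per-line file format, and the whitelist behaviour are assumptions for illustration; only the `source` field comes from the description above):

```python
# Illustrative sketch only: select stories whose source key appears in
# source_keys.txt (assumed to hold one key per line). "event_scrape" and
# "stories" are assumed names, and whitelist behaviour is an assumption,
# not a description of what pipeline.py actually does.
from pymongo import MongoClient

with open("source_keys.txt") as f:
    keys = [line.strip() for line in f if line.strip()]

stories = MongoClient("localhost", 27017)["event_scrape"]["stories"]
for story in stories.find({"source": {"$in": keys}}):
    pass  # these would be the stories kept for coding

```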
For more information please visit the [documentation](http://phoenix-pipeline.readthedocs.org/en/latest/).

## Requirements

The pipeline requires either [Petrarch](https://github.com/openeventdata/petrarch) or [Petrarch2](https://github.com/openeventdata/petrarch2) to be installed. Both are Python programs and can be installed from GitHub using pip.
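
For example, Petrarch2 can typically be installed straight from GitHub with `pip install git+https://github.com/openeventdata/petrarch2.git`, assuming pip and Git are available locally.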

The pipeline assumes that stories are stored in a MongoDB in a particular format, namely the one used by the OEDA news RSS scraper. See [the code](https://github.com/openeventdata/scraper/blob/master/mongo_connection.py) for details on how it structures stories in Mongo. Using this pipeline with differently formatted databases will require changing field names throughout the code. The pipeline also requires that stories have been parsed with Stanford CoreNLP. See the [simple and stable](https://github.com/openeventdata/stanford_pipeline) way to do this, or the [experimental distributed](https://github.com/oudalab/biryani) approach.
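
If you want to see what that format looks like in your own database, a quick sketch such as the following will print the top-level fields of one stored story (the database and collection names here are assumptions; substitute whatever your scraper writes to):

```python
# Illustrative sketch only: peek at one story to see which fields the
# scraper stored. "event_scrape" and "stories" are assumed names.
from pymongo import MongoClient

story = MongoClient("localhost", 27017)["event_scrape"]["stories"].find_one()
print(sorted(story.keys()) if story else "no stories found")
```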

The pipeline requires one of two geocoding systems to be running: CLIFF-CLAVIN or Mordecai. For CLIFF, see a VM version [here](https://github.com/ahalterman/CLIFF-up) or a Docker container version [here](https://github.com/openeventdata/cliff_container). For Mordecai, see the setup instructions [here](https://github.com/openeventdata/mordecai). The version of the pipeline deployed in production currently uses CLIFF-CLAVIN, but future development will focus on improvements to Mordecai.
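
Before a run it can be worth confirming that whichever geocoder you chose is actually reachable. A minimal check might look like the sketch below; the URL is a placeholder, so use whatever host, port, and path your CLIFF or Mordecai instance listens on per their setup instructions:

```python
# Illustrative sketch only: confirm a geocoding service is listening.
# The URL is a placeholder; the real host/port/path depend on how CLIFF
# or Mordecai was deployed (see their setup instructions).
import requests

GEOCODER_URL = "http://localhost:8080/"  # placeholder

try:
    resp = requests.get(GEOCODER_URL, timeout=5)
    print("geocoder reachable, HTTP", resp.status_code)
except requests.exceptions.RequestException as exc:
    print("geocoder not reachable:", exc)
```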

## Running

To run the program:

`python pipeline.py`