Process data for the Global Plant Phenology Data Portal. This repository accumulates data from partners, processes it (in python) and then runs the ontology-data-pipeline to process inferences and aggregate data. The pipeline calls the Plant Phenology Ontology for term annotation.
cd test_data ./test.sh
Some of the projects contain scripts which help you fetch data from source API's. You will need to run these scripts using python:
# updating npn (NPN contains NEON data)
cd projects/npn
python data_fetcher.py ../../data/npn/input
Following is an example using NPN data:
# process NPN data
# This assumes that you have data processed and living in ../../data/npn/processed
cd projects/npn
# first make sure the 'processed' directory exists under /data/npn/. If not, then create it:
mkdir ../../data/npn/processed
# pre-process NPN data, specifying the chunk-size to use in pre-processor
# stores output data/npn/output/data.csv
python data_preprocessor.py 150000
cd ../..
./run.sh npn
# Loading Data
If you choose to load data into an elasticsearch instance, you can run the loader script
cd data tar zcvf - data/npn/processed/output_reasoned_csv/* | ssh -l exouser 149.165.159.216 "cd /home/jdeck/data/ppo; tar xvzf -"
./loadit.sh
Some information about the ppo-data-pipeline is mentioned in:
[Brian J. Stucky, Rob Guralnick, John Deck, Ellen G. Denny, Kjell Bolmgren, and Ramona Walls. *The Plant Phenology Ontology: A New Informatics Resource for Large-Scale Integration of Plant Phenology Data*, Front Plant Sci. 2018; 9: 517, doi: 10.3389/fpls.2018.00517](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938398/)