tpc_main_pipeline

Valerio Arnaboldi edited this page May 9, 2018 · 2 revisions

The Textpressocentral main pipeline incrementally downloads papers in raw format (PDF or XML) from various sources, converts them to CAS files, and indexes them. The script run_tpc_pipeline_incremental.sh, part of the tpctools project, launches one incremental round of the pipeline. To keep the system up to date, set up a cron job that runs the script at regular intervals.

Examples

Execute the main pipeline weekly (at 8:05am on Sunday) and redirect messages to the standard logger

Add the following cron job to the root crontab:

5 8 * * 0 /usr/local/bin/run_tpc_pipeline_incremental.sh 2>&1 | logger &
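
As an alternative to editing the crontab by hand, the entry can be generated and installed from a small shell helper. The following is only a sketch: the function names `tpc_cron_entry` and `tpc_install_cron_entry` and the `tpc_pipeline` syslog tag are hypothetical and not part of tpctools, and the default script path is the one used in the example above. The generated line omits the trailing `&`, since cron already runs jobs detached from any terminal.

```shell
#!/bin/sh
# Sketch only: the function names and the "tpc_pipeline" tag are
# assumptions made for illustration; they are not part of tpctools.

# Print a crontab entry that runs the pipeline script at the given
# minute, hour, and day of week (0 = Sunday) and pipes all output
# to syslog through logger.
tpc_cron_entry() {
    minute=$1
    hour=$2
    weekday=$3
    script=${4:-/usr/local/bin/run_tpc_pipeline_incremental.sh}
    printf '%s %s * * %s %s 2>&1 | logger -t tpc_pipeline\n' \
        "$minute" "$hour" "$weekday" "$script"
}

# Append the entry to the calling user's crontab unless an identical
# line is already there. Run as root to modify the root crontab.
tpc_install_cron_entry() {
    entry=$(tpc_cron_entry "$@")
    current=$(crontab -l 2>/dev/null || true)
    if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        echo "cron entry already installed"
    elif [ -n "$current" ]; then
        printf '%s\n%s\n' "$current" "$entry" | crontab -
    else
        printf '%s\n' "$entry" | crontab -
    fi
}
```

After sourcing the file as root, `tpc_install_cron_entry 5 8 0` would install the weekly Sunday 8:05am job, and calling `tpc_cron_entry 5 8 0` alone prints the line for review without changing anything. The `-t` option only tags the messages so they are easier to find in the system log; drop it to match the original entry.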