This repository contains accompanying material to our article "Web-Scraping for Non-Programmers: Introducing OXPath for Digital Library Metadata Harvesting" published in the Code4Lib Journal.
You can read the article here.
To get a copy of the materials, either use git clone or download the repository's content as a compressed file (.zip) and extract it anywhere on your local file system.
Currently, OXPath is only supported on 64-bit Linux-based operating systems. Java (version 8 or above) has to be installed on the system. If you want to run Firefox in the X virtual framebuffer when evaluating OXPath expressions, the xvfb package has to be installed as well:
$ sudo apt install xvfb
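Before downloading OXPath, it can help to confirm that these requirements are met. The following is a minimal sketch, not part of the repository: the function names check_cmd and check_prereqs are made up for illustration, and it assumes that the xvfb package puts an Xvfb binary on the PATH, as it does on Debian and Ubuntu.

```shell
#!/bin/sh
# Illustrative prerequisite check; the function names are hypothetical.

# Report whether a command is available on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_prereqs() {
    # OXPath expects Linux on a 64-bit machine (for example x86_64).
    echo "kernel: $(uname -s), architecture: $(uname -m)"

    check_cmd java
    if command -v java >/dev/null 2>&1; then
        # java -version writes to stderr; version 8 is reported as 1.8.
        java -version 2>&1 | head -n 1
    fi

    # Only needed when running with -browser firefox_xvfb.
    check_cmd Xvfb
}
```

After sourcing this file, calling check_prereqs prints one line per requirement; anything reported as missing has to be installed before running the example.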
OXPath can be obtained from: www.oxpath.org/download
Pick the latest version of the CLI jar and save it to a convenient location.
To run the example, open a command line in the directory where you extracted or cloned this repository, and enter the command shown after this list of options:
- -query bicc_WP.oxp: reads the OXPath expression in the file bicc_WP.oxp
- -logfile VerboseLog4j.properties: uses the provided custom logging configuration file for more verbose logging (recommended for OXPath beginners)
- -browser firefox_xvfb: runs the OXPath evaluation with a virtual frame buffer
- -output bicc_WP.xml: redirects the XML output to the file bicc_WP.xml
- -xml: sets the output format to XML
$ java -jar oxpath-2.0-cli.jar -query bicc_WP.oxp -logfile VerboseLog4j.properties -browser firefox_xvfb -output bicc_WP.xml -xml
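If the command is run from the wrong directory, the input files will not be found. The sketch below wraps the same command in a small function that first checks for the required files and afterwards confirms that a non-empty output file was written. The function name run_oxpath and the variables JAR, QUERY and OUTPUT are hypothetical; the command-line options are exactly those listed above, and the jar is assumed to sit in the current directory.

```shell
#!/bin/sh
# Illustrative wrapper; names are hypothetical, flags are those from the command above.
JAR="${JAR:-oxpath-2.0-cli.jar}"
QUERY="${QUERY:-bicc_WP.oxp}"
OUTPUT="${OUTPUT:-bicc_WP.xml}"

run_oxpath() {
    # Stop early if any required input file is missing from the current directory.
    for f in "$JAR" "$QUERY" VerboseLog4j.properties; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done

    java -jar "$JAR" -query "$QUERY" -logfile VerboseLog4j.properties \
        -browser firefox_xvfb -output "$OUTPUT" -xml || return 1

    # -s is true only if the file exists and is not empty.
    if [ -s "$OUTPUT" ]; then
        echo "wrote $OUTPUT"
    else
        echo "no output written to $OUTPUT" >&2
        return 1
    fi
}
```

After sourcing this file in the repository directory, calling run_oxpath either runs the evaluation or names the first file it could not find.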
(will be released soon, more info to follow)
- Mandy Neumann - Lead author of the article
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- Jan Steinberg - Contributor of the OXPath expression script and second author
- Philipp Schaer - Third author and proof-reader