This repository implements an OAI harvester that collects iconographic material from the Online Catalogue of the Photographic Collection (Fototeca) at the Max-Planck Center for the History of Art and Architecture – Bibliotheca Hertziana (Biblhertz).
The repository contains the following files:
- requirements.txt Lists the modules that must be installed to run Biblhertz_OAI_harvester.ipynb. Install them with:
pip install -r requirements.txt
- Biblhertz_IMG_harvester.py A Python script that harvests images. It relies on the database built by Biblhertz_OAI_harvester.py. To run the script, specify which images to download using the following parameters:
--type [Zeichnung Text Ort ...]
--artist [Caravaggio Bernini ...]
--title []
--date_begin [1560]
--date_end [1760]
--medium [Marmor Öl Papier ...]
--all [True | False]
- Biblhertz_OAI_harvester.py A Python script version of the OAI harvester. It queries the [https://oai.biblhertz.it/foto/oai-pmh] URL, retrieves all objects with identifiers '08######' and stores their information in a biblhertz.db database. To run it in a terminal:
python Biblhertz_OAI_harvester.py
- Biblhertz_OAI_harvester.ipynb A Python notebook version of the OAI harvester. It queries the [https://oai.biblhertz.it/foto/oai-pmh] URL, retrieves all objects with identifiers '08######' and stores their information in a biblhertz.db database.
- Biblhertz_foto_retrieval.ipynb A first draft that collects digital images based on a local .xml file of the online database. It will be deleted soon.
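As an illustration of the parameters listed for Biblhertz_IMG_harvester.py, the following is a minimal sketch of how such options could be declared with `argparse`. It is not taken from the script itself: the function names `build_parser` and `parse_bool`, the defaults, and the choice to accept several values per option are assumptions made for this example.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Turn the strings 'True' / 'False' into a boolean (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the filters described in this README (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Select images to download")
    # Options that may take several values, e.g. --type Zeichnung Malerei
    parser.add_argument("--type", nargs="+", default=[])
    parser.add_argument("--artist", nargs="+", default=[])
    parser.add_argument("--title", nargs="+", default=[])
    parser.add_argument("--medium", nargs="+", default=[])
    # Year bounds, parsed as integers
    parser.add_argument("--date_begin", type=int, default=None)
    parser.add_argument("--date_end", type=int, default=None)
    # Download everything, ignoring the filters above
    parser.add_argument("--all", type=parse_bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--type", "Zeichnung", "Malerei", "--date_begin", "1560"]
)
print(args.type, args.date_begin)
```

With this layout, an option that is not given stays at its empty default, so a filter is applied only when the user asks for it.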
First run Biblhertz_OAI_harvester.py in your terminal. The database biblhertz.db will be created in your current folder. Then, if you want to collect images in addition to the metadata, run the Biblhertz_IMG_harvester.py script. A folder called biblhertz_images will be created in your current directory.
python Biblhertz_OAI_harvester.py
python Biblhertz_IMG_harvester.py --type Zeichnung Malerei --date_begin 1560
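To give an idea of what the metadata step involves, here is a minimal sketch of an OAI-PMH harvesting loop that follows resumption tokens and stores matching records in SQLite. It is not the code of Biblhertz_OAI_harvester.py. The table layout, the function names, the `oai_dc` metadata prefix and the reading of '08######' as "08 followed by six digits at the end of the identifier" are all assumptions made for this example.

```python
import re
import sqlite3
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://oai.biblhertz.it/foto/oai-pmh"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
# Assumption: '08######' means "08" plus six digits, ending the identifier.
ID_PATTERN = re.compile(r"08\d{6}$")


def fetch_page(params):
    """Request one page from the OAI endpoint and return the raw bytes."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def parse_page(xml_data):
    """Return ([(identifier, record_xml), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_data)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        ).strip()
        if ID_PATTERN.search(identifier):
            records.append((identifier, ET.tostring(record, encoding="unicode")))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return records, (token.strip() or None)


def harvest(db_path="biblhertz.db", fetch=fetch_page, metadata_prefix="oai_dc"):
    """Page through ListRecords and store matching records; return their count."""
    connection = sqlite3.connect(db_path)
    # Hypothetical schema: one row per record, keeping the raw XML.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS records (identifier TEXT PRIMARY KEY, xml TEXT)"
    )
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    stored = 0
    while True:
        records, token = parse_page(fetch(params))
        connection.executemany(
            "INSERT OR REPLACE INTO records VALUES (?, ?)", records
        )
        connection.commit()
        stored += len(records)
        if token is None:
            break
        # A resumption token must be sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    connection.close()
    return stored
```

Calling `harvest()` without arguments would contact the live endpoint. Passing a different `fetch` function makes it possible to try the loop against saved XML pages first.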
The Fototeca provides documentation about its OAI system on its GitHub account 'hertzphoto':
- [Identifiers](https://github.com/hertzphoto/RomaFototeca/blob/master/documentation/identifiers.md)
- Mapping notes
- OAI Repository
- Query parameters
The data in the Online Catalogue is organized according to the Marburger Informations-, Dokumentations- und Administrations-System (MIDAS) (Bove, Heusinger and Kailus 2001).
The Städel Museum provides best-practice documentation about its OAI interface.
The Fondazione Federico Zeri – Università di Bologna provides a good example of a query system offered through a web app.