This is a basic framework of Python scripts that generate ISO 19115 geospatial metadata records (ISO 19139 and ISO 19115-3 XML). It is used for describing geological models from the AuScope 3D Geological Models website https://geomodels.auscope.org.au, but could be adapted for other kinds of metadata
The framework is capable of generating metadata records from these sources:
- CKAN API
- PDF geoscience report files
- ISO 19115-3 XML (e.g. geonetwork)
- ISO 19139 XML (e.g. geonetwork)
- OAI-PMH (e.g. dSpace)
For ISO 19115-3 and ISO 19139 the framework does little more than customisation of the XML records. For the other sources XML is generated from scratch.
Assumes PDM https://github.com/pdm-project/pdm is installed
git clone --recurse-submodules https://github.com/AuScope/metarecogen
cd metarecogen
pdm install
NB: AuScope 'geomodelportal' repository is included in the git clone as a submodule. This allows the scripts to copy some model data (i.e. geospatial coordinates) for inclusion in the metadata record
Table of fields output for each source type
Field | CKAN | ISO 19115-3 | ISO 19139 | OAI-PMH | |
---|---|---|---|---|---|
Id | Y | Y | Y | Y | |
Title | Y | Y | Y | Y | Y |
Abstract | Y | Y | Y | Y | Y |
Organisation Name | Y | Y | Y | Y | Y |
Creation Date | Y | Y | Y | Y | Y |
Publication Date | Y | Y | Y | Y | |
Spatial Coordinates | Y | Y | Y | Y | Y |
Custom keywords | Y | Y | Y | Y | |
Fixed Keywords | Y | Y | Y | Y | Y |
License | Y | Y | Y | Y | Y |
Maintenance Freq | Y | Y | Y | Y | Y |
Lineage | Y | Y | Y | Y | Y |
NB: 'Fixed keywords' do not vary from record to record, 'Custom keywords' are tailored to each record
Table of output XML
Input source | Output ISO XML standard |
---|---|
ISO 19115-3 | |
CKAN | ISO 19115-3 |
ISO 19115-3 | ISO 19115-3 |
ISO 19139 | ISO 19139 |
OAI-PMH | ISO 19115-3 |
This project is written in Python and uses PDM https://github.com/pdm-project/pdm for its package management. PDM requires python version 3.7 or higher.
To generate metadata from PDF file, this project uses AWS Bedrock to run a Claude LLM and assumes that the correct AWS credentials have been set up in the user's environment.
cd src
eval $(pdm venv activate)
./process.py
XML files are written to 'output' directory (defined in constants.py)
The framework is configured via the config.py file. Its format is described in CONFIG.md
There are very basic tests in tests, run via using pytest
pytest