Metapype is a Python 3 library for building, saving, and exporting scientific metadata using a flexible metadata content model in a hierarchical tree-like structure. Metapype only provides the primitives (as a Python application programmable interface) for operating on metadata; it is expected that client applications will use the Metapype API to build more robust and user-friendly applications.
This version of Metapype is designed to reflect the content structure of the Ecological Metadata Language (EML) XML schema (version 2.2.0); it is not, however, locked into a specific version of EML. As such, Metapype can support multiple versions of metadata standards.
Metapype is divided into a metadata content model and a set of validation rules that enforce model compliance to a specific metadata standard, including its version. Compliance only implies that the metadata model conforms to the syntactical requirements of the metadata standard, but not necessarily to semantic requirements, like adherence to a particular controlled vocabulary for keywords or requiring additional descriptive content within the model when the standard does not require it. As an extension to the validation component of Metapype, there is also an evaluation process that ensures the model conforms to practices and requirements beyond the metadata standard.
The metadata content model is designed as an ordered tree. Node children follow a rule-based order and can be reached via directed edges from their parent node. Each child node can have only one parent, which is represented by a single reverse edge from the child to its parent; only the root node may be without a "true" parent.
Metapype uses a Node
class for modeling elements of the EML XML schema. Node
instances must be constructed with a name
parameter, which corresponds to
the element name as declared in the schema*. Node edges for either parent or
children are constructed of the node's respective address in memory. Node
instances capture the primary characteristics of the corresponding EML element
being modeled, including XML attributes, content, and sub-elements of complex
types. Attributes of a node instance may be accessed through setters and
getters (see the Node API).
A compliant EML 2.2.0 metadata content model tree is found in the following diagram (this diagram represents an instance of a Metapype tree model):
Metadata validation is critical to ensure that the working model complies with all requirements and constraints declared within the metadata standard, and in this case, the EML 2.2.0 XML schema. A valid metadata content model implies that every node of the tree complies with the syntax of the metadata standard being modeled. Metapype can process the model instance through a series of validation rules to ensure compliance with the EML XML schema. A critical exception is raised if any node within the model fails validation during processing.
A metadata validation rule is a codified set of constraints declared
in a JSON format. A rule is divided into three distinct sections that are used
to verify: 1) attribute information, 2) sub-element (or children) definitions,
and 3) content information. Rule functions implicitly
return None
unless an exception occurs during the evaluation process;
exceptions are of the class exceptions.MetapypeRuleError
.
The following is an example of the "access" rule:
"accessRule" : [
{
"id": [false],
"system": [false],
"scope": [false, "document", "system"],
"order": [false, "allowFirst", "denyFirst"],
"authSystem": [true]
},
[
[["allow", 1, 1], ["deny", 1, 1], 1, null]
],
{
"content_rules" : ["emptyContent"]
}
],
This access_rule
example demonstrates the use of all three rule sections: 1)
The element specific constraints validate the value of a specific node attribute.
In this case, the order
attribute, if defined, must have a value of either
allowFirst
or denyFirst
only. 2) The XML attributes rule defines the
set of acceptable attributes to be id
, system
, order
, and authSystem
;
all defined attributes are optional except the authSystem
,
which is required. And finally, 3) the sub-element children section defines a
single node choice of either an allow
or deny
child, along with the
cardinality of 1 to infinity.
Similar to the process used for validation of a model instance to the EML XML
schema, Metapype also supports a process to evaluate a model instance
against rules that are more semantic in nature. Evaluation rules differ from
validation rules in that they do not result in an exception, but rather
evaluate one or more nodes to somewhat "soft" requirements (e.g., the number
of active words in a title or that an individaulName
node has both surName
AND givenName
children, even though the EML XML schema does not require
both) and only return information about the evaluation outcome for a node.
And because evaluation rules do not have to test every node in the model
instance, there can be fewer rules that need to be executed.
Evaluation rules are written as Python conditional statements (e.g., if
,
else
, or elif
), but do not follow any typical pattern because of the wide
difference in the constraints declared for various nodes. Evaluation rules
simply return a Unicode string value with an explanation of the evaluation
result, which could be the string PASS
if the evaluation is acceptable or a
recommendation for improving the node content. The user accessible function
evaluate.tree
requires an empty Python dictionary to be passed as a
parameter. This dictionary is recursively populated with evaluation results
from the evaluate.node
function and assigned to the node_id
key.
As an example, the following evaluation rule inspects the title node:
def _title_rule(node: Node) -> str:
evaluation = PASS
title = node.content
if title is not None:
length = len(title.split(' '))
if length < 10:
_ = ('"{0}" is too short, should '
'have at least 10 words')
evaluation = _.format(title)
return evaluation
The resulting dictionary from running the evaluate.tree
function returns
the following:
{
140101084713592: '(title) "Green sea turtle counts: Tortuga Island 20017" is too short, should have at least 10 words',
140101084713704: '(individualName) Should have both a "givenName" and "surName"',
140101084713872: '(individualName) PASS'
}
The Metapype Python API can be used to generate metadata that is compliant
with the Ecological Metadata Language standard. Using Metapype for this
purpose is typically separated into two steps: first, build a Metapype model
instance beginning with an "eml" node as the root node, followed by
validating the model instance to ensure it complies with the EML standard.
Validation can be performed on a single node using the validate.node
function or on an entire model tree beginning with a specified root node
using the validate.tree
function (to validate for EML compliance, the model
root node as defined by the "eml" name should be passed).
The following code example demonstrates these two steps:
import logging
from metapype.eml.exceptions import MetapypeRuleError
import metapype.eml.names as names
import metapype.eml.validate as validate
from metapype.model.node import Node
def main():
eml = Node(names.EML)
eml.add_attribute('packageId', 'edi.23.1')
eml.add_attribute('system', 'metapype')
access = Node(names.ACCESS, parent=eml)
access.add_attribute('authSystem', 'pasta')
access.add_attribute('order', 'allowFirst')
eml.add_child(access)
allow = Node(names.ALLOW, parent=access)
access.add_child(allow)
principal = Node(names.PRINCIPAL, parent=allow)
principal.content = 'uid=gaucho,o=EDI,dc=edirepository,dc=org'
allow.add_child(principal)
permission = Node(names.PERMISSION, parent=allow)
permission.content = 'all'
allow.add_child(permission)
dataset = Node(names.DATASET, parent=eml)
eml.add_child(dataset)
title = Node(names.TITLE, parent=dataset)
title.content = 'Green sea turtle counts: Tortuga Island 20017'
dataset.add_child(title)
creator = Node(names.CREATOR, parent=dataset)
dataset.add_child(creator)
individualName_creator = Node(names.INDIVIDUALNAME, parent=creator)
creator.add_child(individualName_creator)
surName_creator = Node(names.SURNAME, parent=individualName_creator)
surName_creator.content = 'Gaucho'
individualName_creator.add_child(surName_creator)
contact = Node(names.CONTACT, parent=dataset)
dataset.add_child(contact)
individualName_contact = Node(names.INDIVIDUALNAME, parent=contact)
contact.add_child(individualName_contact)
surName_contact = Node(names.SURNAME, parent=individualName_contact)
surName_contact.content = 'Gaucho'
individualName_contact.add_child(surName_contact)
try:
validate.tree(eml)
except MetapypeRuleError as e:
logging.error(e)
return 0
if __name__ == "__main__":
main()
In the above example, the "eml" node is created and used as the anchor node for the root of the model tree. Sub-element descendant nodes are created and then added to higher-level nodes as children. Although this example is quite small, it does produce metadata that conforms to EML 2.1.1 standard.
Metapype can convert a model instance tree to either EML-specific XML or to
JSON Unicode strings by using either the export.to_xml
or io.to_json
functions, respectively. The string value can then be saved directly to the file
system or passed to other functions for additional processing. Metapype can also
convert a Metapype valid JSON Unicode string back into a model instance tree by
using the io.from_json
function. This is important for saving incomplete or
sub-tree instances to the file system (e.g., for reusable model components),
which can be reloaded at a later time. The following code example demonstrates
using both the XML and JSON functions:
import json
import metapype.eml.export
import metapype.eml.names as names
import metapype.model.metapype_io
from metapype.model.node import Node
# Write EML XML
eml = Node(names.EML)
eml.add_attribute('packageId', 'edi.23.1')
eml.add_attribute('system', 'metapype')
xml_str = metapype.eml.export.to_xml(eml)
with open('test_eml.xml', 'w') as f:
f.write(xml_str)
# Write JSON
json_str = metapype.model.metapype_io.to_json(eml)
with open('test_eml.json', 'w') as f:
f.write(json_str)
# Read JSON
with open('test_eml.json', 'r') as f:
json_str = f.read()
eml = metapype.model.metapype_io.from_json(json.loads(json_str))
*Actually, the metadata content model may contain any hierarchical content that can be entered into the existing Node data structure, but will not validate as an EML metadata content model either at the node or the tree level.