Esmé Cowles edited this page Jul 24, 2015 · 26 revisions

Portland Common Data Model

Introduction

The Portland Common Data Model (PCDM) is a flexible, extensible domain model that is intended to underlie a wide array of repository and DAMS applications. The primary objective of this model is to establish a framework that developers of tools (e.g., Hydra-based engines, such as Sufia, Curate, Worthwhile, Avalon; Islandora; custom Fedora sites) can use for working with models in a general way, allowing adopters to easily use custom models with any tool. Given this interoperability goal, the initial work has been focused on structural metadata and access control, since these are the key actionable metadata.

To encourage adoption, this model must support the most complex use cases, which include rich hierarchies of inter-related collections and works, but also elegantly support the simplest use cases, such as a single user-contributed file with a few fields of metadata. It must provide a compact interface that tool developers can easily implement, but also be extensible enough for adopters to customize to their local needs.

As the community migrates to Fedora 4, much of our metadata is migrating to RDF. This model encourages linked data best practices, such as using URIs to identify all resources, using widely-used vocabularies where possible, and subclassing existing classes and properties when creating new terms.

Source Ontology

Scope

Work on this model extends across multiple communities, but there is no expectation that everyone in these communities will want to use this model. Initial discussions were focused on interoperability within the Hydra community (including some who use non-Fedora backends), and then expanded to include people who use Islandora and other tools. This diversity, and the diversity of use cases discussed, means that we don't expect every adopter to implement this model in the same way or with the same tools. We expect implementers to extend this model to fit their local needs, and hope that the model will help provide a framework for implementers to share RDF vocabularies and implementations.

Namespaces

Prefix URI
acl http://www.w3.org/ns/auth/acl#
dc http://purl.org/dc/elements/1.1/
dcterms http://purl.org/dc/terms/
pcdm http://pcdm.org/models#
foaf http://xmlns.com/foaf/0.1/
gen http://www.w3.org/2006/gen/ont#
iana http://www.iana.org/assignments/relation/ (see note)
ldp http://www.w3.org/ns/ldp#
ore http://www.openarchives.org/ore/terms/
rdfs http://www.w3.org/2000/01/rdf-schema#

Note on IANA link relations namespace

While the HTML page describing the IANA link relations is http://www.iana.org/assignments/link-relations/, the actual namespace URI is http://www.iana.org/assignments/relation/. The namespace URI and term URIs are not dereferenceable, and the documentation (RFC 5988) is very oblique (only referencing the full URI in the "self" example). The confusion is understandable, but the "/relation/" namespace URI is correct.
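As a minimal sketch of how the prefix table above can be used, the following Python snippet expands prefixed names into full URIs. The `expand` helper is hypothetical and not part of any PCDM tooling; the prefix values are copied from the table.

```python
# Prefix table copied from the Namespaces section.
NAMESPACES = {
    "acl": "http://www.w3.org/ns/auth/acl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pcdm": "http://pcdm.org/models#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gen": "http://www.w3.org/2006/gen/ont#",
    "iana": "http://www.iana.org/assignments/relation/",
    "ldp": "http://www.w3.org/ns/ldp#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'pcdm:hasMember' into a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, _, local = curie.partition(":")
    if prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return NAMESPACES[prefix] + local
```

With this table, `expand("iana:next")` yields a URI under "/relation/", not under "/link-relations/".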

Domain Model

Domain model ORE ordering extension

Core Classes

pcdm:Object

Subclass of: ore:Aggregation

An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata and access metadata, and may contain Files and other Objects as member "parts" or "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. Member Objects can be ordered using the ORE Proxy class (see Ordering extension below).

Property Range Usage Obligation
Has Member (pcdm:hasMember < ore:aggregates) pcdm:Object Links to a related Object. Typically used to link to component parts, such as a book linking to a page. min 0, max unbounded
Has File (pcdm:hasFile < ldp:contains) pcdm:File Links to a File contained by this Object. min 0, max unbounded Any resource may be contained by at most 1 other resource. Other entities linking to a file should generally link to the parent Object instead.
Has Related File (pcdm:hasRelatedFile < ldp:contains) pcdm:File Links to a File which is related to this Object, but doesn’t directly describe or represent it, such as technical metadata about other files. min 0, max unbounded
Aggregates (ore:aggregates < dcterms:hasPart) pcdm:Object Links to an Object that is related to this Object, but not a component part of it. Typically used for documentation, thumbnails, etc. min 0, max unbounded

pcdm:Collection

Subclass of: ore:Aggregation

A Collection is a group of resources. Collections have descriptive metadata, access metadata, and may link to Objects and/or Collections. By default, member Objects and Collections are an unordered set, but can be ordered using the ORE Proxy class (see Ordering extension below).

Property Range Usage Obligation
Aggregates (ore:aggregates < dcterms:hasPart) pcdm:Object Links to an Object that is related to the collection, but not a member of it. Typically used for documentation, thumbnails, etc. min 0, max unbounded
Has Member (pcdm:hasMember < ore:aggregates) pcdm:Collection ∪ pcdm:Object Links to an Object that is a member of this Collection, or a child Collection. min 0, max unbounded

pcdm:AdministrativeSet

Subclass of: ldp:Container

An Administrative Set is a grouping of resources that an administrative unit is ultimately responsible for managing. The set itself helps to manage the items within it. An Object or Collection may be contained by only one AdministrativeSet.

Property Range Usage Obligation
Contains (ldp:contains) Links to Collections and Objects contained by this AdministrativeSet. Contains is a transitive relationship. Any resource may be contained by at most 1 other resource.
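The "at most 1 container" rule can be checked mechanically. The sketch below assumes triples are held as plain (subject, predicate, object) tuples of prefixed names, which is an illustration device and not a PCDM or Fedora API. It looks only at direct ldp:contains statements and does not infer containment from subproperties such as pcdm:hasFile.

```python
from collections import defaultdict


def multiply_contained(triples):
    """Return resources that are directly contained by more than one resource.

    `triples` is an iterable of (subject, predicate, object) tuples.
    The result maps each offending resource to its sorted list of containers.
    """
    containers = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "ldp:contains":
            containers[obj].add(subject)
    return {res: sorted(c) for res, c in containers.items() if len(c) > 1}


# Hypothetical resource names, for illustration only.
example = [
    ("ex:setA", "ldp:contains", "ex:obj1"),
    ("ex:setB", "ldp:contains", "ex:obj1"),  # violates the rule
    ("ex:setA", "ldp:contains", "ex:obj2"),
]
violations = multiply_contained(example)
```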

pcdm:File

A File is a sequence of binary data and is described by some accompanying metadata. The metadata typically includes at least basic technical metadata (size, content type, modification date, etc.), but can also include properties related to preservation, digitization process, provenance, etc. Files MUST be contained by exactly one Object.

Property Range Usage Obligation
Size (dcterms:extent) dcterms:SizeOrDuration File size in bytes, typically system-supplied. 0 or 1
Content Type (dc:format) xsd:string MIME type 0 or 1
Checksum (premis:hasMessageDigest?, nfo:hashValue?, fedora:digest?) xsd:string or xsd:anyURI May have more than one checksum using different algorithms (differentiated with either URN syntax or separate properties for each algorithm).
Creation Date (dcterms:created) rdfs:Literal 0 or 1
Modification Date (dcterms:modified) rdfs:Literal The last modification date 0 or 1
Label (rdfs:label) xsd:string A human readable label or string that can be used as a simple surrogate for the resource. min 0, max unbounded
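As a rough illustration of the system-supplied properties in the table above, this Python sketch derives them from a filename and its bytes. The model leaves the checksum predicate undecided, so the `ex:checksum` key is a placeholder, and the `urn:sha1:` form is only one possible reading of "URN syntax". The `file_properties` helper is hypothetical.

```python
import hashlib
import mimetypes


def file_properties(filename: str, data: bytes) -> dict:
    """Build a property map for a File from its name and content."""
    content_type, _ = mimetypes.guess_type(filename)
    props = {
        "rdfs:label": filename,
        "dcterms:extent": len(data),  # size in bytes
        # Placeholder key: the checksum predicate is not settled in the model.
        "ex:checksum": "urn:sha1:" + hashlib.sha1(data).hexdigest(),
    }
    if content_type is not None:
        props["dc:format"] = content_type  # MIME type guessed from the extension
    return props


props = file_properties("notes.txt", b"hello")
```

Creation and modification dates are omitted here because they would normally come from the repository or file system instead of the content itself.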

Multiple Relationships Between Entities

Both the relationship between Collection and Object and the relationship between Object and File have multiple predicates to express different kinds of relationships.

Membership vs. Aggregation:

  • pcdm:hasMember indicates that a resource is a constituent part of the parent resource, such as a page within a book, or a song within an album. This is the typical relationship between these entities.
  • ore:aggregates indicates a different kind of relationship, typically around documenting the parent entity. For example, the cover image within the book or album.

Contained Files vs. Related Files expresses a similar distinction:

  • pcdm:hasFile indicates that a File is a representation of the Object that contains it, such as a TIFF image representing a painting, or an MP3 file representing a song.
  • pcdm:hasRelatedFile indicates a different kind of relationship, typically documenting the parent entity or one of the other Files. For example, technical metadata about the TIFF image in XML format.
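The four predicates discussed above can be seen side by side in a small example. This is a sketch only: triples are written as plain Python tuples of prefixed names, all `ex:` resource names are made up, and no subproperty inference is applied (a reasoner would also treat each pcdm:hasMember statement as an ore:aggregates statement).

```python
# (subject, predicate, object) tuples; every ex: name is hypothetical.
triples = {
    ("ex:book", "rdf:type", "pcdm:Object"),
    ("ex:page1", "rdf:type", "pcdm:Object"),
    ("ex:cover", "rdf:type", "pcdm:Object"),
    # The page is a constituent part of the book.
    ("ex:book", "pcdm:hasMember", "ex:page1"),
    # The cover image documents the book but is not a component part.
    ("ex:book", "ore:aggregates", "ex:cover"),
    # The TIFF represents the page; the XML is technical metadata about the TIFF.
    ("ex:page1", "pcdm:hasFile", "ex:page1/image.tiff"),
    ("ex:page1", "pcdm:hasRelatedFile", "ex:page1/techmd.xml"),
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)
```

Querying `objects("ex:book", "pcdm:hasMember")` returns only the page, while `objects("ex:book", "ore:aggregates")` returns only the cover, because the explicit statements are kept separate.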

Ordering Extension

This optional class (and additional properties on Collection and Object) serves as an extension to the core classes to support ordering the members of an Object or Collection. Members do not have to have an ordering proxy node (i.e., some members may be ordered while others are unordered), and members may have more than one ordering proxy node, allowing them to appear in multiple positions in the list.


ore:Proxy

A Proxy indicates a Resource in the context of a Collection (see: http://www.openarchives.org/ore/1.0/datamodel#Proxy)

Property Range Usage Obligation
Proxy For (ore:proxyFor) rdf:Resource Links to the resource being ordered. min 1, max 1
Proxy In (ore:proxyIn) ore:Aggregation Links to the aggregation the resource is being ordered in. min 1, max 1
Next (iana:next) ore:Proxy Links to the resource after the current resource (omit for the last resource). min 0, max 1
Previous (iana:previous) ore:Proxy Links to the resource before the current resource (omit for the first resource). min 0, max 1

pcdm:Collection (extension)

To improve usability and performance of sorting, a Collection with ordered members may link to the first and last resources.

Property Range Usage Obligation
First (iana:first) ore:Proxy Links to the Proxy for the first Object in the collection. min 0, max 1
Last (iana:last) ore:Proxy Links to the Proxy for the last Object in the collection. min 0, max 1

pcdm:Object (extension)

To improve usability and performance of sorting, an Object with ordered member Objects may link to the first and last Objects.

Property Range Usage Obligation
First (iana:first) ore:Proxy Links to the Proxy for the first member Object in the Object. min 0, max 1
Last (iana:last) ore:Proxy Links to the Proxy for the last member Object in the Object. min 0, max 1
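The proxy properties described in this section form a doubly linked list. The following Python sketch builds such a list and walks it, assuming triples are plain tuples of prefixed names. The helper names and the `/proxyN` naming scheme are hypothetical, and the walk assumes well-formed data with no cycles in the iana:next chain.

```python
def build_order(aggregation, members):
    """Return triples that order `members` within `aggregation` via proxies."""
    triples = []
    proxies = [f"{aggregation}/proxy{i}" for i in range(len(members))]
    for i, (proxy, member) in enumerate(zip(proxies, members)):
        triples.append((proxy, "ore:proxyFor", member))
        triples.append((proxy, "ore:proxyIn", aggregation))
        if i > 0:  # omit iana:previous on the first proxy
            triples.append((proxy, "iana:previous", proxies[i - 1]))
        if i < len(proxies) - 1:  # omit iana:next on the last proxy
            triples.append((proxy, "iana:next", proxies[i + 1]))
    if proxies:
        triples.append((aggregation, "iana:first", proxies[0]))
        triples.append((aggregation, "iana:last", proxies[-1]))
    return triples


def ordered_members(aggregation, triples):
    """Walk from iana:first along iana:next and collect the proxied members."""
    # Each (subject, predicate) pair used here has at most one value.
    lookup = {(s, p): o for s, p, o in triples}
    result = []
    proxy = lookup.get((aggregation, "iana:first"))
    while proxy is not None:
        result.append(lookup[(proxy, "ore:proxyFor")])
        proxy = lookup.get((proxy, "iana:next"))
    return result


# A member may appear in more than one position because each position has its own proxy.
order = build_order("ex:book", ["ex:page1", "ex:page2", "ex:page1"])
```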

Notes

  • Proxies are not pcdm:contained by anything
  • Files cannot be ordered within an Object
  • Related objects cannot be ordered within an Object or Collection
  • Administrative Sets must not contain Administrative Sets (question)
  • It is possible to have a File associated with a Proxy, for example to model a collection specific thumbnail, however support for this is not required by the application profile. (question)

WebACL

WebACLs are used to specify what actions users can perform on resources. Each ACL is created as its own resource which links to the users, resources, and actions allowed. Users and resources can both be identified individually or using classes. The WebACL ontology includes several actions (read, write, append, control). Hydra access control has historically also had a discover permission, and adopters may create new actions for permissions they wish to assign separately (e.g., download).

Each Collection, Object and File instance can be assigned its own WebACL. For example, an Object and its thumbnail image might be assigned a public ACL, but the high-resolution master image might be limited to a specific group of users.

acl:Authorization

Property Range Usage Obligation
Agent (acl:agent) foaf:Agent Individual user this ACL applies to. min 0, max unbounded
Agent Class (acl:agentClass) rdfs:Class Class of users this ACL applies to. min 0, max unbounded
Mode (acl:mode) rdfs:Class Actions permitted by this ACL (e.g., acl:Read, acl:Write, hydra:Discover, etc.). min 1, max unbounded
Resource (acl:accessTo) gen:InformationResource Individual resource this ACL applies to. min 0, max unbounded
Resource Class (acl:accessToClass) rdfs:Class Class of resources this ACL applies to. min 0, max unbounded
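To make the table concrete, here is a minimal sketch of how an application might evaluate acl:Authorization resources. It is an illustrative reading of the properties above and not the Hydra or Fedora implementation. The `Authorization` dataclass, the `is_allowed` function, and all `ex:` names are hypothetical, and it assumes `foaf:Agent` as an agent class stands for the general public.

```python
from dataclasses import dataclass, field


@dataclass
class Authorization:
    modes: set                                           # acl:mode (min 1)
    agents: set = field(default_factory=set)             # acl:agent
    agent_classes: set = field(default_factory=set)      # acl:agentClass
    access_to: set = field(default_factory=set)          # acl:accessTo
    access_to_classes: set = field(default_factory=set)  # acl:accessToClass


def is_allowed(auths, agent, agent_types, resource, resource_types, mode):
    """True if any authorization grants `mode` to `agent` on `resource`."""
    for auth in auths:
        agent_ok = agent in auth.agents or bool(auth.agent_classes & agent_types)
        resource_ok = resource in auth.access_to or bool(
            auth.access_to_classes & resource_types
        )
        if mode in auth.modes and agent_ok and resource_ok:
            return True
    return False


# Public read on the Object and its thumbnail; curators only on the master image.
auths = [
    Authorization(
        modes={"acl:Read"},
        agent_classes={"foaf:Agent"},
        access_to={"ex:obj", "ex:obj/thumb.jpg"},
    ),
    Authorization(
        modes={"acl:Read", "acl:Write"},
        agent_classes={"ex:Curators"},
        access_to={"ex:obj/master.tiff"},
    ),
]
```

In this sketch access is denied unless some authorization matches, which is a design assumption and not something the model prescribes.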

Appendix I. Usage Guidelines

Adopters will follow different conventions for using these classes, but the following guidelines suggest how to structure complex objects.

Structure

  • Create a single Object instance and attach descriptive metadata for the work as a whole.
  • If there is only a single content file (plus derivatives), attach it directly to the Object.
  • If there are multiple content files, attach each content file and its derivatives to a separate component Object instance. This keeps derivatives from different content files clearly separated, so each content file can have its own thumbnail image, OCR text, etc.
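The structure guidelines above can be sketched as a small function that emits triples for a work. The guidelines do not say which predicate derivatives should use, so attaching them with pcdm:hasFile is an assumption here. The function name, the `/partN` naming scheme, and all `ex:` names are hypothetical.

```python
def structure_work(work, content_files):
    """Return triples for `work`.

    `content_files` maps each content file to a list of its derivatives.
    """
    triples = [(work, "rdf:type", "pcdm:Object")]
    if len(content_files) == 1:
        # Single content file: attach it and its derivatives to the work itself.
        for content, derivatives in content_files.items():
            for f in [content, *derivatives]:
                triples.append((work, "pcdm:hasFile", f))
        return triples
    # Multiple content files: one component Object per content file.
    for i, (content, derivatives) in enumerate(content_files.items(), start=1):
        part = f"{work}/part{i}"
        triples.append((part, "rdf:type", "pcdm:Object"))
        triples.append((work, "pcdm:hasMember", part))
        for f in [content, *derivatives]:
            triples.append((part, "pcdm:hasFile", f))
    return triples


single = structure_work("ex:photo", {"ex:photo/master.tiff": ["ex:photo/thumb.jpg"]})
multi = structure_work(
    "ex:book",
    {"ex:book/p1.tiff": ["ex:book/p1.txt"], "ex:book/p2.tiff": []},
)
```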

Descriptive Metadata

  • For broadest interoperability, we suggest using commonly-used vocabularies like Dublin Core and FOAF for descriptive metadata.
  • In cases where these vocabularies don’t meet your needs, use them as much as possible and use other vocabularies (or create your own) to complement them.
  • Where possible, use URIs from established vocabularies when referring to names, subjects, places, and other entities that would typically have authority records in traditional library systems.

Technical Metadata

The Technical Metadata Application Profile defines properties for expressing technical metadata about files, etc. Please use that as a definitive reference and add any comments or make changes directly on that page.

General notes about Technical Metadata:

  • Attach technical metadata directly to File instances. This includes format information, runtime/codec/etc. details, digitization information, provenance, etc.
  • Descriptive metadata beyond a simple filename or label should be attached to the parent Object record instead.

Appendix II. Related Resources

The finer points of this model and how to implement it in Fedora 4/LDP are still under active discussion. Below are the current working documents:

Prior Work

This model came out of discussions at HydraConnect 2, which were fleshed out in Google Docs, Github issues, at the Hydra Developers - Making Progress Fall 2014 workshop and Code4Lib 2015. Here are some of the working documents: