Home
The Portland Common Data Model (PCDM) is a flexible, extensible domain model that is intended to underlie a wide array of repository and DAMS applications. The primary objective of this model is to establish a framework that developers of tools (e.g., Hydra-based engines, such as Sufia, Curate, Worthwhile, Avalon; Islandora; custom Fedora sites) can use for working with models in a general way, allowing adopters to easily use custom models with any tool. Given this interoperability goal, the initial work has been focused on structural metadata and access control, since these are the key actionable metadata.
To encourage adoption, this model must support the most complex use cases, which include rich hierarchies of inter-related collections and works, but also elegantly support the simplest use cases, such as a single user-contributed file with a few fields of metadata. It must provide a compact interface that tool developers can easily implement, but also be extensible enough for adopters to customize to their local needs.
As the community migrates to Fedora 4, much of our metadata is migrating to RDF. This model encourages linked data best practices, such as using URIs to identify all resources, using widely-used vocabularies where possible, and subclassing existing classes and properties when creating new terms.
Work on this model extends across multiple communities, but there is no expectation that everyone in these communities will want to use this model. Initial discussions were focused on interoperability within the Hydra community (including some who use non-Fedora backends), and then expanded to include people who use Islandora and other tools. This diversity, and the diversity of use cases discussed, means that we don't expect every adopter to implement this model in the same way or with the same tools. We expect implementers to extend this model to fit their local needs, and hope that the model will help provide a framework for implementers to share RDF vocabularies and implementations.
Prefix | URI |
---|---|
acl | http://www.w3.org/ns/auth/acl# |
dc | http://purl.org/dc/elements/1.1/ |
dcterms | http://purl.org/dc/terms/ |
pcdm | http://pcdm.org/models# |
foaf | http://xmlns.com/foaf/0.1/ |
gen | http://www.w3.org/2006/gen/ont# |
iana | http://www.iana.org/assignments/relation/ (see note) |
ldp | http://www.w3.org/ns/ldp# |
ore | http://www.openarchives.org/ore/terms/ |
rdfs | http://www.w3.org/2000/01/rdf-schema# |
Note on IANA link relations namespace
While the HTML page describing the IANA link relations is http://www.iana.org/assignments/link-relations/, the actual namespace URI is http://www.iana.org/assignments/relation/. The namespace URI and term URIs are not dereferenceable, and the documentation (RFC 5988) is very oblique (only referencing the full URI in the "self" example). So the confusion is understandable and the "/relation/" namespace URI is correct.
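As a minimal sketch of how the prefix table might be used in code, the snippet below expands prefixed names into full URIs. The `PREFIXES` dictionary is copied from the table above; the `expand` helper is a hypothetical name introduced for illustration and is not part of PCDM. Note that `iana` uses the `/relation/` namespace URI discussed in the note.

```python
# Prefix table copied from the namespace list above.
PREFIXES = {
    "acl": "http://www.w3.org/ns/auth/acl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pcdm": "http://pcdm.org/models#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gen": "http://www.w3.org/2006/gen/ont#",
    "iana": "http://www.iana.org/assignments/relation/",
    "ldp": "http://www.w3.org/ns/ldp#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'pcdm:hasMember' into a full URI.

    Hypothetical helper: raises ValueError for names without a known prefix.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("pcdm:hasMember"))  # http://pcdm.org/models#hasMember
print(expand("iana:next"))       # http://www.iana.org/assignments/relation/next
```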
An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata and access metadata, and may contain files and other Objects as member "parts" or "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. Member Objects can be ordered using the ORE Proxy class (see Ordering extension below).
Property | Range | Usage | Obligation |
---|---|---|---|
Has Member (pcdm:hasMember < ore:aggregates) | pcdm:Object | Links to a related Object. Typically used to link to component parts, such as a book linking to a page. | min 0, max unbounded |
Has File (pcdm:hasFile < ore:aggregates) | pcdm:File | Links to a File contained by this Object. | min 0, max unbounded. Any resource may be contained by at most 1 other resource. Other entities linking to a file should generally link to the parent Object instead. |
Has Related File (pcdm:hasRelatedFile < ore:aggregates) | pcdm:File | Links to a File which is related to this Object, but doesn’t directly describe or represent it, such as technical metadata about other files. | min 0, max unbounded |
Aggregates (ore:aggregates < dcterms:hasPart) | pcdm:Object | Links to an Object that is related to this Object, but not a component part of it. Typically used for documentation, thumbnails, etc. | min 0, max unbounded |
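To make the Object properties concrete, here is a rough sketch that models a two-page book as plain (subject, predicate, object) tuples and checks each link against the ranges in the table above. All `ex:` identifiers, the `EXPECTED_RANGE` mapping, and the helper functions are assumptions made for illustration; a real implementation would more likely use an RDF library and would also need to handle subproperty inference, which this sketch ignores.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Expected range of each linking property on an Object, per the table above.
EXPECTED_RANGE: Dict[str, str] = {
    "pcdm:hasMember": "pcdm:Object",
    "pcdm:hasFile": "pcdm:File",
    "pcdm:hasRelatedFile": "pcdm:File",
    "ore:aggregates": "pcdm:Object",
}

# Hypothetical "ex:" identifiers describing a two-page book.
GRAPH: List[Triple] = [
    ("ex:book", "rdf:type", "pcdm:Object"),
    ("ex:page1", "rdf:type", "pcdm:Object"),
    ("ex:page2", "rdf:type", "pcdm:Object"),
    ("ex:page1.tiff", "rdf:type", "pcdm:File"),
    ("ex:page2.tiff", "rdf:type", "pcdm:File"),
    ("ex:page1.xml", "rdf:type", "pcdm:File"),
    ("ex:book", "pcdm:hasMember", "ex:page1"),
    ("ex:book", "pcdm:hasMember", "ex:page2"),
    ("ex:page1", "pcdm:hasFile", "ex:page1.tiff"),
    ("ex:page2", "pcdm:hasFile", "ex:page2.tiff"),
    # Technical metadata about the TIFF: related to the page, not a representation of it.
    ("ex:page1", "pcdm:hasRelatedFile", "ex:page1.xml"),
]


def types_of(graph: List[Triple], resource: str) -> Set[str]:
    """Return the rdf:type values asserted for a resource."""
    return {o for s, p, o in graph if s == resource and p == "rdf:type"}


def range_violations(graph: List[Triple]) -> List[Triple]:
    """Return links from pcdm:Objects whose target lacks the expected type."""
    return [
        (s, p, o)
        for s, p, o in graph
        if p in EXPECTED_RANGE
        and "pcdm:Object" in types_of(graph, s)
        and EXPECTED_RANGE[p] not in types_of(graph, o)
    ]


print(range_violations(GRAPH))  # []
```

Adding a triple such as `("ex:book", "pcdm:hasFile", "ex:page1")` would be reported, because `ex:page1` is an Object rather than a File.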
A Collection is a group of resources. Collections have descriptive metadata, access metadata, and may link to Objects and/or Collections. By default, member Objects and Collections are an unordered set, but can be ordered using the ORE Proxy class (see Ordering extension below).
Property | Range | Usage | Obligation |
---|---|---|---|
Aggregates (ore:aggregates < dcterms:hasPart) | pcdm:Object | Links to an Object that is related to the collection, but not a member of it. Typically used for documentation, thumbnails, etc. | min 0, max unbounded |
Has Member (pcdm:hasMember < ore:aggregates) | pcdm:Collection ∪ pcdm:Object | Links to an Object that is a member of this Collection, or a child Collection. | min 0, max unbounded |
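Because a Collection's `pcdm:hasMember` links can point to either Objects or child Collections, gathering every member Object means walking the Collection tree. The sketch below shows one way this might look, under the assumption that membership links are held in a simple dictionary; the `ex:` identifiers, `HAS_MEMBER`, `COLLECTIONS`, and `member_objects` are all hypothetical names. The result is sorted only to make the output deterministic, since members are an unordered set by default.

```python
from typing import Dict, List, Set

# Hypothetical pcdm:hasMember links, keyed by the parent Collection.
HAS_MEMBER: Dict[str, List[str]] = {
    "ex:library": ["ex:maps", "ex:photos"],
    "ex:maps": ["ex:map1"],
    "ex:photos": ["ex:photo1", "ex:photo2"],
}

# Resources typed as pcdm:Collection; everything else is treated as an Object.
COLLECTIONS: Set[str] = {"ex:library", "ex:maps", "ex:photos"}


def member_objects(collection: str) -> List[str]:
    """Collect member Objects of a Collection and of its child Collections."""
    found: Set[str] = set()
    visited: Set[str] = set()
    stack = [collection]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guard against accidental membership cycles
        visited.add(current)
        for member in HAS_MEMBER.get(current, []):
            if member in COLLECTIONS:
                stack.append(member)  # descend into the child Collection
            else:
                found.add(member)
    return sorted(found)


print(member_objects("ex:library"))  # ['ex:map1', 'ex:photo1', 'ex:photo2']
```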
An Administrative Set is a grouping of resources that an administrative unit is ultimately responsible for managing. The set itself helps to manage the items within it. An Object or Collection may be contained by only one AdministrativeSet.
Property | Range | Usage | Obligation |
---|---|---|---|
Contains (ldp:contains) | | Links to Collections and Objects contained by this AdministrativeSet. Contains is a transitive relationship. | Any resource may be contained by at most 1 other resource. |
A File is a sequence of binary data and is described by some accompanying metadata. The metadata typically includes at least basic technical metadata (size, content type, modification date, etc.), but can also include properties related to preservation, digitization process, provenance, etc. Files MUST be contained by exactly one Object.
Property | Range | Usage | Obligation |
---|---|---|---|
Size (dcterms:extent) | dcterms:SizeOrDuration | File size in bytes, typically system-supplied. | 0 or 1 |
Content Type (dc:format) | xsd:string | MIME type | 0 or 1 |
Checksum (premis:hasMessageDigest?, nfo:hashValue?, fedora:digest?) | xsd:string or xsd:anyURI | May have more than one checksum using different algorithms (differentiated with either URN syntax or separate properties for each algorithm). | |
Creation Date (dcterms:created) | rdfs:Literal | | 0 or 1 |
Modification Date (dcterms:modified) | rdfs:Literal | The last modification date | 0 or 1 |
Label (rdfs:label) | xsd:string | A human readable label or string that can be used as a simple surrogate for the resource. | min 0, max unbounded |
When binary data is served by another application, it may be appropriate to create a File object with no content to model the external content, hold related technical metadata, etc. Fedora 4's external content feature is one way to implement this link to the external content.
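The requirement that a File be contained by exactly one Object lends itself to a simple consistency check. The sketch below assumes that containment is expressed with `pcdm:hasFile` and that the data is available as plain (subject, predicate, object) tuples; the `ex:` identifiers and the `containment_errors` function are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: one well-formed File, one orphan, one shared by two Objects.
GRAPH: List[Triple] = [
    ("ex:master.tiff", "rdf:type", "pcdm:File"),
    ("ex:orphan.pdf", "rdf:type", "pcdm:File"),
    ("ex:shared.jpg", "rdf:type", "pcdm:File"),
    ("ex:painting", "pcdm:hasFile", "ex:master.tiff"),
    ("ex:a", "pcdm:hasFile", "ex:shared.jpg"),
    ("ex:b", "pcdm:hasFile", "ex:shared.jpg"),
]


def containment_errors(graph: List[Triple]) -> Dict[str, List[str]]:
    """Map each File to its containers when it does not have exactly one.

    Assumption: pcdm:hasFile is the only predicate that expresses containment.
    """
    files = {s for s, p, o in graph if p == "rdf:type" and o == "pcdm:File"}
    containers: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == "pcdm:hasFile":
            containers[o].add(s)
    return {
        f: sorted(containers[f])
        for f in sorted(files)
        if len(containers[f]) != 1
    }


print(containment_errors(GRAPH))
# {'ex:orphan.pdf': [], 'ex:shared.jpg': ['ex:a', 'ex:b']}
```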
Both the relationship between Collection and Object and the relationship between Object and File have multiple predicates to express different kinds of relationships.
Membership vs. Aggregation:
- pcdm:hasMember indicates that a resource is a constituent part of the parent resource, such as a page within a book, or a song within an album. This is the typical relationship between these entities.
- ore:aggregates indicates a different kind of relationship, typically around documenting the parent entity. For example, the cover image within the book or album.
Contained Files vs. Related Files expresses a similar distinction:
- pcdm:hasFile indicates that a File is a representation of the Object that contains it, such as a TIFF image representing a painting, or an MP3 file representing a song.
- pcdm:hasRelatedFile indicates a different kind of relationship, typically documenting the parent entity or one of the other Files. For example, technical metadata about the TIFF image in XML format.
This optional class (and additional properties on Collection and Object) serves as an extension to the core classes to support ordering the members of an Object or Collection. Members do not have to have an ordering proxy node (i.e., some members may be ordered while others are unordered), and members may have more than one ordering proxy node, allowing them to appear in multiple positions in the list.
A Proxy indicates a Resource in the context of a Collection (see: http://www.openarchives.org/ore/1.0/datamodel#Proxy)
Property | Range | Usage | Obligation |
---|---|---|---|
Proxy For (ore:proxyFor) | rdf:Resource | Links to the resource being ordered. | min 1, max 1 |
Proxy In (ore:proxyIn) | ore:Aggregation | Links to the aggregation the resource is being ordered in. | min 1, max 1 |
Next (iana:next) | ore:Proxy | Links to the resource after the current resource (omit for the last resource). | min 0, max 1 |
Previous (iana:previous) | ore:Proxy | Links to the resource before the current resource (omit for the first resource). | min 0, max 1 |
To improve usability and performance of sorting, a Collection with ordered members may link to the first and last resources.
Property | Range | Usage | Obligation |
---|---|---|---|
First (iana:first) | ore:Proxy | Links to the Proxy for the first Object in the collection. | min 0, max 1 |
Last (iana:last) | ore:Proxy | Links to the Proxy for the last Object in the collection. | min 0, max 1 |
To improve usability and performance of sorting, an Object with ordered member Objects may link to the first and last Objects.
Property | Range | Usage | Obligation |
---|---|---|---|
First (iana:first) | ore:Proxy | Links to the Proxy for the first member Object in the Object. | min 0, max 1 |
Last (iana:last) | ore:Proxy | Links to the Proxy for the last member Object in the Object. | min 0, max 1 |
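The ordering extension amounts to a doubly linked list of Proxy nodes, with optional `iana:first` and `iana:last` shortcuts on the parent. The following sketch illustrates that structure in plain Python; the `Proxy` dataclass, the proxy identifier scheme, and both functions are assumptions introduced here, not part of the model. The example deliberately lists one member twice, since a member may have more than one proxy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Proxy:
    proxy_for: str                        # ore:proxyFor (min 1, max 1)
    proxy_in: str                         # ore:proxyIn (min 1, max 1)
    next_proxy: Optional[str] = None      # iana:next (omitted for the last)
    previous_proxy: Optional[str] = None  # iana:previous (omitted for the first)


def build_proxies(
    aggregation: str, members: List[str]
) -> Tuple[Dict[str, Proxy], Optional[str], Optional[str]]:
    """Create one Proxy per list position; return (proxies, first, last)."""
    # Hypothetical identifier scheme for proxy nodes.
    ids = [f"{aggregation}/proxy{i}" for i in range(len(members))]
    proxies: Dict[str, Proxy] = {}
    for i, (pid, member) in enumerate(zip(ids, members)):
        proxies[pid] = Proxy(
            proxy_for=member,
            proxy_in=aggregation,
            next_proxy=ids[i + 1] if i + 1 < len(ids) else None,
            previous_proxy=ids[i - 1] if i > 0 else None,
        )
    first = ids[0] if ids else None
    last = ids[-1] if ids else None
    return proxies, first, last


def ordered_members(proxies: Dict[str, Proxy], first: Optional[str]) -> List[str]:
    """Follow iana:next from the first Proxy and list the proxied resources."""
    result: List[str] = []
    seen = set()
    current = first
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in iana:next chain")
        seen.add(current)
        proxy = proxies[current]
        result.append(proxy.proxy_for)
        current = proxy.next_proxy
    return result


proxies, first, last = build_proxies("ex:book", ["ex:page1", "ex:page2", "ex:page1"])
print(ordered_members(proxies, first))  # ['ex:page1', 'ex:page2', 'ex:page1']
```

Starting from `iana:first` avoids scanning every proxy to find the head of the list, which is the usability and performance benefit the tables above describe.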
- Proxies are not pcdm:contained by anything
- Files cannot be ordered within an Object
- Related objects cannot be ordered within an Object or Collection
- Administrative Sets must not contain Administrative Sets (question)
- It is possible to have a File associated with a Proxy, for example to model a collection-specific thumbnail; however, support for this is not required by the application profile. (question)
WebACLs are used to specify what actions users can perform on resources. Each ACL is created as its own resource which links to the users, resources, and actions allowed. Users and resources can both be identified individually or using classes. The WebACL ontology includes several actions (read, write, append, control). Hydra access control has historically also had a discover permission, and adopters may create new actions for permissions they wish to assign separately (e.g., download).
Each Collection, Object and File instance can be assigned its own Web ACL. For example, an Object and its thumbnail image might be assigned a public ACL, but the high-resolution master image might be limited to a specific group of users.
Property | Range | Usage | Obligation |
---|---|---|---|
Agent (acl:agent) | foaf:Agent | Individual user this ACL applies to. | min 0, max unbounded |
Agent Class (acl:agentClass) | rdfs:Class | Class of users this ACL applies to. | min 0, max unbounded |
Mode (acl:mode) | rdfs:Class | Actions permitted by this ACL (e.g., acl:Read, acl:Write, hydra:Discover, etc.). | min 1, max unbounded |
Resource (acl:accessTo) | gen:InformationResource | Individual resource this ACL applies to. | min 0, max unbounded |
Resource Class (acl:accessToClass) | rdfs:Class | Class of resources this ACL applies to. | min 0, max unbounded |
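As an illustration of how the properties in this table could drive an access decision, the sketch below models the example from the text: an Object and its thumbnail are publicly readable, while the master image is limited to one group. The `Authorization` dataclass, `is_allowed`, and all `ex:` names are hypothetical. Using `foaf:Agent` as the agent class for public access follows the usual WebACL convention. Here the caller is assumed to supply each user's and resource's classes explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Authorization:
    modes: Set[str]                                           # acl:mode (min 1)
    agents: Set[str] = field(default_factory=set)             # acl:agent
    agent_classes: Set[str] = field(default_factory=set)      # acl:agentClass
    access_to: Set[str] = field(default_factory=set)          # acl:accessTo
    access_to_classes: Set[str] = field(default_factory=set)  # acl:accessToClass


def is_allowed(
    acls: List[Authorization],
    agent: str,
    agent_classes: Set[str],
    resource: str,
    resource_classes: Set[str],
    mode: str,
) -> bool:
    """Return True if any ACL grants `mode` on `resource` to `agent`."""
    for acl in acls:
        agent_ok = agent in acl.agents or bool(acl.agent_classes & agent_classes)
        resource_ok = resource in acl.access_to or bool(
            acl.access_to_classes & resource_classes
        )
        if agent_ok and resource_ok and mode in acl.modes:
            return True
    return False


# Hypothetical ACLs: public read on the Object and thumbnail, group read on the master.
ACLS = [
    Authorization(
        modes={"acl:Read"},
        agent_classes={"foaf:Agent"},
        access_to={"ex:painting", "ex:thumbnail.jpg"},
    ),
    Authorization(
        modes={"acl:Read"},
        agent_classes={"ex:Curators"},
        access_to={"ex:master.tiff"},
    ),
]

print(is_allowed(ACLS, "ex:visitor", {"foaf:Agent"}, "ex:thumbnail.jpg", {"pcdm:File"}, "acl:Read"))  # True
print(is_allowed(ACLS, "ex:visitor", {"foaf:Agent"}, "ex:master.tiff", {"pcdm:File"}, "acl:Read"))    # False
```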
Different adopters will adopt different conventions for how to use these classes, but the following guidelines suggest how to structure complex objects.
- Create a single Object instance and attach descriptive metadata for the work as a whole.
- If there is only a single content file (plus derivatives), attach it directly to the Object.
- If there are multiple content files, attach each content file and its derivatives to a separate component Object instance. This keeps derivatives from different content files clearly separated, so each content file can have its own thumbnail image, OCR text, etc.
- For broadest interoperability, we suggest using commonly-used vocabularies like Dublin Core and FOAF for descriptive metadata.
- In cases where these vocabularies don’t meet your needs, use them as much as possible and use other vocabularies (or create your own) to complement them.
- Where possible, use URIs from established vocabularies when referring to names, subjects, places, and other entities that would typically have authority records in traditional library systems.
The Technical Metadata Application Profile defines properties for expressing technical metadata about files, etc. Please use that as a definitive reference and add any comments or make changes directly on that page.
General notes about Technical Metadata:
- Attach technical metadata directly to File instances. This includes format information, runtime/codec/etc. details, digitization information, provenance, etc.
- Descriptive metadata beyond a simple filename or label should be attached to the parent Object record instead.
The finer points of this model and how to implement it in Fedora 4/LDP are still under active discussion. Below are the current working documents:
This model came out of discussions at HydraConnect 2, which were fleshed out in Google Docs and GitHub issues, and at the Hydra Developers - Making Progress Fall 2014 workshop and Code4Lib 2015. Here are some of the working documents:
- PCDM Google Group
- PCDM Wiki Homepage
- IRC: #pcdm on irc.freenode.net
- Published ontologies
- PCDM Committers
- PCDM Committers Process
- PCDM Contributors
- PCDM Community Meetings
- Community Resources