Home

Mappings for batch conversion from CONTENTdm

Update the appropriate mapping.yml file
Define namespace prefixes for compact URLs

namespaces:
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  owl: "http://www.w3.org/2002/07/owl#"
  skos: "http://www.w3.org/2004/02/skos/core#"
  dct: "http://purl.org/dc/terms/"
  dce: "http://purl.org/dc/elements/1.1/"
  bibo: "http://purl.org/ontology/bibo/"
  foaf: "http://xmlns.com/foaf/0.1/"
  geo: "http://www.w3.org/2003/01/geo/wgs84_pos#"
  frbr: "http://iflastandards.info/ns/fr/frbr/frbrer/"
  mads: "http://www.loc.gov/mads/rdf/v1#"
  marcrel: "http://id.loc.gov/vocabulary/relators/"
  modsrdf: "http://www.loc.gov/standards/mods/modsrdf/v1/"
  premis: "http://www.loc.gov/premis/rdf/v1#"
  oregon: "http://opaquenamespace.org/ns/"
  vra: "http://www.loc.gov/standards/vracore/vocab/"
  aat: "http://vocab.getty.edu/resource/aat/"
  dwc: "http://rs.tdwg.org/dwc/terms/"
  exif: "http://www.w3.org/2003/12/exif/ns#"
  holding: "http://purl.org/ontology/holding#"
  schema: "http://schema.org/"
  oad: "http://lod.xdams.org/reload/oad/"
  swpo: "http://sw-portal.deri.org/ontologies/swportal#"
  rdam: "http://rdaregistry.info/Elements/m/"
  rdaw: "http://rdaregistry.info/Elements/w/"
  rdae: "http://rdaregistry.info/Elements/e/"
  rdai: "http://rdaregistry.info/Elements/i/"
  rdaa: "http://rdaregistry.info/Elements/a/"
  archives: "http://data.archiveshub.ac.uk/def/"
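
The prefixes above let a mapping say dct:title instead of spelling out the full URI. As a rough illustration of that expansion (this is not the converter's own code; `expand_curie` and the trimmed `NAMESPACES` hash are hypothetical names used only for this sketch):

```ruby
# Hypothetical subset of the namespaces declared in mapping.yml.
NAMESPACES = {
  'dct'     => 'http://purl.org/dc/terms/',
  'marcrel' => 'http://id.loc.gov/vocabulary/relators/',
  'oregon'  => 'http://opaquenamespace.org/ns/'
}.freeze

# Expand a compact "prefix:term" string into a full URI string.
# Raises KeyError when the prefix has not been declared.
def expand_curie(compact, namespaces = NAMESPACES)
  prefix, term = compact.split(':', 2)
  raise ArgumentError, "not a prefix:term value: #{compact.inspect}" if term.nil?
  namespaces.fetch(prefix) + term
end

puts expand_curie('dct:title') # prints http://purl.org/dc/terms/title
```

Because the term is appended directly to the namespace string, each namespace value should end in "/" or "#".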

Mapping

Start the mappings section with the key mappings:
Identify the set to map by its CISONICK (the collection alias), then indent the field mappings below the collection alias.

  sheetmusic:
    title: dct:title
    creato:
      method: lcnaf #dct:creator
    captio: oregon:captionTitle
    other: dct:alternative
    composer:
      method: lcnaf #marcrel:cmp
    catalo: SKIP

Use the CISONICK of each element exactly as it appears in the desc.all file
Use Dublin Core as the base element set
Add other Linked Open Data (LOD) predicates where Dublin Core is not specific enough
Add established opaquenamespace/Oregon Digital terms where needed (make sure the terms have been added to Opaque Namespace)
To leave a field out of the conversion, map it to SKIP instead of a term
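
The sample mapping uses three kinds of values: a plain predicate, a nested method: entry, and SKIP. The sketch below shows one way those three shapes could be told apart after the YAML is loaded. It is an illustration only; `classify_mapping` is a hypothetical helper, not part of cdm2bag.

```ruby
require 'yaml'

# Classify one mapping value found under a collection alias.
# Returns [:skip], [:predicate, "dct:title"], or [:method, "lcnaf"].
def classify_mapping(value)
  case value
  when 'SKIP' then [:skip]
  when String then [:predicate, value]
  when Hash   then [:method, value.fetch('method')]
  else raise ArgumentError, "unsupported mapping value: #{value.inspect}"
  end
end

# A fragment of the sheetmusic example; the trailing "#dct:creator"
# is a YAML comment, so only "lcnaf" is loaded as the method name.
sheetmusic = YAML.safe_load(<<~YML)
  title: dct:title
  creato:
    method: lcnaf #dct:creator
  catalo: SKIP
YML

sheetmusic.each do |nick, value|
  puts "#{nick}: #{classify_mapping(value).inspect}"
end
```

The property written after a method name is a YAML comment, so it documents intent for the programmer and is never read by the parser.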

Methods

Use a method for cleaning up known data errors or mapping strings to URIs
Define all methods in a comment (so that programmer knows intent of method)

# METHODS:
#
# *xsd_date - converts date to yyyy-mm-dd format and serializes it as a literal of type xsd:date 
# *folkrights - separates rights statement from copyright holder for OAC rights
# *geographic - attempts to find a GeoNames URI for each location in the field. Strips "(Ore.)" from the string before searching.
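
To make the descriptions above concrete, here is a minimal sketch of what an xsd_date-style conversion and the "(Ore.)" stripping step might look like. These are not the project's actual methods: `xsd_date_sketch` and `strip_ore` are hypothetical names, and a plain hash stands in for a typed RDF literal so the example needs only the standard library.

```ruby
require 'date'

XSD_DATE = 'http://www.w3.org/2001/XMLSchema#date'

# Normalize a date string to yyyy-mm-dd and pair it with the xsd:date
# datatype URI. Returns nil when Date.parse rejects the input.
def xsd_date_sketch(raw)
  date = Date.parse(raw.to_s.strip)
  { value: date.strftime('%Y-%m-%d'), datatype: XSD_DATE }
rescue ArgumentError
  nil
end

# Remove the "(Ore.)" qualifier before a place name is looked up.
def strip_ore(place)
  place.gsub(/\s*\(Ore\.\)/, '').strip
end

p xsd_date_sketch('March 5, 1923')
puts strip_ore('Portland (Ore.)')
```

Returning nil for unparseable input is a design choice made for this sketch; a real method might instead keep the original string or log the record for review.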

For each field that uses a method, add the target property in a comment next to the method name

    composer: 
      method: lcnaf #marcrel:cmp

Finalizing

Follow Oregon Digital Git best practices: make changes and additions on a branch, commit with a helpful commit message, then push the branch for merge
Validate syntax before commit
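
One way to check the syntax locally is to parse the file with Ruby's bundled YAML library before committing. The helper below is a sketch with a hypothetical name (`yaml_syntax_error`); it checks only that the text is well-formed YAML, not that the mappings themselves are correct.

```ruby
require 'yaml'

# Return nil when the text parses as YAML, or the parser's error
# message (which includes line and column) when it does not.
def yaml_syntax_error(text)
  YAML.parse(text)
  nil
rescue Psych::SyntaxError => e
  e.message
end

error = yaml_syntax_error("mappings:\n  sheetmusic:\n    title: dct:title\n")
puts error.nil? ? 'syntax OK' : error
```

For a file on disk, the same check would be `yaml_syntax_error(File.read('mapping.yml'))`.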

Running the script

Source image files can be stored in a location other than the metadata/COLLECTION folder, and the new path can be referenced with the command line parameter --image-file-path
Source image files can be mapped to different file names using a CSV file specified with the command line parameter --image-file. The CSV file must have two columns in the order old_file,new_file and no header row. The file is read into a hash of old->new, which the cleanup task can then use to convert an old filename to the new one. See the herbarium_cleanup method in cleanup.rb for an example.
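
As an illustration of the old->new hash described above, the sketch below builds one from headerless CSV text using Ruby's csv library. It is not the code from cleanup.rb; `filename_map` and the sample filenames are hypothetical.

```ruby
require 'csv'

# Build an old-name => new-name hash from headerless CSV text
# whose rows have the form old_file,new_file.
def filename_map(csv_text)
  CSV.parse(csv_text).each_with_object({}) do |(old_file, new_file), map|
    next if old_file.nil? # blank lines parse as empty rows
    map[old_file.strip] = new_file.to_s.strip
  end
end

renames = filename_map("scan_0001.tif,OSC0001.tif\nscan_0002.tif,OSC0002.tif\n")

# Fall back to the original name when a file has no entry.
puts renames.fetch('scan_0001.tif', 'scan_0001.tif')
puts renames.fetch('other.tif', 'other.tif')
```

Using `fetch` with a default keeps files that are absent from the CSV under their original names, which is one reasonable behavior for a cleanup task; the real method may handle that case differently.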

Using cdm2bag with complex objects

To modify cdm2bag to ingest a collection with complex objects, edit the following files:

mapping.yml

Mark the field containing .cpd files with the temporary term oregon:full, e.g.

    find: oregon:full # Cleanup will look for .cpd files

cleanup.rb

Add code to retrieve and parse the compound file, if present

    def COLLECTION_cleanup(collection, graph, subject)
      # Look up the temporary oregon:full triple set in mapping.yml.
      full_stmt = graph.query([subject, @namespaces['oregon']['full'], nil])
      first_stmt = full_stmt.first
      # Nothing to clean up when the record has no oregon:full value.
      return graph if first_stmt.nil?
      full_file = first_stmt.object.to_s.downcase
      graph.delete(full_stmt) # This filename isn't saved so we don't need this triple anymore.
      if full_file.end_with? '.cpd'
        # Load the compound object data into the graph.
        graph = load_compound_objects(collection, graph, subject)
      else
        # Do something here if necessary.
      end
      graph
    end

When the cdm2bag script is run, it:

Iterates over the records in the desc.all file and pulls out any complex items (identified by a *.cpd filename in the find field)
Processes and bags the non-complex items in numbered folders, starting at 00001
Processes the complex items and bags them into folders whose names start after the last non-complex item
Moves any non-complex bags that are missing files to a 'missing' folder
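
The ordering described above can be sketched as follows. This is an assumption-laden illustration, not the script's implementation: `plan_bags` is a hypothetical helper, each record is assumed to be a hash with a 'find' key, folder names are assumed to be zero-padded five-digit sequence numbers, and the step that moves bags with missing files is left out.

```ruby
# Assign folder names so that non-complex items come first and
# complex items (.cpd in the find field) continue the sequence.
def plan_bags(records)
  complex, simple = records.partition do |record|
    record['find'].to_s.downcase.end_with?('.cpd')
  end

  (simple + complex).each_with_index.map do |record, index|
    { folder: format('%05d', index + 1), find: record['find'] }
  end
end

records = [
  { 'find' => '12.cpd' },
  { 'find' => '13.jp2' },
  { 'find' => '14.tif' }
]

plan_bags(records).each { |bag| puts "#{bag[:folder]}  #{bag[:find]}" }
```

With the sample records, the two simple items land in 00001 and 00002 and the compound object follows in 00003.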
