Mike Eaton edited this page Apr 1, 2015 · 6 revisions

Mappings for batch conversion from CONTENTdm

CDM2bag

  • Update appropriate mapping.yml file
  • Define namespace prefixes for compact URLs
namespaces:
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  owl: "http://www.w3.org/2002/07/owl#"
  skos: "http://www.w3.org/2004/02/skos/core#"
  dct: "http://purl.org/dc/terms/"
  dce: "http://purl.org/dc/elements/1.1/"
  bibo: "http://purl.org/ontology/bibo/"
  foaf: "http://xmlns.com/foaf/0.1/"
  geo: "http://www.w3.org/2003/01/geo/wgs84_pos#"
  frbr: "http://iflastandards.info/ns/fr/frbr/frbrer/"
  mads: "http://www.loc.gov/mads/rdf/v1#"
  marcrel: "http://id.loc.gov/vocabulary/relators/"
  modsrdf: "http://www.loc.gov/standards/mods/modsrdf/v1/"
  premis: "http://www.loc.gov/premis/rdf/v1#"
  oregon: "http://opaquenamespace.org/ns/"
  vra: "http://www.loc.gov/standards/vracore/vocab/"
  aat: "http://vocab.getty.edu/resource/aat/"
  dwc: "http://rs.tdwg.org/dwc/terms/"
  exif: "http://www.w3.org/2003/12/exif/ns#"
  holding: "http://purl.org/ontology/holding#"
  schema: "http://schema.org/"
  oad: "http://lod.xdams.org/reload/oad/"
  swpo: "http://sw-portal.deri.org/ontologies/swportal#"
  rdam: "http://rdaregistry.info/Elements/m/"
  rdaw: "http://rdaregistry.info/Elements/w/"
  rdae: "http://rdaregistry.info/Elements/e/"
  rdai: "http://rdaregistry.info/Elements/i/"
  rdaa: "http://rdaregistry.info/Elements/a/"
  archives: "http://data.archiveshub.ac.uk/def/"
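
As a rough illustration of what the prefixes are for, the sketch below expands a compact URL such as dct:title into a full URI using the namespaces section. The helper name expand_compact is hypothetical and is not taken from cdm2bag; only three of the prefixes above are repeated here.

```ruby
require 'yaml'

# A trimmed copy of the namespaces section shown above.
CONFIG = YAML.safe_load(<<~YML)
  namespaces:
    dct: "http://purl.org/dc/terms/"
    marcrel: "http://id.loc.gov/vocabulary/relators/"
    oregon: "http://opaquenamespace.org/ns/"
YML

# Hypothetical helper (not part of cdm2bag): expand "prefix:term"
# into a full URI string, failing loudly on an undefined prefix.
def expand_compact(compact, namespaces)
  prefix, term = compact.split(':', 2)
  base = namespaces.fetch(prefix) do
    raise ArgumentError, "undefined namespace prefix: #{prefix}"
  end
  "#{base}#{term}"
end

puts expand_compact('dct:title', CONFIG['namespaces'])
# prints http://purl.org/dc/terms/title
```

Because the prefix and the term are simply concatenated, each namespace value needs to end in "/" or "#" for the expanded URI to come out right.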

Mapping

  • Start the mappings section with the key mappings:
  • Identify the set to map using the CISONICK for its collection alias, then indent the terms below the collection alias.
  sheetmusic:
    title: dct:title
    creato: 
      method: lcnaf #dct:creator
    captio: oregon:captionTitle 
    other: dct:alternative
    composer: 
      method: lcnaf #marcrel:cmp
    catalo: SKIP 
  • Make sure to use the CISONICK for each element from the desc.all file
  • Use Dublin Core as the base element set
  • Use additional Linked Open Data (LOD) predicates
  • Use additional established opaquenamespace/Oregon Digital terms (make sure the terms are added to Opaque Namespace)
  • Map any term that should not be used to SKIP
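
The three kinds of entry in the sheetmusic example (a plain predicate, a method, and SKIP) can be told apart once the YAML is parsed. The following is a minimal sketch of that distinction; the classify helper is hypothetical and does not claim to mirror how cdm2bag itself reads the file.

```ruby
require 'yaml'

# A trimmed copy of the sheetmusic mapping shown above.
MAPPING = YAML.safe_load(<<~YML)
  mappings:
    sheetmusic:
      title: dct:title
      creato:
        method: lcnaf #dct:creator
      catalo: SKIP
YML

# Hypothetical classifier for a single mapping entry.
def classify(entry)
  case entry
  when 'SKIP' then [:skip, nil]                  # field is not migrated
  when Hash   then [:method, entry['method']]    # field is handed to a method
  else             [:predicate, entry]           # field maps straight to a predicate
  end
end

MAPPING['mappings']['sheetmusic'].each do |nick, entry|
  kind, value = classify(entry)
  puts "#{nick}: #{kind} #{value}"
end
```

Note that a YAML parser discards everything after " #", so the property written next to a method (for example #dct:creator) is documentation for people and is not part of the parsed value.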

Methods

  • Use a method to clean up known data errors or to map strings to URIs
  • Define every method in a comment, so that the programmer knows the intent of each method
# METHODS:
#
# *xsd_date - converts date to yyyy-mm-dd format and serializes it as a literal of type xsd:date 
# *folkrights - separates rights statement from copyright holder for OAC rights
# *geographic - attempts to identify a GeoNames URI for locations in the field. Will strip "(Ore.)" from the string before searching.
  • For each field in your mapping that uses a method, add the target property in a comment next to the method
    composer: 
      method: lcnaf #marcrel:cmp
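
To make the intent of a method concrete, here is a minimal sketch of what an xsd_date-style transformation could look like. It is a stand-in written for illustration, not the project's implementation: the function name xsd_date_literal is hypothetical, the result is a plain N-Triples-style string and not an RDF object, and the choice to return nil for unparseable values is an assumption.

```ruby
require 'date'

XSD_DATE = 'http://www.w3.org/2001/XMLSchema#date'

# Hypothetical stand-in for an xsd_date method: normalise a date string
# to yyyy-mm-dd and render it as a typed literal in N-Triples notation.
def xsd_date_literal(value)
  date = Date.parse(value.to_s.strip)
  %("#{date.strftime('%Y-%m-%d')}"^^<#{XSD_DATE}>)
rescue ArgumentError
  nil # assumption: unparseable dates are left for manual review
end

puts xsd_date_literal('April 1, 2015')
# prints "2015-04-01"^^<http://www.w3.org/2001/XMLSchema#date>
```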

Finalizing

  • Follow Oregon Digital Git best practices: make changes and additions on a branch, commit with a helpful commit message, then push for merge
  • Validate the YAML syntax before committing
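
One simple way to check syntax is to run the file through Ruby's bundled YAML parser. This is a sketch of that idea, not a documented project step; the helper name yaml_syntax_error is hypothetical.

```ruby
require 'yaml'

# Returns nil when the text parses as YAML, otherwise the parser's message
# (which includes the line and column of the problem).
def yaml_syntax_error(text)
  YAML.parse(text)
  nil
rescue Psych::SyntaxError => e
  e.message
end

good = "mappings:\n  sheetmusic:\n    title: dct:title\n"
bad  = "mappings:\n  sheetmusic:\n    title: [dct:title\n"

puts yaml_syntax_error(good).inspect
puts yaml_syntax_error(bad).inspect
```

To check a real file, pass File.read with the path to your mapping.yml. A clean parse only shows the file is well-formed YAML; it does not confirm that the prefixes or methods it names exist.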

Running the script

  • Source image files can be stored in a location other than the metadata/COLLECTION folder, and the new path can be referenced with the command line parameter --image-file-path
  • Source image files can be mapped to different file names using a CSV file specified with the command line parameter --image-file. The CSV file must contain two columns, in the order old_file,new_file, and must have no header row. The file is read into a hash of old->new, which the cleanup task can then use to convert an old filename to the new one. See the herbarium_cleanup method in cleanup.rb for an example.
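
The old->new hash described above can be sketched with Ruby's CSV library as follows. This illustrates the file format only; load_rename_map and the sample file names are hypothetical, and the actual lookup in herbarium_cleanup may differ.

```ruby
require 'csv'

# Build an old->new filename hash from headerless "old_file,new_file" rows.
def load_rename_map(csv_text)
  CSV.parse(csv_text).each_with_object({}) do |(old_file, new_file), map|
    next if old_file.nil? # skip blank lines
    map[old_file.strip] = new_file.to_s.strip
  end
end

# Hypothetical sample data in the required two-column, no-header layout.
renames = load_rename_map("OSC0001.tif,herbarium_0001.tif\nOSC0002.tif,herbarium_0002.tif\n")

# Fall back to the original name when a file has no entry in the CSV.
puts renames.fetch('OSC0001.tif', 'OSC0001.tif')
# prints herbarium_0001.tif
```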

Using cdm2bag with complex objects

To modify cdm2bag so that it ingests a collection with complex objects, edit the following files:

mapping.yml

  • mark the field containing .cpd files with the temporary term oregon:full, e.g.
    find: oregon:full # Cleanup will look for .cpd files

cleanup.rb

  • add code to retrieve and parse the compound file, if present
    def COLLECTION_cleanup(collection, graph, subject)
      full_stmt = graph.query([subject, @namespaces['oregon']['full'], nil])
      # Nothing to do if this record has no oregon:full triple.
      return graph if full_stmt.first.nil?
      full_file = full_stmt.first.object.to_s.downcase
      graph.delete(full_stmt) # This filename isn't saved so we don't need this triple anymore.
      if full_file.end_with? '.cpd'
        # Load the compound object data into the graph.
        graph = load_compound_objects(collection, graph, subject)
      else
        # Do something here if necessary.
      end
      graph
    end

When the cdm2bag script is run, it:

  1. Iterates over the records in the desc.all file and pulls out any complex items (identified by a *.cpd filename in the find field)
  2. Processes and bags the non-complex items in folders with names starting with 00001 and up
  3. Processes the complex items and bags them into folders whose names start after the last non-complex item
  4. Moves any non-complex bags that are missing files to a 'missing' folder
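
The ordering in steps 1 to 3 can be sketched as a partition followed by sequential numbering. This is only an illustration of the ordering: the record hashes, the plan_bags helper, and the assumption that a folder name is just the zero-padded number are all hypothetical, and step 4 (moving bags with missing files) is not modelled.

```ruby
# Hypothetical records: each hash stands in for one desc.all record.
RECORDS = [
  { id: 'a', find: '12.jp2' },
  { id: 'b', find: '15.cpd' },
  { id: 'c', find: '16.jp2' }
].freeze

# Split records into complex (.cpd in the find field) and non-complex,
# then number the non-complex items first and the complex items after them.
def plan_bags(records)
  complex, simple = records.partition do |record|
    record[:find].to_s.downcase.end_with?('.cpd')
  end
  (simple + complex).each_with_index.map do |record, index|
    [format('%05d', index + 1), record[:id]]
  end
end

plan_bags(RECORDS).each { |folder, id| puts "#{folder} #{id}" }
# prints:
# 00001 a
# 00002 c
# 00003 b
```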