Home
Mike Eaton edited this page Apr 1, 2015 · 6 revisions
- Update the appropriate mapping.yml file
- Define namespace prefixes for compact URIs
namespaces:
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  owl: "http://www.w3.org/2002/07/owl#"
  skos: "http://www.w3.org/2004/02/skos/core#"
  dct: "http://purl.org/dc/terms/"
  dce: "http://purl.org/dc/elements/1.1/"
  bibo: "http://purl.org/ontology/bibo/"
  foaf: "http://xmlns.com/foaf/0.1/"
  geo: "http://www.w3.org/2003/01/geo/wgs84_pos#"
  frbr: "http://iflastandards.info/ns/fr/frbr/frbrer/"
  mads: "http://www.loc.gov/mads/rdf/v1#"
  marcrel: "http://id.loc.gov/vocabulary/relators/"
  modsrdf: "http://www.loc.gov/standards/mods/modsrdf/v1/"
  premis: "http://www.loc.gov/premis/rdf/v1#"
  oregon: "http://opaquenamespace.org/ns/"
  vra: "http://www.loc.gov/standards/vracore/vocab/"
  aat: "http://vocab.getty.edu/resource/aat/"
  dwc: "http://rs.tdwg.org/dwc/terms/"
  exif: "http://www.w3.org/2003/12/exif/ns#"
  holding: "http://purl.org/ontology/holding#"
  schema: "http://schema.org/"
  oad: "http://lod.xdams.org/reload/oad/"
  swpo: "http://sw-portal.deri.org/ontologies/swportal#"
  rdam: "http://rdaregistry.info/Elements/m/"
  rdaw: "http://rdaregistry.info/Elements/w/"
  rdae: "http://rdaregistry.info/Elements/e/"
  rdai: "http://rdaregistry.info/Elements/i/"
  rdaa: "http://rdaregistry.info/Elements/a/"
  archives: "http://data.archiveshub.ac.uk/def/"
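The prefixes above let the mapping refer to properties with compact URIs such as dct:title. As a rough illustration (not the cdm2bag code itself), the Ruby sketch below expands a prefixed term against a namespaces table loaded from YAML. The trimmed YAML excerpt and the expand_curie helper are hypothetical.

```ruby
require 'yaml'

# Hypothetical excerpt of the namespaces section of a mapping.yml file.
MAPPING_YAML = <<~YAML
  namespaces:
    dct: "http://purl.org/dc/terms/"
    marcrel: "http://id.loc.gov/vocabulary/relators/"
    oregon: "http://opaquenamespace.org/ns/"
YAML

# Expand a compact URI such as "dct:title" into a full URI string by
# concatenating the prefix's base URI and the local name.
# Raises KeyError if the prefix is not defined in the namespaces table.
def expand_curie(curie, namespaces)
  prefix, local = curie.split(':', 2)
  raise ArgumentError, "not a compact URI: #{curie}" if local.nil?
  namespaces.fetch(prefix) + local
end

namespaces = YAML.safe_load(MAPPING_YAML).fetch('namespaces')
puts expand_curie('dct:title', namespaces) # prints http://purl.org/dc/terms/title
```

Failing loudly on an unknown prefix (KeyError) is a deliberate choice: a typo in a prefix then shows up at mapping time instead of producing a malformed URI.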
- Start mappings
mappings:
- Identify the set to map by its collection alias, then indent the terms, keyed by CISONICK, below the collection alias.
  sheetmusic:
    title: dct:title
    creato:
      method: lcnaf #dct:creator
    captio: oregon:captionTitle
    other: dct:alternative
    composer:
      method: lcnaf #marcrel:cmp
    catalo: SKIP
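In the example above, each key under the collection alias is a CISONICK, and its value is a compact URI, a hash naming a method, or SKIP. A minimal sketch of how such a section could be classified, assuming the YAML loads into plain Ruby hashes; classify_fields and the trimmed excerpt are hypothetical and not cdm2bag's own API.

```ruby
require 'yaml'

# Hypothetical, trimmed mapping in the layout shown above. The text after
# " #" is a YAML comment, so creato's method value loads as "lcnaf".
MAPPING = YAML.safe_load(<<~YAML)
  mappings:
    sheetmusic:
      title: dct:title
      creato:
        method: lcnaf #dct:creator
      catalo: SKIP
YAML

# Classify how each CISONICK in one collection is handled:
#   [:skip, nil]         value is the string "SKIP"
#   [:method, name]      value is a hash with a "method" key
#   [:predicate, curie]  value is a compact URI for the target property
def classify_fields(mapping, collection)
  mapping.fetch('mappings').fetch(collection).map do |nick, rule|
    kind = case rule
           when 'SKIP' then [:skip, nil]
           when Hash then [:method, rule.fetch('method')]
           else [:predicate, rule]
           end
    [nick, kind]
  end.to_h
end

classify_fields(MAPPING, 'sheetmusic').each do |nick, (kind, detail)|
  puts "#{nick}: #{kind} #{detail}"
end
```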
- Use the CISONICK for each element from the desc.all file
- Use Dublin Core as the base element set
- Use additional Linked Open Data (LOD) predicates
- Use additional established opaquenamespace/Oregon Digital terms (make sure the terms have been added to Opaque Namespace)
- For a term that should not be used, map it to SKIP
- Use a method to clean up known data errors or to map strings to URIs
- Document every method in a comment so that the programmer knows its intent
# METHODS:
#
# *xsd_date - converts date to yyyy-mm-dd format and serializes it as a literal of type xsd:date
# *folkrights - separates rights statement from copyright holder for OAC rights
# *geographic - attempts to identify GeoNames URIs for locations in the field. Strips "(Ore.)" from the string before searching.
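The methods themselves are not shown on this page. The sketch below only illustrates the documented intent of two of them, under stated assumptions: xsd_date is approximated with Ruby's Date.parse and returns the yyyy-mm-dd string together with the xsd:date datatype URI rather than an actual RDF literal, and strip_oregon_qualifier is a hypothetical helper covering only the "(Ore.)" removal that happens before a GeoNames search.

```ruby
require 'date'

XSD_DATE = 'http://www.w3.org/2001/XMLSchema#date'

# Stand-in for xsd_date (assumption: Date.parse is good enough for the
# source data). Returns [lexical_form, datatype_uri], or nil when the
# string cannot be parsed as a date. Date::Error is an ArgumentError.
def xsd_date(value)
  [Date.parse(value).strftime('%Y-%m-%d'), XSD_DATE]
rescue ArgumentError
  nil
end

# Hypothetical pre-processing step for geographic: drop "(Ore.)" and
# tidy the spacing before the place name is searched in GeoNames.
def strip_oregon_qualifier(place)
  place.gsub('(Ore.)', '').squeeze(' ').strip
end

p xsd_date('March 5, 1923')
p strip_oregon_qualifier('Corvallis (Ore.)')
```

Returning nil for unparseable dates leaves the caller to decide whether to keep the original string or log it as a known data error.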
- For each field that uses a transformation method, add the target property in a comment next to the method
composer:
  method: lcnaf #marcrel:cmp
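Because the target property lives only in a comment, the method named under method: is what actually transforms the value. One possible way to resolve that name to Ruby code is sketched below; MappingMethods, normalize_space and apply_method are hypothetical names, not part of cdm2bag.

```ruby
# Hypothetical container for mapping methods. Each takes the raw field
# value and returns the cleaned value (or a URI string).
module MappingMethods
  module_function

  # Illustrative method: collapse runs of whitespace and trim the ends.
  def normalize_space(value)
    value.split.join(' ')
  end
end

# Resolve a method name from mapping.yml and apply it to a value. Only
# methods defined directly on MappingMethods are allowed, so a stray name
# in the YAML cannot call unrelated Ruby methods.
def apply_method(name, value)
  unless MappingMethods.singleton_methods(false).include?(name.to_sym)
    raise ArgumentError, "unknown mapping method: #{name}"
  end
  MappingMethods.public_send(name, value)
end

p apply_method('normalize_space', "  Sheet   music ")
```

The allow-list check is the main design choice here: mapping files are edited by hand, so rejecting unknown names early is safer than dispatching with a bare send.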
- Follow Oregon Digital Git best practices: make changes and additions on a branch, commit with a helpful commit message, then push for merge
- Validate the YAML syntax before committing
- Source image files can be stored somewhere other than the metadata/COLLECTION folder; pass the new path with the command-line parameter --image-file-path
- Source image files can be mapped to different filenames using a CSV file specified with the command-line parameter --image-file. The CSV file must have two columns, old_file,new_file, and no header row. The file is read into a hash of old -> new filenames, which the cleanup task can then use to convert each old filename to the new one. See the herbarium_cleanup method in cleanup.rb for an example.
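The --image-file CSV described above can be turned into the old -> new hash with Ruby's CSV library. A minimal sketch, assuming surrounding whitespace in filenames is never meaningful; load_rename_map, renamed and the sample filenames are made up for illustration, and the actual herbarium_cleanup method may differ.

```ruby
require 'csv'

# Build an old -> new filename hash from headerless two-column CSV text
# (old_file,new_file). Blank or incomplete rows are skipped.
def load_rename_map(csv_text)
  CSV.parse(csv_text).each_with_object({}) do |row, map|
    old_file, new_file = row
    next if old_file.nil? || new_file.nil?
    map[old_file.strip] = new_file.strip
  end
end

# Look up the new name for a file, falling back to the original name
# when the CSV does not mention it.
def renamed(filename, rename_map)
  rename_map.fetch(filename, filename)
end

rename_map = load_rename_map("IMG_0001.jpg,herb_000001.jpg\nIMG_0002.jpg,herb_000002.jpg\n")
puts renamed('IMG_0001.jpg', rename_map)
```

For a file on disk, pass File.read(path) to load_rename_map.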
To modify cdm2bag to ingest a collection with complex objects, edit the following files:
mapping.yml
- Mark the field containing .cpd filenames with the temporary term oregon:full, e.g.:
find: oregon:full # Cleanup will look for .cpd files
cleanup.rb
- Add code to retrieve and parse the compound file, if present
def COLLECTION_cleanup(collection, graph, subject)
  full_stmt = graph.query([subject, @namespaces['oregon']['full'], nil])
  first_stmt = full_stmt.first
  return graph if first_stmt.nil? # No oregon:full value on this record, so nothing to do.
  full_file = first_stmt.object.to_s.downcase
  graph.delete(full_stmt) # This filename isn't saved so we don't need this triple anymore.
  if full_file.end_with? '.cpd'
    # Load the compound object data into the graph.
    graph = load_compound_objects(collection, graph, subject)
  else
    # Do something here if necessary.
  end
  graph
end
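load_compound_objects is not shown here. As a starting point for the "retrieve and parse" step, the sketch below reads a .cpd file with REXML. It assumes the usual CONTENTdm compound object layout (a cpd root with page elements holding pagetitle, pagefile and pageptr); check this against a real .cpd file from the collection. parse_cpd and Page are hypothetical names.

```ruby
require 'rexml/document'

# One page of a compound object (assumed .cpd structure, see above).
Page = Struct.new(:title, :file, :pointer)

# Parse .cpd XML text into an ordered list of pages. Missing child
# elements become nil instead of raising.
def parse_cpd(xml)
  doc = REXML::Document.new(xml)
  pages = []
  doc.elements.each('//page') do |page|
    pages << Page.new(
      page.elements['pagetitle']&.text,
      page.elements['pagefile']&.text,
      page.elements['pageptr']&.text
    )
  end
  pages
end

sample = '<cpd><type>Document</type><page><pagetitle>Cover</pagetitle>' \
         '<pagefile>1.jp2</pagefile><pageptr>101</pageptr></page></cpd>'
parse_cpd(sample).each { |pg| puts "#{pg.title} -> #{pg.file}" }
```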
When the cdm2bag script is run, it:
- Iterates over the records in the desc.all file and pulls out any complex items (identified by a *.cpd filename in the find field)
- Processes the non-complex items and bags them into folders numbered from 00001 upward
- Processes the complex items and bags them into folders whose numbers continue after the last non-complex item
- Moves any non-complex bags that are missing files to a 'missing' folder
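The folder numbering described above can be sketched as follows, assuming each desc.all record has already been loaded as a hash with a "find" key. assign_bag_folders is a hypothetical helper, and the step that moves incomplete bags into 'missing' is left out.

```ruby
# Return [folder_name, record] pairs: non-complex records first, numbered
# from 00001, then complex (.cpd) records continuing the sequence.
# partition keeps the original record order within each group.
def assign_bag_folders(records)
  complex, simple = records.partition do |r|
    r['find'].to_s.downcase.end_with?('.cpd')
  end
  (simple + complex).each_with_index.map do |record, i|
    [format('%05d', i + 1), record]
  end
end

records = [{ 'find' => 'a.jpg' }, { 'find' => 'b.cpd' }, { 'find' => 'c.tif' }]
assign_bag_folders(records).each { |folder, r| puts "#{folder} #{r['find']}" }
```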