This repo contains documentation and scripts for how the M.E. Grenander Department of Special Collections & Archives connects ArchivesSpace, ArcLight, and Hyrax and keeps everything in sync. It contains:

- Documentation for uploading digital objects in Hyrax using existing archival description
- Overnight export and indexing scripts that update data between each service

Updated documentation for this repo is on our documentation site.
## Uploading Digital Objects to Hyrax
- Go to Hyrax and log in, or create an account and request upload access.
- Let Greg know when you create an account and return when you have upload permissions.
- Once you have upload permissions, go to ArcLight and find the record that represents the digital object you want to upload. From the URI, copy the long string of letters and numbers right after “aspace_”. This is the unique ArchivesSpace ID for that record.
- Note that the collection ID is in the URI as well.
- In your Dashboard, select “Works” on the left side menu
- Select the “Add new work” button on the right side
- For most cases, select “Digital Archival Objects” and then the “Create Work” button.
- In the “Descriptions” tab, enter only the ArchivesSpace ID and the collection number.
- Select the “Load Record” button to pull additional metadata from ArcLight (this is done client-side with JavaScript).
- Add any additional metadata. “Resource Type” and “Rights Statement” are required, while the “Additional fields” are not.
- In the “Files” tab, browse and upload any files represented by the Arclight record. These can be PDFs, Office documents (doc, docx, ppt, xlsx, etc.), or any image file.
- Select the Visibility of the work on the right side, and Save the work.
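The ID-copying step above can be sketched in Python; the URL below is a hypothetical ArcLight component URI, shown only to illustrate the pattern:

```python
import re

def aspace_id_from_url(url: str) -> str:
    """Return the ArchivesSpace ID embedded in an ArcLight component URI.

    ArcLight component URLs end in a fragment like 'aspace_<hexstring>';
    the part after 'aspace_' is the unique ArchivesSpace ID.
    """
    match = re.search(r"aspace_([0-9a-f]+)", url)
    if match is None:
        raise ValueError(f"no aspace_ ID found in {url!r}")
    return match.group(1)

# hypothetical ArcLight URL for illustration
url = "https://archives.example.edu/catalog/apap101aspace_8a9fe1d58e50ddc5ab54b4c51d7bc225"
print(aspace_id_from_url(url))  # -> 8a9fe1d58e50ddc5ab54b4c51d7bc225
```

The collection ID (`apap101` here) sits just before the `aspace_` fragment, which is why the steps above have you note it from the same URI.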
## Overnight Export and Indexing Scripts

- Each night, `exportPublicData.py` uses ArchivesSnake to query ArchivesSpace for resources updated since the last run.
- For collections with the complete set of DACS-minimum elements, it exports EAD 2002 files; for collections with only abstracts and extents, it saves them to pipe-delimited CSVs.
- It also builds a CSV of local subjects and collection IDs.
- All of this data is pushed to GitHub.
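A minimal sketch of that nightly query with ArchivesSnake; the base URL, credentials, and repository number are placeholders, not this repo's actual configuration:

```python
from datetime import datetime, timezone

def modified_since_param(last_run_iso: str) -> int:
    """Convert a saved last-run time (ISO 8601, treated as UTC) to the
    Unix timestamp the ArchivesSpace API expects for modified_since."""
    dt = datetime.fromisoformat(last_run_iso).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def fetch_updated_resource_ids(last_run_iso: str) -> list:
    """Ask ArchivesSpace for the IDs of resources modified since the last run."""
    from asnake.client import ASnakeClient  # ArchivesSnake

    client = ASnakeClient(baseurl="http://localhost:8089",     # placeholder
                          username="admin", password="admin")  # placeholder
    client.authorize()
    response = client.get("repositories/2/resources",
                          params={"all_ids": True,
                                  "modified_since": modified_since_param(last_run_iso)})
    return response.json()
```

Each returned ID can then be fetched individually to decide whether the description is complete enough for EAD or falls back to the CSV export.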
- `exportPublicData.py` runs `staticPages.py` when it's finished, which builds static browse pages for all collections, including a complete A-Z list, alpha lists for each collecting area, and pages for each local subject.
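As an illustration of the A-Z list build (a sketch, not `staticPages.py`'s actual code), grouping collection titles by first letter might look like:

```python
from itertools import groupby

def az_groups(titles):
    """Group collection titles by first letter for an A-Z browse list."""
    ordered = sorted(titles, key=str.upper)  # case-insensitive sort
    return {letter: list(group)
            for letter, group in groupby(ordered, key=lambda t: t[0].upper())}

print(az_groups(["beta papers", "Alpha records", "apple files"]))
```

Each letter's group becomes one section of the static browse page.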
- Later, collection data is updated with `git pull`, and `indexNewEAD.sh` indexes EAD files updated in the past day (using `find -mtime -1`) into the ArcLight Solr instance.
- There are also additional indexing shell scripts for ad hoc updates:
  - `indexAllEAD.sh` reindexes all EAD files
  - `indexOneEAD.sh` indexes one EAD by collection ID (`./indexOneEAD.sh apap101`)
  - `indexOneNDPA.sh` indexes one NDPA EAD file; this is necessary because NDPA files share the same collection ID prefixes
  - `indexNewNoLog.sh` indexes one EAD file, but logs to stdout instead of a log file
  - `indexOneURL.sh` indexes via a URL instead of from disk (not actively used)
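The `find -mtime -1` selection can be mirrored in Python (a sketch; the directory path is whatever holds the exported EAD files):

```python
import time
from pathlib import Path

def ead_modified_within(root, days=1.0):
    """Return EAD XML files under `root` modified within the last `days`
    days -- the same selection indexNewEAD.sh makes with `find -mtime -1`."""
    cutoff = time.time() - days * 86400
    return sorted(p for p in Path(root).rglob("*.xml")
                  if p.stat().st_mtime >= cutoff)
```

Each matching file would then be posted to the ArcLight Solr instance by the indexer.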
- Finally, `processNewUploads.py` queries the Hyrax Solr index for new uploads that are connected to ArchivesSpace ref_ids but do not yet have accession numbers.
- It downloads the new binaries and metadata and creates basic Archival Information Packages (AIPs) using bagit-python.
- It then uses ArchivesSnake to add a new digital object record in ArchivesSpace that links to the object in Hyrax.
- Last, it adds a new accession ID in Hyrax.
- (Also check out Noah Huffman's talk, which probably does this better [Direct Link].)
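The Solr query for those new uploads might be built like this; the field names are illustrative placeholders, not Hyrax's actual schema:

```python
def new_upload_query_params(rows=100):
    """Solr parameters for works that carry an ArchivesSpace ref_id but no
    accession number yet. Field names here are placeholders for whatever
    the local Hyrax Solr schema actually uses."""
    return {
        "q": "*:*",
        "fq": [
            "aspace_refid_tesim:[* TO *]",        # linked to ArchivesSpace
            "-accession_number_tesim:[* TO *]",   # but not yet accessioned
        ],
        "fl": "id,aspace_refid_tesim,title_tesim",
        "rows": rows,
        "wt": "json",
    }
```

Works matching both filters are the ones that still need an AIP, a digital object record, and an accession ID.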
- A simple library that converts POSIX timestamps and ISO 8601 dates to DACS-compliant display dates. `exportPublicData.py` uses this to make dates for the static browse pages.
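A simplified sketch of that conversion (not the library's actual code), covering the common ISO 8601 forms:

```python
from datetime import datetime

def iso_to_dacs(iso_date):
    """Render an ISO 8601 date (YYYY, YYYY-MM, or YYYY-MM-DD) as a
    DACS-style display date, e.g. "1999-07-04" -> "1999 July 4"."""
    parts = iso_date.split("-")
    if len(parts) == 1:                    # year only
        return parts[0]
    if len(parts) == 2:                    # year and month
        return datetime.strptime(iso_date, "%Y-%m").strftime("%Y %B")
    dt = datetime.strptime(iso_date, "%Y-%m-%d")
    return f"{dt.year} {dt.strftime('%B')} {dt.day}"
```

A full implementation would also handle POSIX timestamps and date ranges, which this sketch omits.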
- Queries the Bing background image API each night to display new background images for ArchivesSpace and Find-It, just for fun.
Example crontab:

```
# get new image from Bing
0 2 * * * source /home/user/.bashrc; pyenv activate aspaceExport && python /opt/lib/ArchivesSpace-ArcLight-Workflow/image_a_day.py 1>> /media/SPE/indexing-logs/image_a_day.log 2>&1 && pyenv deactivate
# export data from ASpace
0 0 * * * source /home/user/.bashrc; pyenv activate aspaceExport && python /opt/lib/ArchivesSpace-ArcLight-Workflow/exportPublicData.py 1>> /media/SPE/indexing-logs/export.log 2>&1 && pyenv deactivate
# pull new EADs from GitHub
30 0 * * * echo "$(date) $line git pull" >> /media/SPE/indexing-logs/git.log && git --git-dir=/opt/lib/collections/.git --work-tree=/opt/lib/collections pull 1>> /media/SPE/indexing-logs/git.log 2>&1
# Index modified apap collections
5 1 * * * /opt/lib/ArchivesSpace-ArcLight-Workflow/indexNewEAD.sh "apap"
# Index modified ua collections
15 1 * * * /opt/lib/ArchivesSpace-ArcLight-Workflow/indexNewEAD.sh "ua"
# Index modified ndpa collections
25 1 * * * /opt/lib/ArchivesSpace-ArcLight-Workflow/indexNewEAD.sh "ndpa"
# Index modified ger collections
35 1 * * * /opt/lib/ArchivesSpace-ArcLight-Workflow/indexNewEAD.sh "ger"
# Index modified mss collections
45 1 * * * /opt/lib/ArchivesSpace-ArcLight-Workflow/indexNewEAD.sh "mss"
# Download new Hyrax uploads and create new ASpace digital objects
0 2 * * * source /home/user/.bashrc; pyenv activate processNewUploads && python /opt/lib/ArchivesSpace-ArcLight-Workflow/processNewUploads.py 1>> /media/SPE/indexing-logs/processNewUploads.log 2>&1 && pyenv deactivate
```