Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
DEV-1373 make catalog indexing date independent (#52)
- Add a `CICTL::Journal` class that writes dated files to a predefined location to record which Zephir files have been indexed.
- Add `journal_directory` to `Services`, with an `ENV`-overridable default location for journal files.
- Add a `cictl continue` command that calls `cictl all` or `cictl since` depending on the presence or absence of relevant journals.
- TIDY: remove the deprecated `version` key from docker-compose.yml.
- Address a number of nokogiri/rexml vulnerabilities identified by Dependabot.
- Address #50: availability maps should account for `icus`.
- Remove the `standardrb` exception for `lib/translation_maps` (mostly) by changing single quotes to double quotes.
- Remove `ht_namespace_map` and an unused reference to it.
- Remove unused umich translation maps.
- Run many cictl tests in a temp directory with an `around` block.
- Add `CICTL::Examples` helpers to tidy up test setup.
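The `journal_directory` entry is described only as having an `ENV`-overridable default. A minimal sketch of that pattern follows; the variable name `JOURNAL_DIRECTORY` and the default path are assumptions for illustration, since this commit view does not show the real `Services` code.

```ruby
# Hypothetical sketch of an ENV-overridable default. JOURNAL_DIRECTORY and
# the default path are illustrative assumptions, not names from the commit.
DEFAULT_JOURNAL_DIRECTORY = File.join(Dir.pwd, "journal").freeze

# +env+ defaults to the process environment but accepts any Hash, which
# keeps the lookup easy to exercise in tests.
def journal_directory(env = ENV)
  env.fetch("JOURNAL_DIRECTORY", DEFAULT_JOURNAL_DIRECTORY)
end
```

Taking the environment as a parameter is only a convenience for testing; the override itself is just `fetch` with a fallback.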
Showing 23 changed files with 783 additions and 469 deletions.
```diff
@@ -1,5 +1,4 @@
 ---
-version: '3'
 
 services:
   traject:
```
New file (`@@ -0,0 +1,66 @@`):

```ruby
# frozen_string_literal: true

require_relative "../services"

module CICTL
  # A class that enables date-independent catalog indexing using the filesystem.
  #
  # Each time a full or update file is indexed, writes an (empty) file of the form
  # hathitrust_catalog_indexer_journal_upd_YYYYMMDD.txt or
  # hathitrust_catalog_indexer_journal_full_YYYYMMDD.txt in the journal directory.
  #
  # When we use the index command `cictl continue`
  # we calculate the earliest zephir file not yet indexed and proceed in order from
  # that point.
  #
  # Nomenclature note: "journal" is the closest semantic match to "log" I could find.
  # This is a log, of sorts, but the term was already taken.
  class Journal
    attr_reader :date

    FILENAME_PATTERN = /hathitrust_catalog_indexer_journal_(full|upd)_(\d{8})\.txt/

    def self.filename_for(date:, full:)
      yyyymmdd = date.strftime "%Y%m%d"
      type = full ? "full" : "upd"
      "hathitrust_catalog_indexer_journal_#{type}_#{yyyymmdd}.txt"
    end

    def initialize(date: Date.today - 1, full: false)
      @date = date
      @full = full
    end

    # Use the built-in but append the date and full/upd because that's what we care about.
    def to_s
      super.tap do |s|
        s.gsub!(/>$/, " [#{date} #{full? ? "full" : "upd"}]>")
      end
    end

    def full?
      @full
    end

    # Of the form `hathitrust_catalog_indexer_journal_(full|upd)_YYYYMMDD.txt`
    def file
      self.class.filename_for(date: date, full: full?)
    end

    def path
      File.join(HathiTrust::Services[:journal_directory], file)
    end

    def exist?
      File.exist? path
    end

    def missing?
      !exist?
    end

    def write!
      FileUtils.touch path
    end
  end
end
```
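The class comment says `cictl continue` finds the earliest Zephir file not yet indexed, and the commit message says it dispatches to `cictl all` or `cictl since` depending on which journals exist. The command itself is not part of this view, so the following is only a sketch of that decision under assumed rules: with no full journal, rebuild everything; otherwise resume at the first day after the latest full journal that has no update journal. `continue_plan` and its return values are hypothetical; only the filename pattern is copied from the class.

```ruby
require "date"
require "fileutils"
require "tmpdir"

# Copied from CICTL::Journal::FILENAME_PATTERN.
FILENAME_PATTERN = /hathitrust_catalog_indexer_journal_(full|upd)_(\d{8})\.txt/

# Hypothetical planner, NOT the commit's implementation. Inspects journal
# files in +dir+ and returns one of:
#   [:all]          no full journal yet, so index everything
#   [:since, date]  earliest day after the latest full with no upd journal
#   [:up_to_date]   every day through +through+ already has an upd journal
def continue_plan(dir, through: Date.today - 1)
  fulls = []
  upds = []
  Dir.children(dir).each do |name|
    m = FILENAME_PATTERN.match(name) or next
    date = Date.strptime(m[2], "%Y%m%d")
    (m[1] == "full" ? fulls : upds) << date
  end
  return [:all] if fulls.empty?

  # Walk forward one day at a time from the day after the latest full file.
  missing = (fulls.max + 1..through).find { |day| !upds.include?(day) }
  missing ? [:since, missing] : [:up_to_date]
end
```

Usage: with `hathitrust_catalog_indexer_journal_full_20241001.txt` and `hathitrust_catalog_indexer_journal_upd_20241002.txt` in the directory and `through: Date.new(2024, 10, 5)`, the plan is `[:since, Date.new(2024, 10, 3)]`. Because the journal is just filenames, the decision needs no database or clock arithmetic beyond comparing dates.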
```diff
@@ -1,18 +1,27 @@
-require 'ht_traject/ht_constants'
-require 'match_map'
+require "ht_traject/ht_constants"
+require "match_map"
 
 mm = MatchMap.new
 
-mm[/^umall$/] = HathiTrust::Constants::FT
-mm[/world$/] = HathiTrust::Constants::FT # matches world, ic-world, und-world
-mm[/^cc.*/] = HathiTrust::Constants::FT
-mm[/^pd(?:us)?$/] = HathiTrust::Constants::FT # pd or pdus
+# Note: orph, orphcand, and umall are unattested in rights_current as of Oct 2024
 
-mm[/^ic$/] = HathiTrust::Constants::SO
-mm[/^orph$/] = HathiTrust::Constants::SO
-mm[/^nobody$/] = HathiTrust::Constants::SO
-mm[/^und$/] = HathiTrust::Constants::SO
-mm[/^pd-p/] = HathiTrust::Constants::SO # pd-pvt or pd-private
-mm[/^opb?$/] = HathiTrust::Constants::SO
+# Full Text
+mm["pd"] = HathiTrust::Constants::FT # [1]
+mm["ic-world"] = HathiTrust::Constants::FT # [7]
+mm["pdus"] = HathiTrust::Constants::FT # [9]
+mm[/^cc-/] = HathiTrust::Constants::FT # [10-15, 17, 20-25]
+mm["und-world"] = HathiTrust::Constants::FT # [18]
 
+# Search Only
+mm["ic"] = HathiTrust::Constants::SO # [2]
+mm["op"] = HathiTrust::Constants::SO # [3]
+mm["orph"] = HathiTrust::Constants::SO # [4]
+mm["und"] = HathiTrust::Constants::SO # [5]
+mm["umall"] = HathiTrust::Constants::SO # [6]
+mm["nobody"] = HathiTrust::Constants::SO # [8]
+mm["orphcand"] = HathiTrust::Constants::SO # [16]
+mm["icus"] = HathiTrust::Constants::SO # [19]
+mm["pd-pvt"] = HathiTrust::Constants::SO # [26]
+mm["supp"] = HathiTrust::Constants::SO # [27]
+
 mm
```
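The rewritten map drops loose patterns such as `/world$/` and `/^pd(?:us)?$/` in favor of exact rights codes, leaving `/^cc-/` as the only regex. To illustrate how a lookup resolves against a table of that shape, here is a plain-Ruby stand-in. It is not the `match_map` gem's API: its matching rules for string keys are not shown here, so this sketch simply assumes string keys match exactly and regex keys are tried afterwards. `:ft`/`:so` stand in for `HathiTrust::Constants::FT` and `::SO`.

```ruby
# Hypothetical stand-in for the new map above; NOT MatchMap's behavior.
# Assumption: string keys match exactly, regex keys are tried afterwards.
EXACT = {
  "pd" => :ft, "ic-world" => :ft, "pdus" => :ft, "und-world" => :ft,
  "ic" => :so, "op" => :so, "orph" => :so, "und" => :so, "umall" => :so,
  "nobody" => :so, "orphcand" => :so, "icus" => :so, "pd-pvt" => :so,
  "supp" => :so
}.freeze

PATTERNS = [[/^cc-/, :ft]].freeze

# Returns :ft, :so, or nil for an unmapped rights code.
def availability(code)
  return EXACT[code] if EXACT.key?(code)

  _, value = PATTERNS.find { |re, _| re.match?(code) }
  value
end
```

Under these assumptions `availability("cc-by-4.0")` falls through to the `/^cc-/` pattern and yields `:ft`, `availability("icus")` yields `:so`, and a bare `"world"`, which the old `/world$/` regex would have caught, is now unmapped (`nil`).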
```diff
@@ -1,17 +1,26 @@
-require 'ht_traject/ht_constants'
+require "ht_traject/ht_constants"
 
 mm = MatchMap.new
 
-mm['umall'] = HathiTrust::Constants::FT
-mm['world'] = HathiTrust::Constants::FT # matches world, ic-world, und-world
-mm[/^cc.*/] = HathiTrust::Constants::FT
-mm['pd'] = HathiTrust::Constants::FT
+# Note: orph, orphcand, and umall are unattested in rights_current as of Oct 2024
 
-mm['pdus'] = HathiTrust::Constants::SO
-mm['ic'] = HathiTrust::Constants::SO
-mm[/^opb?$/] = HathiTrust::Constants::SO
-mm['orph'] = HathiTrust::Constants::SO
-mm['nobody'] = HathiTrust::Constants::SO
-mm['und'] = HathiTrust::Constants::SO
+# Full Text
+mm["pd"] = HathiTrust::Constants::FT # [1]
+mm["ic-world"] = HathiTrust::Constants::FT # [7]
+mm[/^cc-/] = HathiTrust::Constants::FT # [10-15, 17, 20-25]
+mm["und-world"] = HathiTrust::Constants::FT # [18]
+mm["icus"] = HathiTrust::Constants::FT # [19]
 
+# Search Only
+mm["ic"] = HathiTrust::Constants::SO # [2]
+mm["op"] = HathiTrust::Constants::SO # [3]
+mm["orph"] = HathiTrust::Constants::SO # [4]
+mm["und"] = HathiTrust::Constants::SO # [5]
+mm["umall"] = HathiTrust::Constants::SO # [6]
+mm["nobody"] = HathiTrust::Constants::SO # [8]
+mm["pdus"] = HathiTrust::Constants::SO # [9]
+mm["orphcand"] = HathiTrust::Constants::SO # [16]
+mm["pd-pvt"] = HathiTrust::Constants::SO # [26]
+mm["supp"] = HathiTrust::Constants::SO # [27]
+
 mm
```
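After this commit the two availability maps differ only in where `pdus` and `icus` land, which is the substance of #50; presumably one map serves US access and the other non-US access, though the file names are not shown in this view. A quick way to confirm that nothing else diverges is to diff their entries. The hashes below are a hand transcription of the exact-string entries (the shared `/^cc-/` pattern is omitted), with `:ft`/`:so` standing in for the constants; `differences` is a hypothetical helper.

```ruby
# Hand transcription of the exact-string entries of the two maps above.
# :ft / :so stand in for HathiTrust::Constants::FT and ::SO.
FIRST_MAP = {
  "pd" => :ft, "ic-world" => :ft, "pdus" => :ft, "und-world" => :ft,
  "ic" => :so, "op" => :so, "orph" => :so, "und" => :so, "umall" => :so,
  "nobody" => :so, "orphcand" => :so, "icus" => :so, "pd-pvt" => :so,
  "supp" => :so
}.freeze

SECOND_MAP = {
  "pd" => :ft, "ic-world" => :ft, "und-world" => :ft, "icus" => :ft,
  "ic" => :so, "op" => :so, "orph" => :so, "und" => :so, "umall" => :so,
  "nobody" => :so, "pdus" => :so, "orphcand" => :so, "pd-pvt" => :so,
  "supp" => :so
}.freeze

# Hypothetical helper: returns {code => [value_in_a, value_in_b]} for every
# rights code the two maps classify differently (nil means "not mapped").
def differences(a, b)
  (a.keys | b.keys).sort.each_with_object({}) do |code, diff|
    diff[code] = [a[code], b[code]] unless a[code] == b[code]
  end
end
```

Running `differences(FIRST_MAP, SECOND_MAP)` reports exactly two codes: `icus` (search only in the first map, full text in the second) and `pdus` (the reverse).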