Gbl4 migration 2023 #1

Merged
merged 11 commits into from
Dec 1, 2023
The diff you're trying to view is too large. We only load the first 3000 changed files.
22 changes: 22 additions & 0 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
name: lint records
on:
  push:
    paths-ignore:
      - '**.md'
  pull_request:
    paths-ignore:
      - '**.md'

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: setup ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: lint v1 records
        run: bundle exec rake lint:v1
      - name: lint aardvark records
        run: bundle exec rake lint:aardvark
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
# ignore .DS_Store files

.DS_Store
.idea
1 change: 1 addition & 0 deletions .ruby-version
@@ -0,0 +1 @@
3.2.2
4 changes: 0 additions & 4 deletions .travis.yml

This file was deleted.

6 changes: 6 additions & 0 deletions Gemfile
@@ -0,0 +1,6 @@
source 'https://rubygems.org'

gem 'json_schemer'
gem 'rake'
gem 'ruby-progressbar'
gem 'sdr_cli', github: 'NYULibraries/sdr-cli', branch: 'main'
48 changes: 48 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,48 @@
GIT
  remote: https://github.com/NYULibraries/sdr-cli.git
  revision: 45cbc9e648a6ad3ce7071b84adf6e2d0168cae77
  branch: main
  specs:
    sdr_cli (0.1.0)
      dotenv (~> 2.7)
      faraday (~> 2.7)
      thor (~> 1.2.2)

GEM
  remote: https://rubygems.org/
  specs:
    base64 (0.2.0)
    dotenv (2.8.1)
    faraday (2.7.11)
      base64
      faraday-net_http (>= 2.0, < 3.1)
      ruby2_keywords (>= 0.0.4)
    faraday-net_http (3.0.2)
    hana (1.3.7)
    json_schemer (2.1.1)
      hana (~> 1.3)
      regexp_parser (~> 2.0)
      simpleidn (~> 0.2)
    rake (13.0.6)
    regexp_parser (2.8.2)
    ruby-progressbar (1.13.0)
    ruby2_keywords (0.0.5)
    simpleidn (0.2.1)
      unf (~> 0.1.4)
    thor (1.2.2)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.9.1)

PLATFORMS
  arm64-darwin-22
  x86_64-linux

DEPENDENCIES
  json_schemer
  rake
  ruby-progressbar
  sdr_cli!

BUNDLED WITH
   2.4.21
29 changes: 3 additions & 26 deletions README.md
@@ -1,36 +1,13 @@
# NYU Libraries GeoBlacklight Metadata Repository

This repository is the most current source for geospatial metadata housed within the NYU Libraries collection, the [Spatial Data Repository](geo.nyu.edu). We encourage any institution to index our metadata into their own discovery environment(s).
[OpenGeoMetadata/edu.nyu](https://github.com/OpenGeoMetadata/edu.nyu) is the canonical, most current source for geospatial metadata housed within the NYU Libraries collection in our [Spatial Data Repository](geo.nyu.edu).

#### Metadata validation checks
You might currently be looking at [OpenGeoMetadata/edu.nyu](https://github.com/OpenGeoMetadata/edu.nyu) or at NYU's internal fork [NYU-DataServices/gis-metadata-staging](https://github.com/NYU-DataServices/gis-metadata-staging) — the latter is where NYU staff works on in-process records to lint and stage in our staging instance.

Our metadata is validated to be in compliance with the current [GeoBlacklight 1.0 schema](https://github.com/geoblacklight/geoblacklight/blob/master/schema/geoblacklight-schema.md) through Travis-CI. Click the "build-failing" button to see which of our records (if any) have validation errors.
[OpenGeoMetadata/edu.nyu](https://github.com/OpenGeoMetadata/edu.nyu) should never be committed to directly—it should only take pull requests from [NYU-DataServices/gis-metadata-staging](https://github.com/NYU-DataServices/gis-metadata-staging).

[![Build Status](https://api.travis-ci.org/OpenGeoMetadata/edu.nyu.svg?branch=master)](https://travis-ci.org/OpenGeoMetadata/edu.nyu)

#### Contribution and enhancement status

![Open for metadata contributions](https://upload.wikimedia.org/wikipedia/commons/archive/0/0e/20170421060213%21Location_dot_green.svg) *Open for metadata contributions and enhancements*

#### Suggested enhancements to our existing metadata

There are many potential ways members of the GeoBlacklight community can enhance our metadata. Some examples include (but are not limited to):
* Fixing typos
* Normalizing string values for subjects and placenames
* Adding placename strings to records
* Enhancing descriptions
* Correcting errors on bounding box values
* Suggesting references for contextual information
* Submitting fixes for invalid records

#### Preferred process for submitting enhancements

We prefer that enhancements be consolidated into as few branches as possible. Here is a suggested workflow:
* Fork this repo
* Make all changes to the files locally while preserving our naming convention
* Submit a pull request based on your fork and describe the nature of your changes
* Assign the review task to Andrew Battista or Taylor Hixson

#### Contact Information

If you have any questions about remediating this metadata or would like to discuss larger-scale remediation projects, please reach out to Andrew Battista or Taylor Hixson or create an issue within this repository.
50 changes: 22 additions & 28 deletions Rakefile
@@ -1,32 +1,26 @@
-desc "Validate all geoblacklight.json records"
-task :validate_all do
-  require 'geo_combine'
-  require 'find'
-  paths = Find.find(Dir.pwd).select{ |x| x.include?("geoblacklight.json")}
-  records_invalid = 0
-  records_valid = 0
-  invalid_paths = []
-  puts "Validating #{paths.count} Geoblacklight records:"
-  paths.each_with_index do |path, idx|
-    rec = GeoCombine::Geoblacklight.new(File.read(path))
-    begin
-      rec.valid?
-      records_valid += 1
-      if (idx % 10 == 0)
-        print(idx)
-      else
-        print(".")
-      end
-    rescue
-      records_invalid += 1
-      invalid_paths << path
-      print("X")
-    end
-  end
+require_relative 'lib/lint'
+
+namespace :lint do
+  AARDVARK_SCHEMA_URL = 'https://opengeometadata.org/schema/geoblacklight-schema-aardvark.json'
+  OGM_V1_SCHEMA_URL = 'https://opengeometadata.org/schema/geoblacklight-schema-1.0.json'

-  if records_invalid > 0
-    raise "Contains #{records_invalid} invalid records:\n#{invalid_paths}"
+  desc "lint version 1 geoblacklight.json records"
+  task :v1 do
+    puts "\nOGM v1 ~>"
+    paths = Dir.glob("./metadata-1.0/**/*/geoblacklight.json")
+    lint paths, OGM_V1_SCHEMA_URL
   end
+  desc "lint aardvark geoblacklight.json records"
+  task :aardvark do
+    puts "\nAARDVARK ~>"
+    paths = Dir.glob("./metadata-aardvark/*/**/*.json")
+    lint paths, AARDVARK_SCHEMA_URL
+  end
+  desc "lint all records"
+  task :all do
+    Rake::Task['lint:v1'].execute
+    Rake::Task['lint:aardvark'].execute
+  end
 end

-task :default => ["validate_all"]
+task :default => ["lint:all"]
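
The two `Dir.glob` patterns in the new Rakefile select files differently, and a small experiment can make that concrete. The sketch below is not part of this PR: it builds a throwaway directory tree and runs both patterns against it. The helper names (`write_stub`, `glob_demo`) and the sample directory names (`handle`, `12345`, `nested/deep`) are hypothetical, chosen only for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: create an empty JSON file, making parent directories as needed.
def write_stub(path)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, '{}')
end

# Builds a temporary tree and returns what each Rakefile glob pattern matches.
def glob_demo
  Dir.mktmpdir do |root|
    Dir.chdir(root) do
      write_stub('metadata-1.0/geoblacklight.json')             # directly under the root folder
      write_stub('metadata-1.0/handle/12345/geoblacklight.json')
      write_stub('metadata-1.0/handle/12345/notes.json')        # wrong file name for the v1 pattern
      write_stub('metadata-aardvark/top.json')                  # directly under the root folder
      write_stub('metadata-aardvark/handle/record.json')
      write_stub('metadata-aardvark/handle/nested/deep/record.json')

      {
        v1: Dir.glob('./metadata-1.0/**/*/geoblacklight.json').sort,
        aardvark: Dir.glob('./metadata-aardvark/*/**/*.json').sort
      }
    end
  end
end

glob_demo.each { |name, paths| puts "#{name}: #{paths.inspect}" }
```

Under these assumptions, both patterns require at least one subdirectory below the metadata folder, so a JSON file placed directly in `metadata-1.0/` or `metadata-aardvark/` would not be linted. The v1 pattern only picks up files named `geoblacklight.json`, while the Aardvark pattern picks up any `.json` file.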
31 changes: 31 additions & 0 deletions lib/lint.rb
@@ -0,0 +1,31 @@
require 'json'
require 'json_schemer'
require 'open-uri'
require 'ruby-progressbar'

def lint(paths, schema_url)
  invalid = []
  schemer = JSONSchemer.schema JSON.load(URI.open(schema_url))
  bar = ProgressBar.create format: "Linting record %c/%C (%P% complete ) — %e", total: paths.length

  paths.each do |path|
    record = JSON.parse File.read(path)
    id = record['layer_id_s'] || record['id']

    invalid << {
      'id' => id,
      'errors' => schemer.validate(record).map { |x| x['error'] }
    } unless schemer.valid?(record)
    bar.increment
  end

  if invalid.empty?
    puts "All #{paths.length} records passed ✅"
  else
    puts "#{invalid.length}/#{paths.length} records have failed schema validation:"
    invalid.each do |i|
      puts "❌ #{i['id']}"
      i['errors'].each { |e| puts "\t #{e}" }
    end
  end
end
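
As written, `lint` only prints failures and never raises or exits, so the Rake task appears to finish with status 0 even when records fail validation, and a file containing malformed JSON would abort the run with an exception. The following is a minimal sketch of one way to handle both cases, not the code merged in this PR. The names `collect_failures` and `report` are hypothetical, and `validator` is a stand-in for the `json_schemer` calls (any callable that returns an array of error strings for a parsed record).

```ruby
require 'json'

# Hypothetical helper: gather failures for every path.
# `validator` is an assumption standing in for the json_schemer schema object.
def collect_failures(paths, validator)
  paths.each_with_object([]) do |path, failures|
    begin
      record = JSON.parse(File.read(path))
      errors = validator.call(record)
      id = record['layer_id_s'] || record['id'] || path
    rescue JSON::ParserError => e
      # Report unparseable JSON as a lint failure instead of aborting the run.
      errors = ["invalid JSON: #{e.message}"]
      id = path
    end
    failures << { 'id' => id, 'errors' => errors } unless errors.empty?
  end
end

# Prints a summary and returns true only when every record passed,
# so the caller can choose the process exit status.
def report(failures, total)
  if failures.empty?
    puts "All #{total} records passed"
    return true
  end

  puts "#{failures.length}/#{total} records have failed schema validation:"
  failures.each do |failure|
    puts "  #{failure['id']}"
    failure['errors'].each { |error| puts "    #{error}" }
  end
  false
end
```

Inside a Rake task, something like `exit 1 unless report(collect_failures(paths, validator), paths.length)` would then make the GitHub Actions job fail when any record is invalid.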
Empty file added metadata-1.0/.gitkeep
Empty file.