Apply suggestions from code review
Co-authored-by: emeryr-upenn
mdholloway and emeryr-upenn authored Jan 24, 2025
1 parent 62b0878 commit 3f0b086
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions doc/overview.md
@@ -2,11 +2,11 @@

For a general description of the Wikibase data model, see [Wikibase/DataModel](https://www.mediawiki.org/wiki/Wikibase/DataModel) on mediawiki.org.

-The Digital Scriptorium Wikibase data export is a JSON-formatted array of Wikibase entities. The bulk of the entities in the export consist of the DS Catalog core model types: manuscripts, holdings, and records. The export also contains entities representing property definitions and authoritative references to common topics.
+The Digital Scriptorium Wikibase data export is a JSON-formatted array of Wikibase entities. The bulk of the entities in the export consist of the DS Catalog core model types: manuscripts (Q1), holdings (Q2), and DS 2.0 records (Q3). The export also contains entities representing property definitions and authoritative references to common topics.

The [ExportRepresenter](../lib/digital_scriptorium/export_representer.rb) class can be used to deserialize an export in its entirety. The resulting [Export](../lib/digital_scriptorium/export.rb) object is essentially an array of Item and Property objects. Entities in the export are modeled using domain-specific classes provided by the [wikibase_representable](https://rubygems.org/gems/wikibase_representable) gem, such as Items, Properties, Statements (also known as Claims), and Snaks, which represent the primary claim of any statement as well as any qualifiers. Convenience methods are also provided to facilitate extracting data values.
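
To make the shape of the export concrete, here is a minimal sketch that reads a tiny, made-up export with Ruby's standard `json` library instead of ExportRepresenter. The sample entities and variable names are illustrative assumptions; only the top-level JSON array and the standard Wikibase `id`, `type`, `labels`, and `claims` keys are taken as given.

```ruby
require 'json'

# A made-up two-entity export: one property definition and one item.
# Real exports contain many more entities and populated claims.
sample_export = <<~JSON
  [
    {"id": "P16", "type": "property", "datatype": "wikibase-item",
     "labels": {"en": {"language": "en", "value": "instance of"}}, "claims": {}},
    {"id": "Q3", "type": "item",
     "labels": {"en": {"language": "en", "value": "DS 2.0 record"}}, "claims": {}}
  ]
JSON

# The export is a plain JSON array, so it parses to an Array of Hashes.
entities = JSON.parse(sample_export)

# Split entities into items and properties using the Wikibase "type" key.
by_type = entities.group_by { |entity| entity['type'] }
item_ids = by_type.fetch('item', []).map { |entity| entity['id'] }
property_ids = by_type.fetch('property', []).map { |entity| entity['id'] }

# Read the English label of the first entity.
first_label = entities.first.dig('labels', 'en', 'value')
```

In the project itself, the same array would be wrapped in domain objects by ExportRepresenter rather than handled as raw hashes.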

-The conversion script [wikibase_to_solr.rb](https://github.com/mdholloway/hxs-blacklight/blob/main/lib/wikibase_to_solr.rb) proceeds by deserializing the export and converting the resulting array of Wikibase objects to a hash keyed by entity ID. It then iterates over the elements of the hash. When it finds a record item based on the value of its instance-of (P16) claim, it retrieves the linked manuscript item from the export hash by entity ID. From the manuscript, in turn, it retrieves the ID of the item containing current holding information, and retrieves that too from the export hash. With the manuscript, current holding, and record items obtained, it iterates over each, extracting the Solr fields requested based on the property ID that is the subject of the claim and adding them to the Solr record to be produced. After all claims from the manuscript, holding, and record have been processed, the resulting Solr record is written to the output file. The script is written so as not to rely on the structure of the export file beyond that it will be a JSON array consisting of all entities in the DS 2.0 Wikibase, with record items linked to manuscript items and manuscript items linked to holding items by P3 (described manuscript) and P2 (holding) claims respectively.
+The conversion script [wikibase_to_solr.rb](https://github.com/mdholloway/hxs-blacklight/blob/main/lib/wikibase_to_solr.rb) proceeds by deserializing the export and converting the resulting array of Wikibase objects to a hash keyed by entity ID. It then iterates over the elements of the hash. When it finds a DS 2.0 record item based on the value of its instance-of (where P16 is Q3) claim, it retrieves the linked manuscript item (P1, described manuscript) from the export hash by entity ID. From the manuscript, in turn, it retrieves the ID of the item containing current holding information (P2, manuscript holding), and retrieves that too from the export hash. With the manuscript, current holding, and record items obtained, it iterates over each, extracting the Solr fields requested based on the property ID that is the subject of the claim and adding them to the Solr record to be produced. After all claims from the manuscript, holding, and record have been processed, the resulting Solr record is written to the output file. The script is written so as not to rely on the structure of the export file beyond that it will be a JSON array consisting of all entities in the DS 2.0 Wikibase, with record items linked to manuscript items and manuscript items linked to holding items by P3 (described manuscript) and P2 (holding) claims respectively.
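
The traversal described above can be sketched with plain Ruby hashes in the standard Wikibase JSON layout. This is an illustration under stated assumptions, not the code in wikibase_to_solr.rb: the helper names (`item_claim`, `linked_ids`, `record_triples`) and the entity IDs Q100 to Q102 are hypothetical, the property IDs follow the closing sentence of the paragraph (P3 for described manuscript, P2 for holding), and taking the first linked holding stands in for whatever logic selects the current holding.

```ruby
INSTANCE_OF = 'P16'
DESCRIBED_MANUSCRIPT = 'P3' # assumed, per the closing sentence above
MANUSCRIPT_HOLDING = 'P2'
RECORD_CLASS = 'Q3'

# Builds a minimal Wikibase-style claim whose value is another item (hypothetical helper).
def item_claim(property_id, target_id)
  {
    'type' => 'statement',
    'mainsnak' => {
      'snaktype' => 'value',
      'property' => property_id,
      'datavalue' => {
        'type' => 'wikibase-entityid',
        'value' => { 'entity-type' => 'item', 'id' => target_id }
      }
    }
  }
end

# Returns the item IDs that an entity's claims point to for one property.
def linked_ids(entity, property_id)
  entity.fetch('claims', {}).fetch(property_id, []).filter_map do |claim|
    claim.dig('mainsnak', 'datavalue', 'value', 'id')
  end
end

# For every record item, look up its manuscript and that manuscript's holding.
def record_triples(export_hash)
  export_hash.each_value.filter_map do |entity|
    next unless linked_ids(entity, INSTANCE_OF).include?(RECORD_CLASS)

    manuscript = export_hash[linked_ids(entity, DESCRIBED_MANUSCRIPT).first]
    next if manuscript.nil?

    # Simplification: take the first linked holding as the current one.
    holding = export_hash[linked_ids(manuscript, MANUSCRIPT_HOLDING).first]
    { record: entity, manuscript: manuscript, holding: holding }
  end
end

# Made-up entities: a holding, a manuscript, and a record.
entities = [
  { 'id' => 'Q100', 'type' => 'item',
    'claims' => { 'P16' => [item_claim('P16', 'Q2')] } },
  { 'id' => 'Q101', 'type' => 'item',
    'claims' => { 'P16' => [item_claim('P16', 'Q1')],
                  'P2' => [item_claim('P2', 'Q100')] } },
  { 'id' => 'Q102', 'type' => 'item',
    'claims' => { 'P16' => [item_claim('P16', 'Q3')],
                  'P3' => [item_claim('P3', 'Q101')] } }
]

# Key the array by entity ID so linked items can be fetched directly.
export_hash = entities.to_h { |entity| [entity['id'], entity] }
triples = record_triples(export_hash)
```

Keying the export by entity ID first means each link is resolved with a single hash lookup, which is why the script does not depend on the order of entities in the file.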

Solr field extraction logic is encapsulated in the Transformer classes. The [BaseClaimTransformer](../lib/digital_scriptorium/transformers/base_claim_transformer.rb) class sets out the basic contract, which consists of three methods: `display_values`, `search_values`, and `facet_values`. These methods return the collections of values to be included in the `_display`, `_search`, and `_facet` fields for the claim in the Solr object. For a title (P10) claim, for example, they would return the values to be used in the `title_display`, `title_search`, and `title_facet` fields. The remaining Transformer classes build on BaseClaimTransformer in various ways. For some claim types, the Transformer simply extracts the recorded value and returns it in one or more of the `_values` methods. For other claim types, it is expected that a claim will be qualified with a representation of the recorded value in its original script, or with references to a standard title or value from an authority file. This logic is contained in the [QualifiedClaimTransformer](../lib/digital_scriptorium/transformers/qualified_claim_transformer.rb) class. For these claim types, the standard title or value from the authority file is returned in the `facet_values` collection. For claim types where the recorded value should be provided as a facet value in the absence of a qualifier, the [QualifiedClaimTransformerWithFacetFallback](../lib/digital_scriptorium/transformers/qualified_claim_transformer_with_facet_fallback.rb) class is provided. Finally, the [LinkClaimTransformer](../lib/digital_scriptorium/transformers/link_claim_transformer.rb) class handles a couple of claim types for which the value to be extracted is a URL.
The [Transformers](../lib/digital_scriptorium/transformers.rb) class contains the mapping of claim property IDs to Transformer classes, as well as the prefixes to be used in the Solr fields based on the property name, and provides factory methods used by the conversion script to obtain Transformers as it iterates over claims.
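
The three-method contract and the property-to-transformer mapping can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the `Sketch*` classes are stand-ins rather than the project's Transformer classes, a claim is reduced to a hash with `:recorded` and `:authority` keys, the pairing of P10 with the qualified transformer is an assumption, and `PX` is a placeholder rather than a real property ID.

```ruby
# Stand-in for the base contract: three methods, each returning an array of values.
class SketchBaseTransformer
  def initialize(claim)
    @claim = claim
  end

  def display_values
    [@claim[:recorded]]
  end

  def search_values
    [@claim[:recorded]]
  end

  def facet_values
    []
  end
end

# Facets come from the authority-file qualifier, when one is present.
class SketchQualifiedTransformer < SketchBaseTransformer
  def facet_values
    [@claim[:authority]].compact
  end
end

# Falls back to the recorded value when no qualifier is present.
class SketchQualifiedWithFallback < SketchQualifiedTransformer
  def facet_values
    values = super
    values.empty? ? [@claim[:recorded]] : values
  end
end

# Maps property IDs to a Solr field prefix and a transformer class.
SKETCH_REGISTRY = {
  'P10' => { prefix: 'title', transformer: SketchQualifiedTransformer },
  'PX' => { prefix: 'example', transformer: SketchQualifiedWithFallback }
}.freeze

# Builds the prefixed Solr fields for one claim, omitting empty collections.
def solr_fields(property_id, claim)
  entry = SKETCH_REGISTRY.fetch(property_id)
  transformer = entry[:transformer].new(claim)
  prefix = entry[:prefix]
  {
    "#{prefix}_display" => transformer.display_values,
    "#{prefix}_search" => transformer.search_values,
    "#{prefix}_facet" => transformer.facet_values
  }.reject { |_field, values| values.empty? }
end

title_fields = solr_fields('P10', { recorded: 'Horae', authority: 'Book of hours' })
fallback_fields = solr_fields('PX', { recorded: 'Unqualified value' })
```

Keeping the prefix and the transformer class together in one registry means the calling code only needs a property ID to produce correctly named fields.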
