Wikibase to Solr #10: multiple values
To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.
Here is a simple program which loads the export.json file into a Ruby variable using the JSON library, and then iterates over each item.
Each entry in a property array may also carry qualifiers. Qualifiers are stored under `qualifiers`, at the same level as `mainsnak`. They are simply additional properties, but with a relationship to the parent property.
Any property or qualifier may hold more than one value: its key maps to an array with one entry per value.
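To make this layout concrete, here is a minimal sketch that parses a hand-trimmed item in the same shape as one entry of `export.json`. The values are copied from the Q1300 output on this page, but most fields (labels, statement ids, ranks, datatypes) are left out, so treat the sample as an assumption about the shape rather than a verbatim export. It shows the two access paths: a property's value under `mainsnak`, and a qualifier's value directly on its snak.

```ruby
require 'json'

# Hand-trimmed sample (assumption: only the fields used below are kept).
# Values come from the P19 / QP20 output shown on this page.
SAMPLE = <<~JSON_TEXT
  {
    "id": "Q1300",
    "claims": {
      "P19": [
        {
          "mainsnak": {
            "snaktype": "value",
            "property": "P19",
            "datavalue": { "value": "Astronomy--Early works to 1800", "type": "string" }
          },
          "qualifiers": {
            "P20": [
              { "snaktype": "value", "property": "P20",
                "datavalue": { "value": { "entity-type": "item", "numeric-id": 936, "id": "Q936" }, "type": "wikibase-entityid" } },
              { "snaktype": "value", "property": "P20",
                "datavalue": { "value": { "entity-type": "item", "numeric-id": 871, "id": "Q871" }, "type": "wikibase-entityid" } }
            ]
          }
        },
        {
          "mainsnak": {
            "snaktype": "value",
            "property": "P19",
            "datavalue": { "value": "Astronomy", "type": "string" }
          },
          "qualifiers": {
            "P20": [
              { "snaktype": "value", "property": "P20",
                "datavalue": { "value": { "entity-type": "item", "numeric-id": 936, "id": "Q936" }, "type": "wikibase-entityid" } }
            ]
          }
        }
      ]
    }
  }
JSON_TEXT

item = JSON.parse(SAMPLE)
p19 = item["claims"]["P19"] # an Array: one statement per value of P19

# A property's own value sits under mainsnak.
main_values = p19.map { |statement| statement.dig("mainsnak", "datavalue", "value") }

# Qualifiers sit beside mainsnak; each qualifier snak holds datavalue directly.
first_qualifier_ids = p19.first["qualifiers"]["P20"].map { |snak| snak.dig("datavalue", "value", "id") }

puts p19.length       # prints 2
p main_values         # prints ["Astronomy--Early works to 1800", "Astronomy"]
p first_qualifier_ids # prints ["Q936", "Q871"]
```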
```ruby
## import Ruby functions
# (csv, date, time and optparse are not needed for this example)
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]
  puts @id

  # Each claims key is a property ID; its value is an array with one entry per value
  @claims.each_key do |property|
    @propertyArray = @claims.dig(property)
    # Only the first entry is read here; the property's value lives under mainsnak
    @propertyValue = @propertyArray&.first&.dig "mainsnak", "datavalue", "value"
    @propertyArrayLength = @propertyArray.length
    puts "#{property} contains #{@propertyArrayLength} values"
    puts "1st value: #{@propertyValue}"

    # Qualifiers of the first entry, keyed by qualifier property ID
    @qualifiers = @propertyArray&.first&.dig("qualifiers")
    if @qualifiers
      @qualifiers.each_key do |qualifier|
        @qualifierArray = @qualifiers.dig(qualifier)
        @qualifierArrayLength = @qualifierArray.length
        # Qualifier snaks have no mainsnak: datavalue sits directly on the snak
        @qualifierValue = @qualifierArray&.first&.dig "datavalue", "value"
        # The "Q" prefix marks a qualifier in the output (e.g. "QP13")
        puts "Q#{qualifier} contains #{@qualifierArrayLength} values"
        puts "1st value: #{@qualifierValue}"
      end
    end
  end
  puts "--"
end
```
The output shows, for each property and qualifier, how many values it contains and what its first value is.
```
Q1300
P10 contains 1 values
1st value: Kitāb al-Majisṭī
QP13 contains 1 values
1st value: كتاب المجسطي.
QP11 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}
P12 contains 1 values
1st value: Almagest.
P14 contains 5 values
1st value: Ptolemy, active 2nd century
QP15 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}
QP13 contains 1 values
1st value: بطليموس، active 2nd century
QP17 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}
P16 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 contains 9 values
1st value: Tables (Data)
QP20 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
P19 contains 2 values
1st value: Astronomy--Early works to 1800
QP20 contains 2 values
1st value: {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 contains 1 values
1st value: Arabic
QP22 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 contains 1 values
1st value: 1381
QP25 contains 1 values
1st value: {"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
QP24 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
QP37 contains 1 values
1st value: {"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
QP36 contains 1 values
1st value: {"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 contains 1 values
1st value: Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 contains 1 values
1st value: paper
QP31 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 contains 3 values
1st value: Many edges and corners of leaves mended with paper.
P34 contains 1 values
1st value: {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 contains 1 values
1st value: {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 contains 1 values
1st value: https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
--
```
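The counts above are printed, not stored. A minimal sketch of collecting them into a Hash, so the multi-valued keys can be picked out in code, might look like this. The `value_counts` helper, the `"P19/P20"` key format, and the trimmed `claims` sample are all hypothetical, built from values in the output above. Unlike the program above, it looks at the qualifiers of every statement, not only the first, so `P19/P20` sums to 3 (2 + 1).

```ruby
# Hypothetical helper: tally values per property and per qualifier.
# Qualifier keys use an illustrative "property/qualifier" form.
def value_counts(claims)
  counts = {}
  claims.each do |property, statements|
    counts[property] = statements.length
    statements.each do |statement|
      (statement["qualifiers"] || {}).each do |qualifier, snaks|
        key = "#{property}/#{qualifier}"
        counts[key] = counts.fetch(key, 0) + snaks.length # summed over all statements
      end
    end
  end
  counts
end

# Trimmed stand-in for item["claims"], using P19/P20 and P21/P22 values from the output.
claims = {
  "P19" => [
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy--Early works to 1800" } },
      "qualifiers" => { "P20" => [{ "datavalue" => { "value" => { "id" => "Q936" } } },
                                  { "datavalue" => { "value" => { "id" => "Q871" } } }] } },
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy" } },
      "qualifiers" => { "P20" => [{ "datavalue" => { "value" => { "id" => "Q936" } } }] } }
  ],
  "P21" => [
    { "mainsnak" => { "datavalue" => { "value" => "Arabic" } },
      "qualifiers" => { "P22" => [{ "datavalue" => { "value" => { "id" => "Q977" } } }] } }
  ]
}

counts = value_counts(claims)
multi_valued = counts.select { |_key, n| n > 1 }.keys
p multi_valued # prints ["P19", "P19/P20"]
```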
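So each key may hold a single value or several. We can't simply use `each` instead of `each_key`, because we run into the same problem (you can go ahead and try it): `Hash#each` still yields one key/value pair per property, and that value is the whole array. Therefore, we have to nest the loops: the item loop contains the claims loop, which contains a loop over each property's values, which contains a qualifier loop, which in turn loops over each qualifier's values.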
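The difference between the three kinds of iteration can be checked on a tiny hand-made stand-in for `item["claims"]` (the values are copied from the output on this page; the variable names are illustrative only). Both `each_key` and `each` visit each property once; only an inner loop over the Array reaches every value.

```ruby
# Hand-made stand-in for item["claims"]: P19 holds two statements, P21 one.
claims = {
  "P19" => [
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy--Early works to 1800" } } },
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy" } } }
  ],
  "P21" => [
    { "mainsnak" => { "datavalue" => { "value" => "Arabic" } } }
  ]
}

# Hash#each_key yields once per property...
keys_seen = []
claims.each_key { |property| keys_seen << property }

# ...and Hash#each also yields once per property; the value is still the whole Array.
pairs_seen = []
claims.each { |property, statements| pairs_seen << [property, statements.class] }

# Only an inner loop over each Array reaches every individual value.
values_seen = []
claims.each do |property, statements|
  statements.each do |statement|
    values_seen << [property, statement.dig("mainsnak", "datavalue", "value")]
  end
end

p keys_seen   # prints ["P19", "P21"]
p pairs_seen  # prints [["P19", Array], ["P21", Array]]
p values_seen # prints [["P19", "Astronomy--Early works to 1800"], ["P19", "Astronomy"], ["P21", "Arabic"]]
```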
```ruby
## import Ruby functions
# (csv, date, time and optparse are not needed for this example)
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item| # item loop
  @id = item["id"]
  @keys = item.keys # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]
  puts @id

  @claims.each_key do |property| # claims loop
    @propertyArray = @claims.dig(property)
    puts "#{property} contains:"

    @propertyArray.each do |propertyInstance| # one pass per property value
      @propertyMainsnakDatavalueValue = propertyInstance.dig "mainsnak", "datavalue", "value"
      puts "  #{@propertyMainsnakDatavalueValue}"

      @qualifiers = propertyInstance&.dig("qualifiers")
      if @qualifiers
        @qualifiers.each_key do |qualifier| # qualifier loop
          @qualifierArray = @qualifiers.dig(qualifier)
          puts "    >> Q#{qualifier} contains"

          @qualifierArray.each do |qualifierInstance| # one pass per qualifier value
            @qualifierDatavalueValue = qualifierInstance&.dig "datavalue", "value"
            puts "      #{@qualifierDatavalueValue}"
          end
        end
      end
    end
  end
  puts "--"
end
```

Output, from P18 onward:

```
P18 contains:
  Tables (Data)
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
  Tables (Data)
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
  Annotations (Provenance)
  Manuscripts, Arabic
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>771, "id"=>"Q771"}
  Manuscripts, 14th century
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>819, "id"=>"Q819"}
      {"entity-type"=>"item", "numeric-id"=>830, "id"=>"Q830"}
  Treatises
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>883, "id"=>"Q883"}
  Codices (bound manuscripts)
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>813, "id"=>"Q813"}
  Diagrams
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>884, "id"=>"Q884"}
  Manuscripts, Medieval
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>888, "id"=>"Q888"}
P19 contains:
  Astronomy--Early works to 1800
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
      {"entity-type"=>"item", "numeric-id"=>871, "id"=>"Q871"}
  Astronomy
    >> QP20 contains
      {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 contains:
  Arabic
    >> QP22 contains
      {"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 contains:
  1381
    >> QP25 contains
      {"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
    >> QP24 contains
      {"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
    >> QP37 contains
      {"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
    >> QP36 contains
      {"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 contains:
  Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 contains:
  {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 contains:
  paper
    >> QP31 contains
      {"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 contains:
  Many edges and corners of leaves mended with paper.
  Title from title page (f. 1r).
  Ms. codex.
P34 contains:
  {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 contains:
  {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 contains:
  https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
--
```
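Printing is only a first step; eventually every value has to be kept. A minimal sketch of the same nested loops collecting values into an Array per key, instead of printing them, could look like this. The `collect_values` helper and the `"P19_P20"` key format are hypothetical illustrations, not the actual Wikibase to Solr script, and the `claims` sample is trimmed by hand from the output above.

```ruby
# Hypothetical helper: gather every property and qualifier value into an Array per key.
# The "property_qualifier" key format is illustrative only.
def collect_values(claims)
  fields = {}
  claims.each do |property, statements|
    statements.each do |statement|
      # Snaks of type "novalue" or "somevalue" have no datavalue, so dig returns nil.
      value = statement.dig("mainsnak", "datavalue", "value")
      (fields[property] ||= []) << value unless value.nil?

      (statement["qualifiers"] || {}).each do |qualifier, snaks|
        snaks.each do |snak|
          qualifier_value = snak.dig("datavalue", "value")
          (fields["#{property}_#{qualifier}"] ||= []) << qualifier_value unless qualifier_value.nil?
        end
      end
    end
  end
  fields
end

# Trimmed stand-in for item["claims"], using P19/P20 and P32 values from the output.
claims = {
  "P19" => [
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy--Early works to 1800" } },
      "qualifiers" => { "P20" => [{ "datavalue" => { "value" => { "id" => "Q936" } } },
                                  { "datavalue" => { "value" => { "id" => "Q871" } } }] } },
    { "mainsnak" => { "datavalue" => { "value" => "Astronomy" } },
      "qualifiers" => { "P20" => [{ "datavalue" => { "value" => { "id" => "Q936" } } }] } }
  ],
  "P32" => [
    { "mainsnak" => { "datavalue" => { "value" => "Many edges and corners of leaves mended with paper." } } },
    { "mainsnak" => { "datavalue" => { "value" => "Title from title page (f. 1r)." } } },
    { "mainsnak" => { "datavalue" => { "value" => "Ms. codex." } } }
  ]
}

fields = collect_values(claims)
p fields["P19"]                         # prints ["Astronomy--Early works to 1800", "Astronomy"]
p fields["P19_P20"].map { |v| v["id"] } # prints ["Q936", "Q871", "Q936"]
p fields["P32"].length                  # prints 3
```

Keeping every value in an Array per key, rather than only the first, matches how a multi-valued field is represented in a Solr JSON update document.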