Wikibase to Solr #9: qualifier values
To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.
Here is a simple program which loads the export.json file into a Ruby variable using the JSON library, and then iterates over each item.
Inside each property array, we may also find qualifiers. Qualifiers are stored under the qualifiers key, at the same level as mainsnak. Qualifiers are simply additional properties, but with a relationship to the parent property.
Note: to avoid nil errors, I wrap the qualifier loop in an if @qualifiers check.
## import Ruby functions
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]
  puts @id
  @claims.each_key do |property|
    @propertyArray = @claims.dig(property)&.first
    @propertyValue = @propertyArray.dig "mainsnak", "datavalue", "value"
    puts "#{property} #{@propertyValue}"
    @qualifiers = @propertyArray["qualifiers"]
    if @qualifiers
      @qualifiers.each_key do |qualifier|
        puts "#{qualifier}"
        @qualifierValue = @qualifiers.dig(qualifier)
        p @qualifierValue
      end
    end
  end
  puts "--"
end
The resulting output reveals every property, its value, the property's qualifiers (which are themselves properties), and those qualifiers' values.
Q1300
P10 Kitāb al-Majisṭī
P13
[{"snaktype"=>"value", "property"=>"P13", "hash"=>"6aaaba1010b58d17a08c359fcd378b5847a3b3ad", "datavalue"=>{"value"=>"كتاب المجسطي.", "type"=>"string"}, "datatype"=>"string"}]
P11
[{"snaktype"=>"value", "property"=>"P11", "hash"=>"138178f68c73783beffd21e5e3cd699e763d74a0", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P12 Almagest.
P14 Ptolemy, active 2nd century
P15
[{"snaktype"=>"value", "property"=>"P15", "hash"=>"733791245a0cbbf63f406704649c533b5141f574", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P13
[{"snaktype"=>"value", "property"=>"P13", "hash"=>"5479ba95a7bc7b1dbf37ab33cb37d2d4555458e6", "datavalue"=>{"value"=>"بطليموس، active 2nd century", "type"=>"string"}, "datatype"=>"string"}]
P17
[{"snaktype"=>"value", "property"=>"P17", "hash"=>"306e4a01682fb3824816d0feec2f4054d6612d28", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P16 {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 Tables (Data)
P20
[{"snaktype"=>"value", "property"=>"P20", "hash"=>"4a5b480b7cc0ec4adc458b3ac2e17c780bc328ab", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P19 Astronomy--Early works to 1800
P20
[{"snaktype"=>"value", "property"=>"P20", "hash"=>"17bdacf5c4a6a35b34de9f49c7c2391f5feabffd", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}, {"snaktype"=>"value", "property"=>"P20", "hash"=>"f23c31b91ef91affe0ac7e35c5dd7609bb3b6293", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>871, "id"=>"Q871"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P21 Arabic
P22
[{"snaktype"=>"value", "property"=>"P22", "hash"=>"181666f3d77963a519c980883fbd2eea2af5acac", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P23 1381
P25
[{"snaktype"=>"value", "property"=>"P25", "hash"=>"d54951c1291e69356cc97ad03e0bcde559341150", "datavalue"=>{"value"=>{"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P24
[{"snaktype"=>"value", "property"=>"P24", "hash"=>"fef71182886ed5107c20c98163e0aec0fc7f9163", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P37
[{"snaktype"=>"value", "property"=>"P37", "hash"=>"fedc35d4d8af136ae0d47f136b3a2f69c2c3c61e", "datavalue"=>{"value"=>{"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P36
[{"snaktype"=>"value", "property"=>"P36", "hash"=>"12359cf901f5463772d64fc1ab592df968701c79", "datavalue"=>{"value"=>{"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P29 Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 paper
P31
[{"snaktype"=>"value", "property"=>"P31", "hash"=>"52d2d45e857d99db393dfd2f503c05de5b26a40c", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P32 Many edges and corners of leaves mended with paper.
P34 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
Qualifier data is stored under datavalue and then value, so let's add that to our loop. Because each qualifier is wrapped in [ ] brackets, we also have to use first.
Code: p @qualifierValue&.first&.dig "datavalue", "value"
Output:
Q1300
P10 Kitāb al-Majisṭī
P13
"كتاب المجسطي."
P11
{"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}
P12 Almagest.
P14 Ptolemy, active 2nd century
P15
{"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}
P13
"بطليموس، active 2nd century"
P17
{"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}
P16 {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 Tables (Data)
P20
{"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
P19 Astronomy--Early works to 1800
P20
{"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 Arabic
P22
{"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 1381
P25
{"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P24
{"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
P37
{"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P36
{"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 paper
P31
{"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 Many edges and corners of leaves mended with paper.
P34 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
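To see why first is needed before dig, here is a minimal sketch that runs the same extraction against a single qualifier entry copied from the P13 output above (the hash key is omitted for brevity):

```ruby
require 'json'

# One qualifier entry as it appears in export.json: an array wrapping a snak hash.
qualifier_value = JSON.parse(<<~JSON)
  [{"snaktype": "value", "property": "P13",
    "datavalue": {"value": "كتاب المجسطي.", "type": "string"},
    "datatype": "string"}]
JSON

# The [ ] wrapper means we take .first before digging into datavalue/value.
p qualifier_value&.first&.dig("datavalue", "value")
# => "كتاب المجسطي."
```

The safe-navigation operator &. keeps the chain from raising a NoMethodError when a qualifier is missing, mirroring the if @qualifiers guard in the main loop.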
However, we also know that sometimes there is more than one occurrence of a property. At the time of writing, this only happens in qualifiers of properties, not in top-level item properties. But even if 99% of the data is 1:1, we have to catch the cases where there are multiple values.
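One way to catch those cases is to map over every snak in the qualifier array instead of taking only the first. This is a sketch, not the final script, using the two P20 snaks from the output above (hashes omitted for brevity):

```ruby
require 'json'

# Two P20 qualifier snaks attached to the same property, as in the P19 claim above.
qualifier_value = JSON.parse(<<~JSON)
  [{"snaktype": "value", "property": "P20",
    "datavalue": {"value": {"entity-type": "item", "numeric-id": 936, "id": "Q936"},
                  "type": "wikibase-entityid"}, "datatype": "wikibase-item"},
   {"snaktype": "value", "property": "P20",
    "datavalue": {"value": {"entity-type": "item", "numeric-id": 871, "id": "Q871"},
                  "type": "wikibase-entityid"}, "datatype": "wikibase-item"}]
JSON

# Using &.first here would silently drop Q871; map visits every snak.
values = qualifier_value.map { |snak| snak.dig("datavalue", "value") }
ids = values.map { |v| v["id"] }
p ids  # => ["Q936", "Q871"]
```

Because map returns an array even for a single snak, the same code handles the 1:1 majority and the multi-value minority without a special case.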