Wikibase to Solr #9: qualifier values

Human Experience Systems LLC edited this page Apr 27, 2023 · 3 revisions

To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.

Here is a simple program which loads the export.json file into a Ruby variable using the JSON library, and then iterates over each item.

Each entry in a property array may also carry qualifiers. They are stored under the qualifiers key, at the same level as mainsnak. Qualifiers are simply additional properties that are attached to the parent property.

Note: to avoid nil errors when a property has no qualifiers, I wrap the qualifier loop in an if @qualifiers check.
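Before the full program, here is a minimal sketch of that layout and of the nil guard. The two hashes are hypothetical, hand-trimmed statements loosely modelled on the P14 and P12 statements of item Q1300, and qualifier_keys is an illustrative helper name, not part of the actual script.

```ruby
# Hypothetical statement with one qualifier (P15); most fields are omitted.
CLAIM_WITH_QUALIFIER = {
  "mainsnak"   => { "datavalue" => { "value" => "Ptolemy, active 2nd century", "type" => "string" } },
  "qualifiers" => {
    "P15" => [
      { "datavalue" => { "value" => { "id" => "Q35" }, "type" => "wikibase-entityid" } }
    ]
  }
}

# Hypothetical statement that has no "qualifiers" key at all.
CLAIM_WITHOUT_QUALIFIER = {
  "mainsnak" => { "datavalue" => { "value" => "Almagest.", "type" => "string" } }
}

# Return the qualifier property IDs of a statement, or [] when it has none.
def qualifier_keys(claim)
  qualifiers = claim["qualifiers"]
  return [] unless qualifiers   # same idea as the `if @qualifiers` guard

  qualifiers.keys
end

p qualifier_keys(CLAIM_WITH_QUALIFIER)     # prints ["P15"]
p qualifier_keys(CLAIM_WITHOUT_QUALIFIER)  # prints []
```

Without the guard, calling keys (or each_key) on the second statement would raise a NoMethodError, because claim["qualifiers"] is nil there.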

## Import Ruby libraries
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys          # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]

  puts @id
  @claims.each_key do |property|
    # Each property maps to an array of statements; look at the first one.
    @propertyArray = @claims.dig(property)&.first
    @propertyValue = @propertyArray&.dig("mainsnak", "datavalue", "value")
    puts "#{property} #{@propertyValue}"

    # Qualifiers sit beside mainsnak and are nil when a statement has none.
    @qualifiers = @propertyArray&.dig("qualifiers")
    if @qualifiers
      @qualifiers.each_key do |qualifier|
        puts qualifier
        @qualifierValue = @qualifiers.dig(qualifier)
        p @qualifierValue
      end
    end
  end
  puts "--"

end

The resulting output reveals every property, the property's value, the property's qualifiers (which are themselves properties), and those qualifiers' values.

Q1300
P10 Kitāb al-Majisṭī
P13
[{"snaktype"=>"value", "property"=>"P13", "hash"=>"6aaaba1010b58d17a08c359fcd378b5847a3b3ad", "datavalue"=>{"value"=>"كتاب المجسطي.", "type"=>"string"}, "datatype"=>"string"}]
P11
[{"snaktype"=>"value", "property"=>"P11", "hash"=>"138178f68c73783beffd21e5e3cd699e763d74a0", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P12 Almagest.
P14 Ptolemy, active 2nd century
P15
[{"snaktype"=>"value", "property"=>"P15", "hash"=>"733791245a0cbbf63f406704649c533b5141f574", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P13
[{"snaktype"=>"value", "property"=>"P13", "hash"=>"5479ba95a7bc7b1dbf37ab33cb37d2d4555458e6", "datavalue"=>{"value"=>"بطليموس، active 2nd century", "type"=>"string"}, "datatype"=>"string"}]
P17
[{"snaktype"=>"value", "property"=>"P17", "hash"=>"306e4a01682fb3824816d0feec2f4054d6612d28", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P16 {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 Tables (Data)
P20
[{"snaktype"=>"value", "property"=>"P20", "hash"=>"4a5b480b7cc0ec4adc458b3ac2e17c780bc328ab", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P19 Astronomy--Early works to 1800
P20
[{"snaktype"=>"value", "property"=>"P20", "hash"=>"17bdacf5c4a6a35b34de9f49c7c2391f5feabffd", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}, {"snaktype"=>"value", "property"=>"P20", "hash"=>"f23c31b91ef91affe0ac7e35c5dd7609bb3b6293", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>871, "id"=>"Q871"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P21 Arabic
P22
[{"snaktype"=>"value", "property"=>"P22", "hash"=>"181666f3d77963a519c980883fbd2eea2af5acac", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P23 1381
P25
[{"snaktype"=>"value", "property"=>"P25", "hash"=>"d54951c1291e69356cc97ad03e0bcde559341150", "datavalue"=>{"value"=>{"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P24
[{"snaktype"=>"value", "property"=>"P24", "hash"=>"fef71182886ed5107c20c98163e0aec0fc7f9163", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P37
[{"snaktype"=>"value", "property"=>"P37", "hash"=>"fedc35d4d8af136ae0d47f136b3a2f69c2c3c61e", "datavalue"=>{"value"=>{"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P36
[{"snaktype"=>"value", "property"=>"P36", "hash"=>"12359cf901f5463772d64fc1ab592df968701c79", "datavalue"=>{"value"=>{"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}, "type"=>"time"}, "datatype"=>"time"}]
P29 Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 paper
P31
[{"snaktype"=>"value", "property"=>"P31", "hash"=>"52d2d45e857d99db393dfd2f503c05de5b26a40c", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}]
P32 Many edges and corners of leaves mended with paper.
P34 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest

Each qualifier's data sits under datavalue and then value, so let's dig down to that in our loop. Because each qualifier is wrapped in an array (the [ ] brackets), we also have to call first before digging.

Code: p @qualifierValue&.first&.dig("datavalue", "value")

Output:

Q1300
P10 Kitāb al-Majisṭī
P13
"كتاب المجسطي."
P11
{"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}
P12 Almagest.
P14 Ptolemy, active 2nd century
P15
{"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}
P13
"بطليموس، active 2nd century"
P17
{"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}
P16 {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 Tables (Data)
P20
{"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
P19 Astronomy--Early works to 1800
P20
{"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 Arabic
P22
{"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 1381
P25
{"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P24
{"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
P37
{"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P36
{"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 paper
P31
{"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 Many edges and corners of leaves mended with paper.
P34 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest

However, we also know that a property sometimes occurs more than once. At the time of writing, this only happens in qualifiers of properties, not in item properties. In the first output above, for example, the P20 qualifier under P19 holds two values (Q936 and Q871), while first returns only Q936. Even if 99% of the data is 1:1, we have to catch the cases where there are multiple values.
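One possible way to catch multiple values is to map over every entry in the qualifier array instead of calling first. The sketch below is only an illustration under assumptions: QUALIFIERS is a hypothetical hash trimmed from the P20 and P22 qualifiers in the first output, and qualifier_values is an illustrative helper name, not a function from the actual script.

```ruby
# Hypothetical qualifier hash; most fields (hash, snaktype, datatype) are omitted.
QUALIFIERS = {
  "P20" => [
    { "property" => "P20", "datavalue" => { "value" => { "id" => "Q936" }, "type" => "wikibase-entityid" } },
    { "property" => "P20", "datavalue" => { "value" => { "id" => "Q871" }, "type" => "wikibase-entityid" } }
  ],
  "P22" => [
    { "property" => "P22", "datavalue" => { "value" => { "id" => "Q977" }, "type" => "wikibase-entityid" } }
  ]
}

# Return every value stored under one qualifier, not just the first.
# Array(nil) is [], so a missing qualifier yields an empty list.
# compact drops entries that carry no datavalue.
def qualifier_values(qualifiers, qualifier)
  Array(qualifiers[qualifier]).map { |snak| snak.dig("datavalue", "value") }.compact
end

QUALIFIERS.each_key do |qualifier|
  ids = qualifier_values(QUALIFIERS, qualifier).map { |value| value["id"] }
  puts "#{qualifier} #{ids.join(', ')}"
end
# prints:
# P20 Q936, Q871
# P22 Q977
```

Returning an array in every case means the 1:1 qualifiers and the multi-valued ones can be handled by the same code path.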