Wikibase to Solr #10: multiple values

Human Experience Systems LLC edited this page Apr 27, 2023 · 6 revisions

To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.

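The following simple program loads the export.json file into a Ruby variable using the JSON library, then iterates over each item and prints, for every property and qualifier, the number of values it holds and the first of those values.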

Each entry in a property array may also carry qualifiers. Qualifiers are stored under the qualifiers key, at the same level as mainsnak. They are simply additional properties, but with a relationship to the parent property.

In any given property or qualifier, there may be multiple values present.
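To make the nesting concrete, here is a hand-built and heavily trimmed Ruby hash shaped like the first entry of P10, using values that appear in the program output on this page. Real export entries carry additional fields that are omitted here, and the variable names statement, claims, main_value and qualifier_value are only illustrative.

```ruby
# A trimmed, hand-built stand-in for one entry of item["claims"]["P10"].
# Real Wikibase exports carry more fields per snak; they are omitted here.
statement = {
  "mainsnak" => {
    "datavalue" => { "value" => "Kitāb al-Majisṭī" }
  },
  # Qualifiers sit next to "mainsnak"; each qualifier ID maps to an array.
  "qualifiers" => {
    "P13" => [{ "datavalue" => { "value" => "كتاب المجسطي." } }],
    "P11" => [{ "datavalue" => { "value" => { "entity-type" => "item", "numeric-id" => 1007, "id" => "Q1007" } } }]
  }
}

# Each property ID maps to an array of entries, even when there is only one.
claims = { "P10" => [statement] }

# dig accepts hash keys and array indexes in one call.
main_value = claims.dig("P10", 0, "mainsnak", "datavalue", "value")
qualifier_value = claims.dig("P10", 0, "qualifiers", "P13", 0, "datavalue", "value")

puts main_value
puts qualifier_value
puts claims["P10"].length
```

Note that the index 0 in both dig calls selects only the first entry of an array, which is exactly the limitation the rest of this page deals with.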

## import Ruby functions
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys          # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]

  puts @id
  @claims.each_key do |property|

    @propertyArray = @claims.dig(property)
    @propertyValue = @propertyArray&.first&.dig "mainsnak", "datavalue", "value"
    @propertyArrayLength = @propertyArray.length

    puts "#{property} contains #{@propertyArrayLength} values"
    puts "1st value: #{@propertyValue}"

    @qualifiers = @propertyArray&.first&.dig("qualifiers")
    if @qualifiers
        @qualifiers.each_key do |qualifier|
            @qualifierArray = @qualifiers.dig(qualifier)
            @qualifierArrayLength = @qualifierArray.length
            @qualifierValue = @qualifierArray&.first&.dig "datavalue", "value"

            puts "Q#{qualifier} contains #{@qualifierArrayLength} values"
            puts "1st value: #{@qualifierValue}"
        end
    end
   
  end 
  puts "--"

end

The output shows how many values each property and qualifier holds, but only the first value of each is printed.

Q1300
P10 contains 1 values
1st value: Kitāb al-Majisṭī
QP13 contains 1 values
1st value: كتاب المجسطي.
QP11 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>1007, "id"=>"Q1007"}
P12 contains 1 values
1st value: Almagest.
P14 contains 5 values
1st value: Ptolemy, active 2nd century
QP15 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>35, "id"=>"Q35"}
QP13 contains 1 values
1st value: بطليموس، active 2nd century
QP17 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>315, "id"=>"Q315"}
P16 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}
P18 contains 9 values
1st value: Tables (Data)
QP20 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
P19 contains 2 values
1st value: Astronomy--Early works to 1800
QP20 contains 2 values
1st value: {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 contains 1 values
1st value: Arabic
QP22 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 contains 1 values
1st value: 1381
QP25 contains 1 values
1st value: {"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
QP24 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
QP37 contains 1 values
1st value: {"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
QP36 contains 1 values
1st value: {"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 contains 1 values
1st value: Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 contains 1 values
1st value: paper
QP31 contains 1 values
1st value: {"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 contains 3 values
1st value: Many edges and corners of leaves mended with paper.
P34 contains 1 values
1st value: {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 contains 1 values
1st value: {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 contains 1 values
1st value: https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
--

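So each key may hold a single value or several, and the script above only ever prints the first one. Using each instead of each_key does not help: it yields the same keys together with their arrays, and those arrays still have to be iterated (you can go ahead and try it). Therefore, we have to loop over the instances in each array as well: a qualifier instance loop inside the qualifier loop, inside the property instance loop, which is inside the claims loop, which is inside the item loop.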

## import Ruby functions
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys          # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]

  puts @id
  @claims.each_key do |property|

    @propertyArray = @claims.dig(property)
    puts "#{property} contains:"

    @propertyArray.each do |propertyInstance|

        @propertyMainsnakDatavalueValue = propertyInstance.dig "mainsnak", "datavalue", "value"
        puts "  #{@propertyMainsnakDatavalueValue}"
    
        @qualifiers = propertyInstance&.dig("qualifiers")
        if @qualifiers

            @qualifiers.each_key do |qualifier|

                @qualifierArray = @qualifiers.dig(qualifier)
                puts "  >> Q#{qualifier} contains"

                @qualifierArray.each do |qualifierInstance|

                    @qualifierDatavalueValue = qualifierInstance&.dig "datavalue", "value"
                    puts "    #{@qualifierDatavalueValue}"

                end
            end
        end
    end
   
  end 
  puts "--"

end

Output:

P18 contains:
  Tables (Data)
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
  Tables (Data)
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>932, "id"=>"Q932"}
  Annotations (Provenance)
  Manuscripts, Arabic
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>771, "id"=>"Q771"}
  Manuscripts, 14th century
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>819, "id"=>"Q819"}
    {"entity-type"=>"item", "numeric-id"=>830, "id"=>"Q830"}
  Treatises
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>883, "id"=>"Q883"}
  Codices (bound manuscripts)
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>813, "id"=>"Q813"}
  Diagrams
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>884, "id"=>"Q884"}
  Manuscripts, Medieval
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>888, "id"=>"Q888"}
P19 contains:
  Astronomy--Early works to 1800
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
    {"entity-type"=>"item", "numeric-id"=>871, "id"=>"Q871"}
  Astronomy
  >> QP20 contains
    {"entity-type"=>"item", "numeric-id"=>936, "id"=>"Q936"}
P21 contains:
  Arabic
  >> QP22 contains
    {"entity-type"=>"item", "numeric-id"=>977, "id"=>"Q977"}
P23 contains:
  1381
  >> QP25 contains
    {"time"=>"+1301-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>7, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
  >> QP24 contains
    {"entity-type"=>"item", "numeric-id"=>97, "id"=>"Q97"}
  >> QP37 contains
    {"time"=>"+1381-01-01T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
  >> QP36 contains
    {"time"=>"+1381-12-31T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>9, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P29 contains:
  Extent: i, 174, i leaves : paper ; 280 x 215 (220 x 145) mm bound to 280 x 225 mm.
P3 contains:
  {"entity-type"=>"item", "numeric-id"=>1299, "id"=>"Q1299"}
P30 contains:
  paper
  >> QP31 contains
    {"entity-type"=>"item", "numeric-id"=>27, "id"=>"Q27"}
P32 contains:
  Many edges and corners of leaves mended with paper.
  Title from title page (f. 1r).
  Ms. codex.
P34 contains:
  {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P35 contains:
  {"time"=>"+2023-03-17T00:00:00Z", "timezone"=>0, "before"=>0, "after"=>0, "precision"=>11, "calendarmodel"=>"http://www.wikidata.org/entity/Q1985727"}
P41 contains:
  https://colenda.library.upenn.edu/phalt/iiif/2/81431-p3ff3m111/manifest
--
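
The same traversal can also collect every value into a data structure rather than printing it, which may be closer to what an indexing step needs. The following is only a sketch under stated assumptions: the helper names statement_values and qualifier_values are hypothetical and not part of the Wikibase to Solr script, and the sample claims hash is hand-built from the P19 output above, with the entity values trimmed down to their id.

```ruby
# Hypothetical helpers, not taken from the original script.

# Return the main value of every entry stored under one property.
def statement_values(claims, property)
  claims.fetch(property, []).map do |statement|
    statement.dig("mainsnak", "datavalue", "value")
  end
end

# Return a hash of qualifier ID => array of all its values for one entry.
def qualifier_values(statement)
  (statement["qualifiers"] || {}).transform_values do |snaks|
    snaks.map { |snak| snak.dig("datavalue", "value") }
  end
end

# Hand-built sample shaped like item["claims"], trimmed for brevity.
claims = {
  "P19" => [
    {
      "mainsnak" => { "datavalue" => { "value" => "Astronomy--Early works to 1800" } },
      "qualifiers" => {
        "P20" => [
          { "datavalue" => { "value" => { "id" => "Q936" } } },
          { "datavalue" => { "value" => { "id" => "Q871" } } }
        ]
      }
    },
    {
      "mainsnak" => { "datavalue" => { "value" => "Astronomy" } },
      "qualifiers" => {
        "P20" => [{ "datavalue" => { "value" => { "id" => "Q936" } } }]
      }
    }
  ]
}

p statement_values(claims, "P19")
p qualifier_values(claims["P19"].first)
```

Because both helpers always return arrays, the calling code does not need to distinguish between a property with one value and a property with several.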