
Wikibase to Solr #5: claims

Human Experience Systems LLC edited this page Apr 27, 2023 · 7 revisions

To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.

Here is a simple program that loads the export.json file into a Ruby variable using the JSON library and then iterates over each item.

Here we break down example #4 further, exploring claims and why we need to "dig" into them in a certain way.

```ruby
## load Ruby libraries (only json is used in this example)
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys          # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]   # Hash: property ID => array of statements
  @claims_keys = @claims.keys
  puts "#{@id} #{@claims_keys}"
end
```

The output will look like this:

```text
Q1 []
Q2 []
Q3 []
Q4 []
Q5 []
Q6 []
Q7 []
Q8 ["P47"]
Q9 ["P47"]
...
Q1298 ["P16", "P38", "P5", "P6", "P7", "P8", "P9"]
Q1299 ["P1", "P16", "P2"]
Q1300 ["P10", "P12", "P14", "P16", "P18", "P19", "P21", "P23", "P29", "P3", "P30", "P32", "P34", "P35", "P41"]
```

As you can see, each item's claims hash contains an arbitrary number of property keys, and some items have none at all.
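To see this variability across a whole export, a small tally helps. The sketch below is a minimal example, not part of the original script: it uses a hypothetical three-item sample (modelled on Q1, Q8 and Q1299 above, with the statement arrays left empty for brevity) in place of export.json, and a hypothetical helper `property_counts` that counts how many items carry each property. An item without claims is treated as having an empty hash.

```ruby
require 'json'

# Hypothetical stand-in for export.json: each item's "claims" is a Hash
# keyed by property ID (statement arrays left empty for brevity).
SAMPLE_JSON = <<~SAMPLE
  [
    {"id": "Q1", "claims": {}},
    {"id": "Q8", "claims": {"P47": []}},
    {"id": "Q1299", "claims": {"P1": [], "P16": [], "P2": []}}
  ]
SAMPLE

# Count how many items use each property; missing claims count as none.
def property_counts(items)
  counts = Hash.new(0)
  items.each do |item|
    claims = item["claims"] || {}
    claims.each_key { |property| counts[property] += 1 }
  end
  counts
end

items = JSON.parse(SAMPLE_JSON)
items.each { |item| puts "#{item["id"]} has #{(item["claims"] || {}).size} properties" }
counts = property_counts(items)
p counts
```

Running the same helper over the real `data` array would show which properties are common and which appear on only a handful of items.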

To explore a particular property, in this case P16, add these lines inside the loop of the script above:

```ruby
@P16 = @claims["P16"]
p @P16
```

The result will be:

```text
Q1300 ["P10", "P12", "P14", "P16", "P18", "P19", "P21", "P23", "P29", "P3", "P30", "P32", "P34", "P35", "P41"]
[{"mainsnak"=>{"snaktype"=>"value", "property"=>"P16", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}, "type"=>"statement", "id"=>"Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4", "rank"=>"normal"}]
```
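Reading this output from the outside in: the value of P16 is an array of statements, each statement is a hash, and the ID of the item the claim points to sits several hashes deeper. The sketch below re-creates the printed P16 value as JSON (so it runs without export.json) and unwraps it one layer at a time; the variable names are illustrative only.

```ruby
require 'json'

# The P16 value printed above, re-expressed as JSON so this sketch is self-contained.
p16_json = <<~SAMPLE
  [{"mainsnak": {"snaktype": "value", "property": "P16",
                 "datavalue": {"value": {"entity-type": "item", "numeric-id": 3, "id": "Q3"},
                               "type": "wikibase-entityid"},
                 "datatype": "wikibase-item"},
    "type": "statement",
    "id": "Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4",
    "rank": "normal"}]
SAMPLE

statements = JSON.parse(p16_json)   # Array: one entry per statement
statement  = statements[0]          # Hash: mainsnak, type, id, rank
mainsnak   = statement["mainsnak"]  # Hash: snaktype, property, datavalue, datatype
datavalue  = mainsnak["datavalue"]  # Hash: value, type
value      = datavalue["value"]     # Hash: entity-type, numeric-id, id
target_id  = value["id"]            # String: the item this claim points to

# Print the Ruby class found at each layer, then the final ID.
[statements, statement, mainsnak, datavalue, value, target_id].each { |layer| puts layer.class }
puts target_id
```

Only the outermost layer is an Array; everything inside a statement is a Hash until the final string.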

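Because the value of item-claims-P16 is wrapped in [ ] brackets, it is an array of statements rather than a hash, so we can't call keys on it directly. In fact the script fails even earlier, as the error below shows: the first items in the export (Q1 through Q7) have no claims at all, so @claims["P16"] returns nil, and nil has no keys method either.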

```text
wikibase4.rb:21:in `block in <main>': undefined method `keys' for nil:NilClass (NoMethodError)
p @P16.keys
      ^^^^^
from wikibase4.rb:14:in `each'
from wikibase4.rb:14:in `<main>'
```
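One way to avoid both failures is to treat a missing property as an empty list and to call keys on a statement rather than on the array that holds it. This is a minimal sketch with a hypothetical helper, `first_statement_keys`, and an abbreviated two-item sample standing in for export.json; it is not the approach taken by the original script.

```ruby
require 'json'

# Hypothetical, abbreviated sample: Q1 has no claims, Q1300 has one P16 statement.
items = JSON.parse(<<~SAMPLE)
  [
    {"id": "Q1", "claims": {}},
    {"id": "Q1300", "claims": {"P16": [
      {"mainsnak": {"snaktype": "value", "property": "P16"},
       "type": "statement", "id": "Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4", "rank": "normal"}
    ]}}
  ]
SAMPLE

# Keys of the first statement for a property, or [] if the item has none.
def first_statement_keys(claims, property)
  statements = claims.fetch(property, [])  # [] instead of nil when the property is absent
  first = statements.first                 # a Hash, or nil if the array is empty
  first ? first.keys : []
end

items.each do |item|
  puts "#{item["id"]} #{first_statement_keys(item["claims"], "P16")}"
end
```

Using `fetch` with a default keeps the nil check in one place instead of scattering it through the loop.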

Unwrapping the array with first doesn't help either: items without a P16 claim still return nil, and nil has no first method.

```text
wikibase4.rb:21:in `block in <main>': undefined method `first' for nil:NilClass (NoMethodError)
p @P16.first
      ^^^^^^
from wikibase4.rb:14:in `each'
from wikibase4.rb:14:in `<main>'
```
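Ruby's built-in Hash#dig and Array#dig can walk several levels at once and return nil, rather than raising, as soon as any level is missing. The sketch below shows one possible way to "dig" to the first P16 value; whether the later pages use exactly this path is an assumption. `first_item_value` is a hypothetical helper, and the two-item sample is abbreviated.

```ruby
require 'json'

# Hypothetical, abbreviated sample: Q7 has no claims, Q1300 has the P16 statement shown earlier.
items = JSON.parse(<<~SAMPLE)
  [
    {"id": "Q7", "claims": {}},
    {"id": "Q1300", "claims": {"P16": [
      {"mainsnak": {"snaktype": "value", "property": "P16",
                    "datavalue": {"value": {"entity-type": "item", "numeric-id": 3, "id": "Q3"},
                                  "type": "wikibase-entityid"},
                    "datatype": "wikibase-item"},
       "type": "statement", "rank": "normal"}
    ]}}
  ]
SAMPLE

# claims -> property -> first statement (index 0) -> mainsnak -> datavalue -> value -> id.
# dig stops and returns nil at the first missing step, so no NoMethodError on nil.
def first_item_value(item, property)
  item.dig("claims", property, 0, "mainsnak", "datavalue", "value", "id")
end

items.each do |item|
  puts "#{item["id"]} P16=#{first_item_value(item, "P16").inspect}"
end
```

Note that the integer `0` in the path is what steps into the statement array; the string keys step into hashes.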

To complicate things further, a property can hold more than one statement: its array then contains several statement hashes, one for each value of the same property. An example of this will come later.
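In the meantime, here is a sketch of how multiple values could be collected: map over every statement in the array instead of taking only the first. The item Q9999 and its second P16 value (Q5) are invented for illustration, the statements are abbreviated, and `all_item_values` is a hypothetical helper. Statements whose mainsnak has no datavalue (such as "no value" snaks) are skipped.

```ruby
require 'json'

# Invented item with two P16 statements, for illustration only.
item = JSON.parse(<<~SAMPLE)
  {"id": "Q9999", "claims": {"P16": [
    {"mainsnak": {"snaktype": "value", "property": "P16",
                  "datavalue": {"value": {"id": "Q3"}, "type": "wikibase-entityid"}},
     "type": "statement", "rank": "normal"},
    {"mainsnak": {"snaktype": "value", "property": "P16",
                  "datavalue": {"value": {"id": "Q5"}, "type": "wikibase-entityid"}},
     "type": "statement", "rank": "normal"}
  ]}}
SAMPLE

# Every value of a property, in statement order. A missing property gives [],
# and filter_map drops statements whose dig path comes back nil.
def all_item_values(item, property)
  statements = item.dig("claims", property) || []
  statements.filter_map { |s| s.dig("mainsnak", "datavalue", "value", "id") }
end

p all_item_values(item, "P16")
p all_item_values(item, "P47")
```

Returning an array in every case means a multi-valued Solr field could be filled the same way whether a property has zero, one, or many statements.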