Wikibase to Solr #5: claims
To understand the Wikibase to Solr script, it is useful to first understand the Wikibase JSON structure and how it is interpreted inside Ruby.
Breaking down example #4 further, we explore claims and why we need to "dig" into them in a particular way. Here is a simple program which loads the export.json file into a Ruby variable using the JSON library, then iterates over each item and prints its ID along with the keys of its claims.
```ruby
## import Ruby functions
require 'json'
require 'csv'
require 'date'
require 'time'
require 'optparse'

dir = File.dirname __FILE__
importJSONfile = File.expand_path 'export.json', dir

## Load the import JSON file into a Ruby array
data = JSON.load_file importJSONfile

data.each do |item|
  @id = item["id"]
  @keys = item.keys # ["type", "id", "labels", "descriptions", "aliases", "claims", "sitelinks", "lastrevid"]
  @claims = item["claims"]
  @claims_keys = @claims.keys
  puts "#{@id} #{@claims_keys}"
end
```
The output will look like this:

```
Q1 []
Q2 []
Q3 []
Q4 []
Q5 []
Q6 []
Q7 []
Q8 ["P47"]
Q9 ["P47"]
...
Q1298 ["P16", "P38", "P5", "P6", "P7", "P8", "P9"]
Q1299 ["P1", "P16", "P2"]
Q1300 ["P10", "P12", "P14", "P16", "P18", "P19", "P21", "P23", "P29", "P3", "P30", "P32", "P34", "P35", "P41"]
```
As you can see, each item's claims hash holds an arbitrary number of property keys, and some items have no keys at all.
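A minimal sketch of the shape these keys come from, assuming export.json follows the usual Wikibase entity layout: each item's claims value parses to a Ruby Hash keyed by property ID, and each property maps to an Array of statements. The sample data is hypothetical, rebuilt from the Q1 and Q1300 lines shown on this page rather than read from the real export.

```ruby
require 'json'

# Hypothetical sample: Q1 has an empty claims hash (matching the "Q1 []" line),
# Q1300 carries the P16 statement shown further down this page.
SAMPLE = <<~EOS
  [
    {"id": "Q1", "claims": {}},
    {"id": "Q1300", "claims": {"P16": [
      {"mainsnak": {"snaktype": "value", "property": "P16",
                    "datavalue": {"value": {"entity-type": "item", "numeric-id": 3, "id": "Q3"},
                                  "type": "wikibase-entityid"},
                    "datatype": "wikibase-item"},
       "type": "statement",
       "id": "Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4",
       "rank": "normal"}
    ]}}
  ]
EOS

data = JSON.parse(SAMPLE)

data.each do |item|
  claims = item["claims"]
  # claims is a Hash: property ID => Array of statements
  puts "#{item['id']}: claims is a #{claims.class}"
  claims.each do |property, statements|
    puts "  #{property} => #{statements.class} with #{statements.size} statement(s)"
  end
end
```

Printing the class at each level like this is a quick way to see where a Hash ends and an Array begins before writing the real extraction code.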
To explore a particular property, in this case P16, add these lines inside the loop of the script above, after `@claims` is set:

```ruby
@P16 = @claims["P16"]
p @P16
```
For item Q1300, the result will be:

```
Q1300 ["P10", "P12", "P14", "P16", "P18", "P19", "P21", "P23", "P29", "P3", "P30", "P32", "P34", "P35", "P41"]
[{"mainsnak"=>{"snaktype"=>"value", "property"=>"P16", "datavalue"=>{"value"=>{"entity-type"=>"item", "numeric-id"=>3, "id"=>"Q3"}, "type"=>"wikibase-entityid"}, "datatype"=>"wikibase-item"}, "type"=>"statement", "id"=>"Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4", "rank"=>"normal"}]
```
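One way to follow this nesting is Ruby's built-in `dig` method, which `Hash` and `Array` both provide: it takes a chain of hash keys and array indexes and walks them in one call. A minimal sketch, using a hypothetical one-item sample rebuilt from the Q1300 output above, that goes from claims down to the linked item ID:

```ruby
require 'json'

# Hypothetical one-item sample built from the P16 output above.
item = JSON.parse(<<~EOS)
  {"id": "Q1300", "claims": {"P16": [
    {"mainsnak": {"snaktype": "value", "property": "P16",
                  "datavalue": {"value": {"entity-type": "item", "numeric-id": 3, "id": "Q3"},
                                "type": "wikibase-entityid"},
                  "datatype": "wikibase-item"},
     "type": "statement",
     "id": "Q1300$AF18C2BA-0E76-4882-AE53-90450F328BA4",
     "rank": "normal"}
  ]}}
EOS

# Path: claims -> P16 -> first statement (index 0) -> mainsnak -> datavalue -> value -> id
target = item.dig("claims", "P16", 0, "mainsnak", "datavalue", "value", "id")
puts target # prints Q3

# Once the array is unwrapped with index 0, the statement is a Hash and has keys
p item.dig("claims", "P16", 0).keys # prints ["mainsnak", "type", "id", "rank"]
```

The integer `0` in the path is what steps through the `[ ]` brackets; every other step is a hash key.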
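Because the value of `claims["P16"]` is wrapped in [ ] brackets, it is an array of statements rather than a hash, so we can't call `keys` on it directly. In fact, `p @P16.keys` fails on the very first item, Q1, which has no P16 claim at all, so `@P16` is `nil`: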
```
wikibase4.rb:21:in `block in <main>': undefined method `keys' for nil:NilClass (NoMethodError)
p @P16.keys
      ^^^^^
        from wikibase4.rb:14:in `each'
        from wikibase4.rb:14:in `<main>'
```
Calling `first` to unwrap the array doesn't help either: because some items (again starting with Q1) have no P16 claim, `@P16` is `nil` for them, and `first` raises the same kind of error.
```
wikibase4.rb:21:in `block in <main>': undefined method `first' for nil:NilClass (NoMethodError)
p @P16.first
      ^^^^^
        from wikibase4.rb:14:in `each'
        from wikibase4.rb:14:in `<main>'
```
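A minimal sketch of one way around both errors, assuming the same claims layout: `dig` returns `nil` as soon as any step along the path is missing, instead of raising `NoMethodError`. The two-item sample is hypothetical; Q1300's statement is trimmed to just the fields the path touches.

```ruby
require 'json'

# Hypothetical two-item sample: Q1 has no claims (as in the output above),
# Q1300 carries a trimmed copy of its P16 statement.
items = JSON.parse(<<~EOS)
  [
    {"id": "Q1", "claims": {}},
    {"id": "Q1300", "claims": {"P16": [
      {"mainsnak": {"datavalue": {"value": {"id": "Q3"}}}, "rank": "normal"}
    ]}}
  ]
EOS

results = {}
items.each do |item|
  # dig stops and returns nil at the first missing step, so Q1 no longer raises
  first_p16 = item.dig("claims", "P16", 0, "mainsnak", "datavalue", "value", "id")
  results[item["id"]] = first_p16
  puts "#{item['id']} P16: #{first_p16.inspect}"
end
```

Ruby's safe navigation operator (`@P16&.first`) is an alternative when only one possibly-nil step needs guarding; `dig` is more convenient for the longer paths into a statement.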
To complicate things further, a property's array can hold more than one statement (hash), one for each value of that property. An example of this will come later.
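Until that example, here is a hypothetical sketch of reading every value of a property instead of only the first. The helper name `property_values` and the item Q9999 with values Q3 and Q4 are made up for illustration, and the helper assumes item-valued properties like P16, whose value is a hash containing an `id`.

```ruby
require 'json'

# Hypothetical item with two P16 statements (Q9999 and Q4 are invented for this sketch).
item = JSON.parse(<<~EOS)
  {"id": "Q9999", "claims": {"P16": [
    {"mainsnak": {"datavalue": {"value": {"id": "Q3"}}}},
    {"mainsnak": {"datavalue": {"value": {"id": "Q4"}}}}
  ]}}
EOS

# Hypothetical helper: treat a missing property as an empty list, then read
# every statement in the array, not just index 0.
# Assumes item-valued properties; string-valued ones would need a different last step.
def property_values(item, property)
  statements = item.dig("claims", property) || []
  statements.map { |s| s.dig("mainsnak", "datavalue", "value", "id") }.compact
end

p property_values(item, "P16") # prints ["Q3", "Q4"]
p property_values(item, "P47") # prints []
```

Defaulting a missing property to `[]` means items like Q1 yield an empty list rather than an error. `compact` also drops statements without a `datavalue`; in the Wikibase JSON format, snaks whose `snaktype` is "somevalue" or "novalue" have none.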