A (non-exhaustive) overview of how Wikidata is used by/in/for both the linked open datasets (thesauri) and public domain heritage collections of the KB, national library of the Netherlands.
Latest update: 21 November 2023
This page is a textual summary of the course Verdieping: Wikidata & de KB (Deep dive: Wikidata & the KB) for employees of KB, national library of the Netherlands on 14 November 2023, 15:00-16:15.
- The (rather long) full slidedeck (in Dutch) for this course is available on Wikimedia Commons and Zenodo as PDFs.
- This course builds upon the general overview of the Wikidata universe, an introduction course for newcomers to Wikidata.
- The two videos (in Dutch) of the course are on YouTube: Part 1 (covering blocks 1 & 2) and Part 2 (covering blocks 3 & 4)
- A full text transcript of the course in Dutch is also available.
See also
- An overview of current, upcoming and possible future Wikidata (and Wikibase) activities, projects, ideas, experiments and opportunities for the KB
- Our Structured Data on Commons efforts
- Intro
- BLOCK 1 - What does Wikidata add for the KB?
- BLOCK 2 - Wikidata & KB thesauri (NTA + DBNLa)
- BLOCK 3 - Intermezzo: Linking Wikimedia Commons with Wikidata
- BLOCK 4 - Wikidata & KB heritage collections
- Contact
- Reuse and licensing
See the course Wegwijzer in Wikidata (Introduction to Wikidata), June 6, 2023 (in Dutch)
- Slides on Zenodo and Wikimedia Commons (PDF)
- Textual summary in Dutch (PDF)
- Textual summary in English: https://github.com/KBNLwikimedia/Wikidata-General-Overview/
The course aims to provide more understanding of:
- Why we use Wikidata at KB
- How we use Wikidata for KB thesauri & heritage collections
- What value this adds for KB
- BLOCK 1) What does Wikidata add for the KB?
- BLOCK 2) Wikidata & KB thesauri (NTA + DBNLa)
- BLOCK 3) Intermezzo: Linking Wikimedia Commons with Wikidata
- BLOCK 4) Wikidata & KB heritage collections
(Captain Obvious mode) For KB & its services: Be findable in Google - Be present on Facebook - Be present on Instagram - Be present on YouTube - Be present on Twitter. --> Summary (stating the obvious): Be present on the large (web-scale) platforms
So, equally obvious:
- Add your collection knowledge to Wikipedia
- Add your collection images to Wikimedia Commons
- Add your collection data to Wikidata
Wikidata is one of the largest and most popular LOD platforms in the world.
Characteristics:
- Central part of the (web-scale) Wikimedia infrastructure (Wikipedia, Commons, 700+ Wikimedia platforms)
- Free, public utility for data (no IT costs)
- Centralized, no data silos, one uniform language for querying (SPARQL) and API calls
- Global scope, (much) broader than KB/library/heritage/Netherlands domain
- Connection point for 8330+ external databases worldwide
- Multilingual, language independent, 300+ languages
- Collaborative --> International community, 25K+ content creators
- For humans (GUI) and machines (API, SPARQL, JSON, RDF, Python etc.)
- LOD, the least scary of all LOD platforms --> Understandable & warm, thanks to the community!
- No copyright on data (CC0)
- Strongly growing, positive outlook & sustainable
Net result: advantages of scale, plus community & network effects
What value does Wikidata add for the KB & its services?
- Increased visibility, findability and reusability of our collections
- Greater public reach of KB collections, worldwide
- KB data in a cross-domain, global, multilingual context --> Increasing the interoperability of the KB with the outside world
- Community: External expertise, skills, tools and enthusiasm to enrich & connect KB data
- New functionalities for our data (and images) --> See block 4
- Functionalities that we do not or cannot offer in our own KB services
- Regarding search, data enrichment, data quality control, data visualization and data formats, image metadata, machine interactions
- Both for our thesauri and heritage collections
- For people and machines
- 'KB collections as LEGO'
- Toolkit & platform to create and publish new KB LOD
- Internal KB LOD renewal process is not yet delivering public results
- Developing and sharing knowledge & skills related to LOD
- Both internally and externally
- Strengthening our cooperation with KB network partners via Wikidata/media
KB datasets (thesauri): http://data.bibliotheken.nl/
- Persons (authors) are more popular and in demand on Wikidata than (e.g.) keywords or organizations
- Dutch Thesaurus of Author names (NTA) + Thesaurus of DBNL authors (DBNLa) are more useful than Brinkman or GTT
- NTA is internationally the only major authoritative dataset on 'Dutch authors'
- NTA is very useful for Wikidata, in an international context
- Flat/simple data is more suitable than layered/complex data
- Small datasets are easier than large ones
- Alba amicorum and catchpenny prints are suitable for Wikidata
Ergo: Focus on NTA and DBNL authors with regard to the KB thesauri-Wikidata activities.
Persons in the NTA with a Wikidata URI:
- E.g. Darlene Dixon: http://data.bibliotheken.nl/doc/thes/p208140131 --> schema:sameAs --> http://www.wikidata.org/entity/Q88505402
- All persons via this SPARQL query
```sparql
# Which NTA items have a link to Wikidata?
SELECT * WHERE {
  ?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
  ?nta rdfs:label ?ntaLabel .
  ?nta schema:sameAs ?wikidata .
  # str() is needed because REGEX expects a string, not an IRI
  FILTER(regex(str(?wikidata), 'wikidata', 'i'))
} LIMIT 1000
```
- 499K of 2.75M NTA items have a Wikidata link (source)
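To reuse these links outside a SPARQL editor, the query can be sent over HTTP and the JSON results parsed. The sketch below is a minimal illustration, assuming the KB endpoint lives at http://data.bibliotheken.nl/sparql and returns standard SPARQL 1.1 JSON results; `run_query` and `nta_wikidata_links` are hypothetical helper names. Only the offline parsing step is exercised here, using the Darlene Dixon link from above as a sample row.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: public SPARQL endpoint of data.bibliotheken.nl (verify before use).
KB_SPARQL_ENDPOINT = "http://data.bibliotheken.nl/sparql"

# Matches a Wikidata entity URI and captures its Q-id, e.g. Q88505402.
QID_PATTERN = re.compile(r"https?://www\.wikidata\.org/entity/(Q\d+)")


def run_query(query: str, endpoint: str = KB_SPARQL_ENDPOINT) -> dict:
    """Send a SPARQL query via HTTP GET and return the decoded JSON results (needs network)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def nta_wikidata_links(results: dict) -> list[tuple[str, str]]:
    """Return (NTA URI, Wikidata Q-id) pairs from the ?nta and ?wikidata result columns."""
    pairs = []
    for row in results["results"]["bindings"]:
        match = QID_PATTERN.fullmatch(row["wikidata"]["value"])
        if match:  # skip sameAs targets that are not Wikidata entity URIs
            pairs.append((row["nta"]["value"], match.group(1)))
    return pairs


# Offline sample in SPARQL 1.1 JSON results format, with values from the example above.
sample = {
    "head": {"vars": ["nta", "ntaLabel", "wikidata"]},
    "results": {"bindings": [{
        "nta": {"type": "uri", "value": "http://data.bibliotheken.nl/doc/thes/p208140131"},
        "ntaLabel": {"type": "literal", "value": "Darlene Dixon"},
        "wikidata": {"type": "uri", "value": "http://www.wikidata.org/entity/Q88505402"},
    }]},
}
print(nta_wikidata_links(sample))
```

Requesting JSON through the `Accept` header keeps the request within the standard SPARQL protocol instead of relying on endpoint-specific `format` parameters.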
Persons in DBNLa with a Wikidata URI (via the NTA)
- E.g. Hans Aarsman (1951-): http://data.bibliotheken.nl/id/dbnla/aars001 --> owl:sameAs --> http://data.bibliotheken.nl/id/thes/p068680937 --> schema:sameAs --> http://www.wikidata.org/entity/Q325922
- All persons via this SPARQL query
```sparql
# Which DBNLa authors have a link to Wikidata?
SELECT *
WHERE {
  ?dbnl schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/dbnla> .
  ?dbnl rdfs:label ?dbnlLabel .
  ?dbnl owl:sameAs ?nta .
  ?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
  ?nta rdfs:label ?ntaLabel .
  ?nta schema:sameAs ?wikidata .
  FILTER(regex(str(?wikidata), 'wikidata', 'i'))
} LIMIT 1000
```
- 14.5K of 109K DBNLa items have a Wikidata link (source)
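The DBNLa query above follows a two-step chain: DBNLa author --> owl:sameAs --> NTA person --> schema:sameAs --> Wikidata. As a minimal in-memory sketch of that join (not how the KB endpoint evaluates it), assuming the triples have already been fetched as (subject, predicate, object) tuples with prefixed predicate names, the Hans Aarsman example resolves like this. The `follow` helper is hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def follow(triples: list[Triple], start: str, path: list[str]) -> set[str]:
    """Follow a sequence of predicates from `start`, like the SPARQL property path p1/p2."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    current = {start}
    for predicate in path:
        current = {obj for node in current for obj in index[(node, predicate)]}
    return current


triples = [
    ("http://data.bibliotheken.nl/id/dbnla/aars001", "owl:sameAs",
     "http://data.bibliotheken.nl/id/thes/p068680937"),
    ("http://data.bibliotheken.nl/id/thes/p068680937", "schema:sameAs",
     "http://www.wikidata.org/entity/Q325922"),
]

wikidata = {
    uri
    for uri in follow(triples, "http://data.bibliotheken.nl/id/dbnla/aars001",
                      ["owl:sameAs", "schema:sameAs"])
    if "wikidata.org/entity/" in uri  # same role as the FILTER in the query
}
print(wikidata)  # {'http://www.wikidata.org/entity/Q325922'}
```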
- Achille Van Acker 'acke001' in the DBNLa and in Wikidata
- Get additional data about 'acke001' from Wikidata. We want to retrieve the image (P18), educated at (P69) and member of political party (P102) statements from the Wikidata item
- We use this SPARQL query
```sparql
# Get supplementary data about DBNL author 'acke001' from Wikidata
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT *
WHERE {
  ?dbnl schema:mainEntityOfPage/owl:sameAs <http://data.bibliotheken.nl/doc/dbnla/acke001> .
  ?dbnl rdfs:label ?dbnlLabel .
  ?dbnl owl:sameAs ?nta .
  ?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
  ?nta rdfs:label ?ntaLabel .
  ?nta schema:sameAs ?wikidata .
  FILTER(regex(str(?wikidata), 'wikidata', 'i'))
  # Federated query: the remaining patterns are evaluated by the Wikidata Query Service
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidata wdt:P18 ?imageURL .                 # P18 = image
    ?wikidata wdt:P69 ?educatedAt .               # P69 = educated at
    ?wikidata wdt:P102 ?MemberOfPoliticalParty .  # P102 = member of political party
  }
}
```
Checks are OK:
- P18 (image): Achiel Van Acker1.jpg
- P69 (educated at): Lille University of Science and Technology
- P102 (member of political party): Belgian Socialist Party
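In the Wikidata Query Service, an image (P18) value is returned not as a bare file name but as a Special:FilePath URL on Wikimedia Commons. A small sketch with hypothetical helper names (standard library only) turns such a value back into the file name shown above and into the URL of the file's description page; the exact encoding of the sample value is assumed for illustration.

```python
from urllib.parse import quote, unquote

FILEPATH_PREFIX = "http://commons.wikimedia.org/wiki/Special:FilePath/"


def commons_file_name(p18_value: str) -> str:
    """Extract the decoded file name from a P18 Special:FilePath URL."""
    normalized = p18_value.replace("https://", "http://", 1)  # accept http and https
    if not normalized.startswith(FILEPATH_PREFIX):
        raise ValueError(f"not a Special:FilePath URL: {p18_value}")
    return unquote(normalized[len(FILEPATH_PREFIX):])


def commons_file_page(file_name: str) -> str:
    """Build the URL of the File: description page on Wikimedia Commons."""
    return "https://commons.wikimedia.org/wiki/File:" + quote(file_name.replace(" ", "_"))


value = "http://commons.wikimedia.org/wiki/Special:FilePath/Achiel%20Van%20Acker1.jpg"
name = commons_file_name(value)
print(name)                     # Achiel Van Acker1.jpg
print(commons_file_page(name))  # https://commons.wikimedia.org/wiki/File:Achiel_Van_Acker1.jpg
```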
Persons in Wikidata with an NTA id
- P1006 = Nationale Thesaurus voor Auteursnamen ID
- E.g. Harry Mulisch: https://www.wikidata.org/wiki/Q927#P1006 --> P1006 --> https://data.bibliotheken.nl/doc/thes/p06854796X
- All persons via this SPARQL query
```sparql
SELECT ?item ?itemLabel ?NTAurl
{
  ?item wdt:P1006 ?NTAid .
  BIND(IRI(CONCAT('http://data.bibliotheken.nl/doc/thes/p', ?NTAid)) AS ?NTAurl)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
LIMIT 1000
```
https://www.wikidata.org/wiki/Property_talk:P1006
- Wikidata contains 550K links to the NTA: see 'Current uses' at bottom of this page, or via this SPARQL query
- Map of birthplaces of people with an NTA id: https://w.wiki/7rsT
- Famous people with an NTA id: https://w.wiki/85si (famous people have extensive Wikidata entries with many statements)
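The `BIND(IRI(CONCAT(...)))` line in the P1006 query prefixes the bare NTA identifier with 'p' to form a data.bibliotheken.nl URI. The same conversion, and its inverse, can be sketched in Python for use outside the query service. The format check is an assumption based on the examples on this page (eight digits followed by a digit or 'X'); verify it against the P1006 format constraint before relying on it. Function names are hypothetical.

```python
import re

# Assumption: 8 digits + check character (digit or X), as in 06854796X (Harry Mulisch).
NTA_ID_PATTERN = re.compile(r"\d{8}[\dX]")
NTA_BASE = "http://data.bibliotheken.nl/doc/thes/p"  # same base as in the query above


def nta_url(nta_id: str) -> str:
    """Turn a Wikidata P1006 value into a data.bibliotheken.nl URL."""
    if not NTA_ID_PATTERN.fullmatch(nta_id):
        raise ValueError(f"unexpected NTA id format: {nta_id!r}")
    return NTA_BASE + nta_id


def nta_id_from_url(url: str) -> str:
    """Inverse of nta_url; accepts both /doc/thes/ and /id/thes/ URLs."""
    match = re.search(r"/(?:doc|id)/thes/p(\d{8}[\dX])$", url)
    if not match:
        raise ValueError(f"not an NTA person URL: {url!r}")
    return match.group(1)


print(nta_url("06854796X"))  # http://data.bibliotheken.nl/doc/thes/p06854796X
print(nta_id_from_url("http://data.bibliotheken.nl/id/thes/p262032139"))  # 262032139
```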
Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the NTA
- Missing data for people listed in the NTA: Database_reports/Humans_with_missing_claims/P1006
- Deviations and possible errors in Wikidata as well as NTA: Database_reports/Constraint_violations/P1006
For example:
- Missing birth date
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Humans_with_missing_claims/P1006#Missing_date_of_birth_(P569)
- The missing dates of birth may be added to Wikidata from the NTA
- Missing Dutch labels
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#%22Label_in_'nl'_language%22_violations
- Via SPARQL: https://w.wiki/85xT
- E.g.: Anna Bhau Sathe, https://www.wikidata.org/wiki/Q55759 --> NL label is missing
- The missing NL label can be added from the NTA
- The same NTA id appears in multiple Q items
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#Unique_value
- Via SPARQL: https://w.wiki/85zm
- E.g. Andreas Kaiser: https://data.bibliotheken.nl/doc/thes/p068685564 appears in both Q498631 (error) and in Q106361537 (good)
- Q498631 should get a different value at P1006
- One Wikidata item with multiple NTA ids
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#Single_value
- Via SPARQL: https://w.wiki/85$o
- E.g. Douglas Adams: Q43 contains both http://data.bibliotheken.nl/doc/thes/p339433876 and http://data.bibliotheken.nl/doc/thes/p068744307
- These NTA records are almost identical; consider merging them in the NTA
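The 'Unique value' and 'Single value' reports boil down to grouping (item, NTA id) pairs in two directions. A minimal sketch, assuming the pairs have already been exported (for example with the P1006 query); the function names are hypothetical and the sample pairs are the Andreas Kaiser, Douglas Adams and Harry Mulisch cases from this page.

```python
from collections import defaultdict

Pair = tuple[str, str]  # (Wikidata Q-id, NTA id)


def unique_value_violations(pairs: list[Pair]) -> dict[str, list[str]]:
    """NTA ids that appear on more than one Wikidata item."""
    items_per_id = defaultdict(set)
    for qid, nta_id in pairs:
        items_per_id[nta_id].add(qid)
    return {nta_id: sorted(qids) for nta_id, qids in items_per_id.items() if len(qids) > 1}


def single_value_violations(pairs: list[Pair]) -> dict[str, list[str]]:
    """Wikidata items that carry more than one NTA id."""
    ids_per_item = defaultdict(set)
    for qid, nta_id in pairs:
        ids_per_item[qid].add(nta_id)
    return {qid: sorted(ids) for qid, ids in ids_per_item.items() if len(ids) > 1}


pairs = [
    ("Q498631", "068685564"),     # Andreas Kaiser, wrong item
    ("Q106361537", "068685564"),  # Andreas Kaiser, correct item
    ("Q43", "339433876"),         # Douglas Adams, first NTA record
    ("Q43", "068744307"),         # Douglas Adams, second NTA record
    ("Q927", "06854796X"),        # Harry Mulisch, no violation
]
print(unique_value_violations(pairs))  # {'068685564': ['Q106361537', 'Q498631']}
print(single_value_violations(pairs))  # {'Q43': ['068744307', '339433876']}
```

Each check needs only one pass over the pairs, so it scales to the full set of NTA links on Wikidata.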
Wikipedia: Category:Articles with NTA identifiers
- English Wikipedia: 234K articles on WP:EN have NTA ids. E.g. https://en.wikipedia.org/wiki/50_Cent --> http://data.bibliotheken.nl/id/thes/p262032139
- Turkish Wikipedia: 25K articles on WP:TR have NTA ids
- Czech Wikipedia: 36K articles on WP:CS have NTA ids
- Japanese Wikipedia: 51K articles on WP:JA have NTA ids
In summary: via Wikidata, the NTA is used as an authority in hundreds of thousands of Wikipedia articles in many languages (but not in Dutch!).
- By integrating NTA data into Wikidata we gain many new functionalities regarding data quality, connections and visualization that we cannot offer via our own KB LOD service data.bibliotheken.nl! Wikipedia also benefits from the NTA!
- Theo van Veen, Wikidata als gemeenschappelijke thesaurus?, IP|Vakblad voor de Informatie Professional, October 2016, no. 07 - See archived version
- Project to include NTA in Wikidata and v.v. : WikiProject Dutch National Thesaurus for Author Names
Persons in Wikidata with a DBNLa id
- P723 = Digitale Bibliotheek voor de Nederlandse Letteren author ID
- E.g. Harry Mulisch: https://www.wikidata.org/wiki/Q927#P723 --> P723 --> http://www.dbnl.org/auteurs/auteur.php?id=muli002
- All persons via this SPARQL query
```sparql
SELECT ?item ?itemLabel ?DBNLaUrl
{
  ?item wdt:P723 ?DBNLaId .
  BIND(IRI(CONCAT('http://data.bibliotheken.nl/id/dbnla/', ?DBNLaId)) AS ?DBNLaUrl)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
LIMIT 1000
```
https://www.wikidata.org/wiki/Property_talk:P723
- Wikidata contains 31K links to the DBNLa: https://www.wikidata.org/wiki/Property_talk:P723 (bottom, 'Current uses')
Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the DBNLa
- Missing data for people listed in the DBNLa: Database_reports/Humans_with_missing_claims/P723
- Deviations and possible errors in Wikidata as well as DBNLa: Database_reports/Constraint_violations/P723
Historical metrics of the usage of NTA and DBNLa identifiers in Wikidata, and v.v.: https://nl.wikipedia.org/wiki/Wikipedia:GLAM/Koninklijke_Bibliotheek_en_Nationaal_Archief/Resultaten/KPIs/KPI10#Historische_ontwikkeling_van_KPI_10
Look at File:Atlas_de_Wit_1698-pl017-Leiden-St_Pancraskerk.jpg on Wikimedia Commons (= Saint Pancras Church in Leiden, now called Hooglandse Kerk)
- Manifest textual and visual KB source references
- 'Manual' multilingualism of the title in Latin, Dutch and French
- Source code appears to be structured, but really is unstructured metadata (free text)
- Tab 'Structured Data'
Structured Data on Commons (SDoC) is a project to add multilingual structured information from Wikidata to files on Wikimedia Commons that can be understood by humans, with enough consistency that it can also be uniformly processed by machines.
- Images are linked to Wikidata
- Images are provided with real structured (and therefore machine-readable) data
- Linked open data for Commons files is created, files become part of the LOD cloud
- Not only for images, e.g. see the structured data on this PDF file
- Files are made searchable via SPARQL
- For KB: Structured 5* LOD metadata for 31,348 KB files
- All files from the KB collection --> Collection (P195) = Koninklijke Bibliotheek (Q1526131)
- The results include 30,617 KB collection images, as well as non-collection images of the KB, such as images of KB buildings, events or directors
- PDF files from the KB --> MIME type (P1163) = “application/pdf”
- Images from Album amicorum by Jacobus Heyblocq
- What can be seen in / is depicted on images, tagged using Wikidata entries
- Commons:Depicts - documentation
- Things depicted on the image Hooglandse kerk in Atlas De Wit 1698: Hooglandse Kerk (Q1537970) - horse (Q726) - dog (Q144) - clock (Q376) - people (Q2472587) - cloud (Q8074) - carriage (Q235356) - stepped gable (Q1939660) - tree (Q10884) - walking stick (Q1347864) - The Castle (Q18813071) - woman (Q467) - child (Q7569) - door (Q36794) - hat (Q80151) - weathercock (Q2157687) - stained glass window (Q488094)
- Depicted things in this plate via SPARQL: https://w.wiki/7zps
- Depicted things in all plates of Atlas de Wit via SPARQL: https://w.wiki/7zqj
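Conceptually, each Commons file now carries statements such as collection (P195), MIME type (P1163) and depicts (P180), each pointing to a Wikidata item or a value. The sketch below models a few files as plain dictionaries to show how filtering on such statements works; the Atlas de Wit plate uses Q-ids listed above, while the other two files are hypothetical. On Commons itself the same questions are asked with SPARQL, not with this code.

```python
# Simplified, hypothetical model of structured data on Commons: property -> set of values.
files: dict[str, dict[str, set[str]]] = {
    "Atlas_de_Wit_1698-pl017-Leiden-St_Pancraskerk.jpg": {
        "P195": {"Q1526131"},                          # collection: Koninklijke Bibliotheek
        "P1163": {"image/jpeg"},                       # MIME type
        "P180": {"Q1537970", "Q726", "Q144", "Q376"},  # depicts (subset of the list above)
    },
    "Example_KB_document.pdf": {                       # hypothetical KB PDF file
        "P195": {"Q1526131"},
        "P1163": {"application/pdf"},
    },
    "Example_without_collection.jpg": {                # hypothetical file, no P195
        "P180": {"Q144"},
    },
}


def matching_files(files: dict[str, dict[str, set[str]]], **required: str) -> list[str]:
    """Return file names whose statements include every required property=value pair."""
    return sorted(
        name
        for name, statements in files.items()
        if all(value in statements.get(prop, set()) for prop, value in required.items())
    )


print(matching_files(files, P195="Q1526131"))                          # both KB files
print(matching_files(files, P195="Q1526131", P1163="application/pdf"))  # only the PDF
print(matching_files(files, P195="Q1526131", P180="Q144"))              # KB files depicting a dog
```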
Let's summarize: KB images on Commons are searchable in 3 ways
- Via regular metadata (= free text search)
- Via structured metadata
- By content (What is depicted in KB images?)
The (super handy!) tool Hay's Structured Search offers all three options. It is a visual, multilingual search engine to find images with (and without) structured data in Wikimedia Commons.
- All KB images, with and without structured data --> 30,617 images in Category:Media_contributed_by_Koninklijke_Bibliotheek (search in English interface)
- KB images in which something is depicted --> 19,764 out of 30,617 images (about 65%) (search in Dutch interface)
- KB images, depicting dogs - dog = Q144 (search in Dutch interface)
- Town on the Zuiderzee - Zuiderzee = Q228655
- Miniatures from Der Naturen Bloeme showing fish - fish = Q152 + Category:Miniatures_from_Der_naturen_bloeme_-_KB_KA_16
- Images from Atlas de Wit, showing a bridge - bridge = Q12280
- Images from Atlas de Wit depicting an 'igreja' - igreja = Q16970 = church building in Portuguese (search in Portuguese interface)
- Images from Atlas de Wit showing both 'église' and 'chien' - église = Q16970 = church building in French, chien = Q144 = dog in French (search in French interface)
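These multilingual searches work because a search term is first resolved to a language-independent Wikidata item, and only the Q-id is matched against the depicts (P180) statements. A minimal sketch of that idea, assuming a tiny hand-made label table (real labels live in Wikidata) and depicts statements taken partly from the Atlas de Wit plate above; the second file and the `search` function are hypothetical.

```python
# Hypothetical label table: (language, label) -> Wikidata Q-id.
LABELS = {
    ("en", "dog"): "Q144", ("nl", "hond"): "Q144", ("fr", "chien"): "Q144",
    ("en", "church building"): "Q16970", ("fr", "église"): "Q16970", ("pt", "igreja"): "Q16970",
    ("en", "bridge"): "Q12280",
}

# depicts (P180) per file; the first entry uses Q-ids listed for the Hooglandse Kerk plate.
DEPICTS = {
    "Atlas_de_Wit_1698-pl017-Leiden-St_Pancraskerk.jpg": {"Q1537970", "Q144", "Q726", "Q376"},
    "Hypothetical_Atlas_de_Wit_plate.jpg": {"Q16970", "Q144", "Q12280"},
}


def search(lang: str, *terms: str) -> list[str]:
    """Files depicting ALL given terms; terms are looked up as labels in `lang`."""
    qids = set()
    for term in terms:
        qid = LABELS.get((lang, term))
        if qid is None:
            return []  # unknown label: nothing can match
        qids.add(qid)
    return sorted(name for name, depicted in DEPICTS.items() if qids <= depicted)


print(search("fr", "église", "chien"))  # ['Hypothetical_Atlas_de_Wit_plate.jpg']
print(search("nl", "hond") == search("en", "dog"))  # True: same Q-id, same results
```

Because tags are stored as Q-ids, supporting another search language only requires labels in that language; the images themselves never need to be re-tagged.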
In summary: The search functionalities shown (SPARQL, structured search, multilingual search, search by content) are much more advanced than those of the proprietary KB (image) services such as Het Geheugen!
This manual from 2020 explains step by step how to make images from the KB collection more discoverable, visible and reusable by indicating (tagging) which things (entities) can be seen on those images. This is done by connecting Wikidata items to those things. Available on Wikimedia Commons and Zenodo
Results as of 1 November 2023
- 30,617 KB collection images on Commons
- 19,764 images on which something is tagged
- KB tagging champion is Madeleine van den Berg with 34K+ things added
Examples of KB heritage collections: Medieval manuscripts - Maps and atlases - Armorials - Alba amicorum - Catchpenny prints - Children's picture books - Flora and fauna books
- Collection highlights, canonical objects: The most important objects of the KB must be present on Wikidata (and Commons)
- Copyright free objects: Public domain = no hassle with copyright
- Limited collection size: 10-100s of images are easier to process than 10-100Ks
- Visually rich collections: What is depicted on the images, see Block 3
- Connectable to other things: Making semantic links between the KB collections and persons, places, events etc. described in Wikidata
- Collections consisting of similar, unique objects with narrow, flat, well-defined data models/classes: Similar values for instance of and/or subclass of. Not OK: heterogeneous ephemera.
KB collection highlights are part of our national heritage, just like e.g.
- Act of Abjuration (collection highlight National Archives)
- Victory Boogie Woogie (collection highlight Kunstmuseum The Hague)
- The Night Watch (collection highlight Rijksmuseum Amsterdam)
Collection highlights on the (previous) KB website from February 2020
Typical presentation of collection highlights on kb.nl, for instance for Atlas Ortelius 1571
- Catalog record --> Metadata
- Hi-res flip book --> Images
- Contextual article --> Stories, context
This presentation on kb.nl has limited functionalities and reuse options. It represents an old way of thinking: Collection highlights (on kb.nl) are only for reading and viewing, inviting passive consumption. More explanation in this article.
A new way of thinking:
- KB collection highlights are building blocks and invite active reuse and creation.
- Building blocks for tech community: Developers, app builders, tech companies, AIs, digital humanities, data scientists, hackathons, Wikimedia communities, LOD world, NDE, Europeana etc.
- KB collection highlights as a toolbox of Technical LEGO
- Contents of this toolbox: E.g. 5-star Linked Open Data - Automatic image recognition (AI) - Semantic tagging - Data dumps & bulk downloads - SPARQL - Images searchable by content - Data visualizations - Python - Machine-readable data - Flexible REST APIs - Manifest legal terms - IIIF - Data as JSON, XML, CSV - Automatic multilingualism - External LOD Identifiers
- All these building blocks are available in the Wikimedia infrastructure: the combination of Wikidata (for metadata), Wikimedia Commons (for images) and Wikipedia (for contextual stories) - and their associated international communities - providing a coherent technical and social infrastructure to make KB's collection highlights much more visible, findable and reusable.
Wikifying KB’s collection highlights
E.g. Atlas Ortelius:
- Catalog record KB --> Metadata to Q67465742 on Wikidata, with collection = Koninklijke Bibliotheek, and qualifier subject has role = collection highlight
- Hi-res flip book KB --> Images to Atlas Ortelius 1571 on Wikimedia Commons
- Contextual article KB --> Context to Theatrum Orbis Terrarum on Dutch Wikipedia
The WikiProject KB Collection highlights (2020-present) aims to improve the findability, visibility and reusability of KB's collection highlights for both humans and machines by
- creating and improving the Wikidata descriptions for all digitised KB collection highlights,
- uploading their public domain images to Wikimedia Commons, reusing data from Wikidata as much as possible to create image metadata
- creating and improving the Wikipedia articles about them on Dutch and English Wikipedia
Result of the project: All cool and value adding functionalities, tools and community capacities of the Wikimedia infrastructure are now available for our KB collection highlights. The party can start!
The party can start, let's build cool new things! --> See the article 50 cool new things you can do now with KB's collection highlights
In this series of 5 articles we show the added value of putting images and metadata of digitised collection highlights of the KB, national library of the Netherlands, into the Wikimedia infrastructure. By putting our collection highlights into Wikidata, Wikimedia Commons and Wikipedia, dozens of new functionalities have been added. As a result of Wikifying this collection in 2020, you can now do things with these highlights that were not possible before.
This article has 5 parts:
- Part 1, Introduction - things you could already do with KB’s collection highlights before we started Wikifying them in 2020.
- Part 2, Overviews of all highlights - new, handy & useful overviews for all highlights together
- Part 3, Overviews per highlight - new functionalities for individual highlights
- Part 4, Images - new things you can do with our individual highlight images
- Part 5, Reuse - some examples of programmatically reusing KB's collection highlights
- All functionalities for KB images regarding SPARQL, structured search, multilingual search, search by content, as explained in Block 3
- Gallery of KB collection highlights on Dutch Wikipedia (never mind the new WP layout!)
- Persons/roles involved in each collection highlight
- Contributors to the Album Jacob Heyblocq
- Works by these contributors in DBNL
- Works by these contributors elsewhere, via Europeana, as Excel: See for example Govert Flinck on Europeana + this explanation, see Point 48
Questions or remarks can be sent to Olaf Janssen, Wikimedia coordinator of the KB - @ookgezellig
This overview can be reused freely and openly. It is available under the CC-BY 4.0 license, so attribution is required. Use something like
Wikidata & KB national library of the Netherlands, an overview, Olaf Janssen & KB national library of the Netherlands, https://github.com/KBNLwikimedia/Wikidata-KB-Overview