Making Authorities Accessible as Linked Data
I've worked on the Linked Data for Libraries series of grants for six years, specializing in authorities as linked data. I work on methods for accessing linked data directly from authority providers, systems for caching linked data, and user experiences that increase confidence in the selection process. This post draws on my experience working with 11 authority data providers over that time. It offers recommendations for authority providers that want to offer an API returning their entity data as linked data.
The authority data can be represented in any ontology that is appropriate for the authority data. This can be a common ontology (e.g. SKOS, Schema, BibFrame), or a custom ontology specific to the authority (e.g. dbpedia, geonames).
API requests return results as an RDF serialization. The serialization can be any RDF format (e.g. JSON-LD, N-Triples, Turtle, RDF/XML).
Best practice is to use RDF language tagging of literals. This facilitates use of the authority in multilingual sites.
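To illustrate why language tags matter to a consuming site, here is a minimal sketch of how a client might choose a label for display. The `(value, language)` tuple shape, the `pick_label` helper, and the fallback order are assumptions made for this example, not part of any particular authority's API.

```python
from typing import Optional

# Each literal is (value, language_tag); None means the literal is untagged.
Literal = tuple[str, Optional[str]]

def pick_label(literals: list[Literal], preferred: str, fallback: str = "en") -> Optional[str]:
    """Return a literal in the preferred language, else the fallback
    language, else an untagged literal, else whatever comes first."""
    for wanted in (preferred, fallback, None):
        for value, lang in literals:
            if lang == wanted:
                return value
    # Last resort: any literal at all, or None when the list is empty.
    return literals[0][0] if literals else None

# Made-up labels for a single entity.
labels = [("Milk", "en"), ("Lait", "fr"), ("Milch", "de")]
print(pick_label(labels, "fr"))
print(pick_label(labels, "es"))
```

Without language tags, the client has no reliable way to make this choice and has to guess or show every label.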
This API request is given an identifier for a single entity and returns the relevant data about the entity as RDF.
OK
The API provides a URL to which an ID is passed as a parameter to identify the term. If an ID is used, it is highly recommended that there be a triple for the entity that specifies the ID exactly as it should be passed to the API request.
BETTER
The API provides a URL to which the URI is passed as a parameter to identify the term.
BEST
The URI itself resolves. This is consistent with linked data best practices.
In all cases, the results are returned in an RDF serialization. The result graph generally includes all triples where the requested URI is the subject. Depending on the ontology and authority data, it may also include additional triples extending the graph to include all meaningful data for the requested entity.
For example, for an authority whose data is primarily in the SKOS ontology, first-level triples are probably sufficient. A more complex ontology, like BibFrame, will require constructing a more complex graph to get all the data about the entity.
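The three levels of the fetch request differ only in what the client has to know to build the request URL. The sketch below shows one possible shape for each. The host `authority.example.org`, the `/fetch` path, and the `id` and `uri` parameter names are hypothetical; a real authority will define its own.

```python
from urllib.parse import urlencode

# Hypothetical host and paths, used only for illustration.
BASE = "https://authority.example.org"

def fetch_url_by_id(term_id: str) -> str:
    """OK: the authority's own ID is passed as a parameter."""
    return f"{BASE}/fetch?" + urlencode({"id": term_id})

def fetch_url_by_uri(term_uri: str) -> str:
    """BETTER: the full URI is passed as a percent-encoded parameter."""
    return f"{BASE}/fetch?" + urlencode({"uri": term_uri})

def fetch_url_resolving(term_uri: str) -> str:
    """BEST: the URI is itself the request URL."""
    return term_uri

# Request header asking for a specific RDF serialization (JSON-LD here).
RDF_HEADERS = {"Accept": "application/ld+json"}

uri = f"{BASE}/terms/t123"
print(fetch_url_by_id("t123"))
print(fetch_url_by_uri(uri))
print(fetch_url_resolving(uri))
```

In the BEST case the client needs no authority-specific knowledge at all: it holds a URI and dereferences it.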
Given a string query, the API returns a set of entities as results with data about each entity represented in an RDF serialization.
MINIMAL
parameter | description |
---|---|
q | string query |
GOOD
parameter | description |
---|---|
maxRecords | how many results to return |
BEST
parameter | description |
---|---|
lang | return literals in the specified language |
entity | when the authority has significant separation of data along an entity class, support of the entity parameter allows for limiting the return set to a subset of the authority data |
Additional parameters are fine to facilitate subsetting or sorting of the authority data in a meaningful way.
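As a rough illustration of how the parameters in the tables above combine, the sketch below builds a search request URL. Only the parameter names `q`, `maxRecords`, `lang`, and `entity` come from the tables; the endpoint URL and the `search_url` helper are assumptions for this example.

```python
from typing import Optional
from urllib.parse import urlencode

def search_url(base: str, q: str, max_records: Optional[int] = None,
               lang: Optional[str] = None, entity: Optional[str] = None) -> str:
    """Build a search URL, sending only the optional parameters that were set."""
    params: dict[str, object] = {"q": q}
    if max_records is not None:
        params["maxRecords"] = max_records
    if lang is not None:
        params["lang"] = lang
    if entity is not None:
        params["entity"] = entity
    return f"{base}?{urlencode(params)}"

# Hypothetical endpoint.
endpoint = "https://authority.example.org/search"
print(search_url(endpoint, "mark twain"))
print(search_url(endpoint, "mark twain", max_records=10, lang="en", entity="person"))
```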
Why not SPARQL?
In our experience, using SPARQL directly for search can have performance issues. At best it is slow, and at worst no results are returned. Even when it performs well, it does not rank the search results. Without ranking, the same search can return results in a different order each time it is run, giving end users an inconsistent experience.
Index + SPARQL
It is recommended that data stored in a triple store be accompanied by a lucene/solr search index for effective and efficient search performance. The index is generated over the set of literals that makes the most sense for the authority data. Minimally, this includes the primary label. It may also include other literals (e.g. alternate labels, broader terms, narrower terms, notes). For our local cache, we work with our metadata specialists to determine the best set of literals to include. With lucene/solr, the literal values can be weighted to refine the search results.
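One way to express this weighting in Solr is the `qf` (query fields) parameter of the eDisMax query parser, which accepts per-field boosts. The sketch below only assembles the request parameters. The field names `pref_label`, `alt_label`, and `uri` and the boost values are assumptions; they depend entirely on how the index schema is defined.

```python
def solr_params(query: str, weights: dict[str, int], rows: int = 20) -> dict[str, object]:
    """Assemble eDisMax request parameters with per-field boosts."""
    # qf takes space-separated "field^boost" entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in weights.items())
    return {
        "defType": "edismax",
        "q": query,
        "qf": qf,
        "fl": "uri,score",  # return only the subject URI and the relevance score
        "rows": rows,
    }

# Hypothetical weighting: a match on the primary label counts more than one on an alternate label.
params = solr_params("twain", {"pref_label": 10, "alt_label": 3})
print(params["qf"])
```

Tuning these boosts is where the input from metadata specialists pays off, since they know which literals users are most likely to search on.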
Search Workflow
The Search API performs the following steps to fulfill a search query request...
- search the index for the query string which returns a set of subject URIs and a search rank for each
- construct a performant SPARQL query to make a precise request by URI from the triple store for each match
- this SPARQL query will pull from the triple store enough content from the graph around each subject URI to provide context for the match (More on context below. See Data in Results section.)
- inject a rank predicate for each search result's subject URI to provide a means for consistently sorting the results of a search. We use the http://vivoweb.org/ontology/core#rank predicate. You can use a different predicate if you prefer.
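The steps above can be sketched as two small functions: one that builds a single SPARQL query for all matched URIs, and one that turns index hits into rank triples. This is a simplified illustration, not our production code. The `(uri, score)` hit shape, the conversion of scores into 1-based positions, and the first-level-only `CONSTRUCT` pattern are all assumptions.

```python
RANK_PREDICATE = "http://vivoweb.org/ontology/core#rank"

def build_construct_query(uris: list[str]) -> str:
    """Request first-level triples for exactly the URIs the index returned.
    Assumes the URIs come from our own index and need no escaping."""
    values = " ".join(f"<{u}>" for u in uris)
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        f"WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o }}"
    )

def rank_triples(hits: list[tuple[str, float]]) -> list[tuple[str, str, int]]:
    """Turn (uri, score) index hits into rank triples, best match first.
    Assumes a higher score means a better match."""
    ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [(uri, RANK_PREDICATE, position)
            for position, (uri, _score) in enumerate(ordered, start=1)]

# Made-up index hits.
hits = [("http://example.org/a", 1.2), ("http://example.org/b", 3.4)]
query = build_construct_query([uri for uri, _ in hits])
print(query)
print(rank_triples(hits))
```

Because the triple store is only asked about specific subject URIs, the query stays fast no matter how fuzzy the original search string was. The rank triples are then merged into the result graph before it is serialized.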
The results for each matching entity will include a subset of the full graph associated with the entity. Below I specify common types of data that are included in the subset graph. They are specified by a role instead of a specific predicate or ldpath because each authority may be using a different ontology.
REQUIRED
role | description |
---|---|
primary label | the primary label for the entity (e.g. skos:prefLabel, madsrdf:authoritativeLabel) |
HIGHLY RECOMMENDED
role | description |
---|---|
rank | rank in the search results that allows for sorting |
NOTE: This is marked as HIGHLY RECOMMENDED only because at this writing, I have yet to work with an authority that provides a rank predicate in their search results. This is one of the major drivers for caching external authorities. If it were completely up to me, I would mark this as REQUIRED.
COMMON
role | description |
---|---|
alt label | an alternate label for the entity (e.g. skos:altLabel, madsrdf:variantLabel) |
same as | URI of another entity that is considered the same entity as the result (e.g. skos:exactMatch, owl:sameAs) |
broader | another entity that is a broader term for the result (e.g. skos:broader, geonames:parentFeature) |
narrower | another entity that is a narrower term for the result (e.g. skos:narrower, mesh:mapped_from) |
Authority Specific
Our metadata specialists have identified additional parts of the graph that provide context to aid users in their selection process. These are authority data specific.
For example, in our local cache of Library of Congress Name Authority for persons, the result graph includes...
role | ldpath |
---|---|
birth date | madsrdf:identifiesRWO/madsrdf:birthDate/rdfs:label |
death date | madsrdf:identifiesRWO/madsrdf:deathDate/rdfs:label |
field of activity | madsrdf:identifiesRWO/madsrdf:fieldOfActivity/rdfs:label |
occupation | madsrdf:identifiesRWO/madsrdf:occupation/madsrdf:authoritativeLabel |
NOTE: This example also shows how the data in the results can come from the deeper graph. The notation used to specify the path to the data we want to include is Marmotta's ldpath.
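To make the path notation concrete, here is a toy evaluator for the `/`-separated paths in the table above. It is not Marmotta's implementation and covers only simple predicate chains written with prefixed names. The triples, the `ex:` identifiers, and the date value are invented for the example.

```python
Triple = tuple[str, str, str]

def follow_path(triples: list[Triple], start: str, path: str) -> set[str]:
    """Follow a chain of predicates from a start node and return the values reached.
    Splitting on "/" only works for prefixed names, not for full URIs."""
    current = {start}
    for predicate in path.split("/"):
        # Step from every node reached so far across the next predicate.
        current = {o for s, p, o in triples if s in current and p == predicate}
    return current

# Invented data: the birth date sits three hops away from the authority record.
triples = [
    ("ex:name1", "madsrdf:identifiesRWO", "ex:rwo1"),
    ("ex:rwo1", "madsrdf:birthDate", "ex:date1"),
    ("ex:date1", "rdfs:label", "1900-01-01"),
]
print(follow_path(triples, "ex:name1", "madsrdf:identifiesRWO/madsrdf:birthDate/rdfs:label"))
```

A first-level fetch on `ex:name1` would return only the `madsrdf:identifiesRWO` link, which is why the search results need to reach into the deeper graph to show a birth date.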