Making Authorities Accessible as Linked Data

Table of Contents

My Background
General
API Request: Fetch a Single Entity
- Recommended approach
- Returned results
API Request: Search by string query

Easy to use API

My Background

I've worked on the Linked Data for Libraries series of grants for 6 years. My area of specialization is working with authorities as linked data. I work on methods for accessing linked data directly from authority providers, on systems for caching linked data, and on user experiences that increase confidence in the selection process. This post represents my collective experience in working with 11 authority data providers over that time. It provides recommendations for authorities that are looking to provide an API that returns their entity data as linked data.

General

Ontology

The authority data can be represented in any ontology that is appropriate for the data. This can be a common ontology (e.g. SKOS, Schema, BibFrame) or a custom ontology specific to the authority (e.g. dbpedia, geonames).

API Request Results

API requests return results as an RDF serialization. The serialization can be any RDF format (e.g. json-ld, n-triples, turtle, rdf-xml).

Language Tagged Literals

Best practice is to use RDF language tagging of literals. This facilitates use of the authority in multilingual sites.
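To show why language tags help a multilingual site, here is a minimal sketch in plain Python. Triples are modelled as tuples and a literal as a (text, language tag) pair. The entity URI, the labels, and the `label_for` helper are all hypothetical and only illustrate the idea.

```python
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical entity with one language-tagged label per language.
TRIPLES = [
    ("http://example.org/auth/123", PREF_LABEL, ("Cats", "en")),
    ("http://example.org/auth/123", PREF_LABEL, ("Chats", "fr")),
    ("http://example.org/auth/123", PREF_LABEL, ("Katzen", "de")),
]


def label_for(subject, lang, fallback="en"):
    """Return the label tagged with `lang`, else the `fallback` label, else None."""
    labels = {
        literal[1]: literal[0]
        for s, p, literal in TRIPLES
        if s == subject and p == PREF_LABEL
    }
    return labels.get(lang, labels.get(fallback))
```

A French-language site would call `label_for(uri, "fr")`, while a site in a language the authority does not cover falls back to the English label.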

API Request: Fetch a Single Entity

This API request is given an identifier for a single entity and returns the relevant data about the entity as RDF.

Recommended approach

OK

The API provides a URL to which an ID is passed as a parameter to identify the term. If an ID is used, it is highly recommended that there be a triple for the entity that specifies the ID exactly as it should be passed to the API request.

BETTER

The API provides a URL to which the URI is passed as a parameter to identify the term.

BEST

The URI itself resolves. This is consistent with linked data best practices.
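The three options differ only in what the client sends. The sketch below builds (but does not send) one request per option. The endpoint `https://authority.example.org/api/fetch`, the parameter names `id` and `uri`, and the use of an `Accept` header to ask for json-ld are my assumptions for illustration, not a prescribed API.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://authority.example.org/api/fetch"  # hypothetical endpoint
RDF_HEADERS = {"Accept": "application/ld+json"}       # assumed content negotiation


def fetch_by_id(entity_id):
    """OK: the authority's local ID is passed as a parameter."""
    return Request(f"{API_BASE}?{urlencode({'id': entity_id})}", headers=RDF_HEADERS)


def fetch_by_uri(entity_uri):
    """BETTER: the entity's full URI is passed as a parameter."""
    return Request(f"{API_BASE}?{urlencode({'uri': entity_uri})}", headers=RDF_HEADERS)


def fetch_resolvable(entity_uri):
    """BEST: the URI itself is requested; no separate API URL is needed."""
    return Request(entity_uri, headers=RDF_HEADERS)
```

Passing any of these to `urllib.request.urlopen` would perform the request. With the BEST option the client needs no authority-specific knowledge beyond the URI it already holds.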

Returned results

In all cases, the results are returned in an RDF serialization. The result graph generally includes all triples where the requested URI is the subject. Depending on the ontology and authority data, it may also include additional triples extending the graph to include all meaningful data for the requested entity.

For example, for an authority whose data is primarily in the SKOS ontology, first-level triples are probably sufficient. A more complex ontology, like BibFrame, will require constructing a more complex graph to get all the data about the entity.
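The difference between first-level triples and a deeper graph can be sketched with triples modelled as plain tuples. The sample graph, the `ex:` identifiers, and the hop-count approach are assumptions for illustration. A real service would more likely select specific paths than follow every link to a fixed depth.

```python
# Hypothetical graph: a work whose agent label sits three hops from the work URI.
GRAPH = [
    ("ex:work1", "rdfs:label", "A title"),
    ("ex:work1", "bf:contribution", "ex:contrib1"),
    ("ex:contrib1", "bf:agent", "ex:agent1"),
    ("ex:agent1", "rdfs:label", "An agent"),
]


def first_level(triples, subject):
    """Return every triple whose subject is the requested URI."""
    return [t for t in triples if t[0] == subject]


def extended_graph(triples, subject, depth):
    """Follow object links outward from `subject` for `depth` hops."""
    seen = {subject}
    frontier = {subject}
    result = []
    for _ in range(depth):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier:
                result.append((s, p, o))
                if o not in seen:  # literals land here too, but never match a subject
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return result
```

With `depth=1` the extended graph equals the first-level triples. Only at `depth=3` does the agent's label appear in the result.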

API Request: Search by string query

Given a string query, the API returns a set of entities as results with data about each entity represented in an RDF serialization.

Recommended parameters

MINIMAL

parameter	description
q	string query

GOOD

parameter	description
maxRecords	how many results to return

BEST

parameter	description
lang	return literals in the specified language
entity	when the authority has significant separation of data along an entity class, support of the entity parameter allows for limiting the return set to a subset of the authority data

Additional parameters are fine to facilitate subsetting or sorting of the authority data in a meaningful way.
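As a sketch of how a client might assemble a search request from these parameters: the endpoint URL is hypothetical, and only the parameter names `q`, `maxRecords`, `lang`, and `entity` come from the tables above.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://authority.example.org/api/search"  # hypothetical endpoint


def search_url(q, max_records=None, lang=None, entity=None):
    """Build a search URL; only `q` is required, the rest are optional."""
    params = {"q": q}
    if max_records is not None:
        params["maxRecords"] = max_records
    if lang is not None:
        params["lang"] = lang
    if entity is not None:
        params["entity"] = entity
    return f"{SEARCH_BASE}?{urlencode(params)}"
```

For example, `search_url("twain", max_records=10, lang="en", entity="person")` yields a URL ending in `?q=twain&maxRecords=10&lang=en&entity=person`.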

Optimizing search performance

Why not SPARQL?

In our experience, using SPARQL directly for search can have performance issues. At best it is slow, and at worst no results are returned. Even when it is performant, it does not rank the search results. Without ranking, the same search can return results in a different order each time it is run, giving end users an inconsistent experience.

Index + SPARQL

It is recommended that data stored in a triple store be accompanied by a lucene/solr search index for effective and efficient search performance. The index is generated over the set of literals that makes the most sense for the authority data. Minimally, this includes the primary label. It may also include other literals (e.g. alternate labels, broader terms, narrower terms, notes). For our local cache, we work with our metadata specialists to determine the best set of literals to include. With lucene/solr, the literal values can be weighted to refine the search results.
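To illustrate what weighting literals means, here is a toy in-memory stand-in for the index. It is not lucene/solr and does not use their scoring. The field names, weights, and sample records are all hypothetical.

```python
# Hypothetical weights: a hit on the primary label counts more than one on an alt label.
FIELD_WEIGHTS = {"primary_label": 3.0, "alt_label": 1.0}

# Hypothetical indexed literals, keyed by subject URI.
DOCUMENTS = {
    "http://example.org/auth/1": {
        "primary_label": ["Twain, Mark"],
        "alt_label": ["Clemens, Samuel"],
    },
    "http://example.org/auth/2": {
        "primary_label": ["Clemens, Samuel B."],
        "alt_label": [],
    },
}


def search_index(query):
    """Return (uri, score) pairs, best score first, ties broken by URI."""
    terms = query.lower().split()
    scores = {}
    for uri, fields in DOCUMENTS.items():
        score = 0.0
        for field, values in fields.items():
            for value in values:
                hits = sum(term in value.lower() for term in terms)
                score += FIELD_WEIGHTS.get(field, 0.0) * hits
        if score > 0:
            scores[uri] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A search for "clemens" ranks the second record first, because it matches on the primary label, while the first record matches only on an alternate label.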

Search Workflow

The Search API performs the following steps to fulfill a search query request...

Search the index for the query string, which returns a set of subject URIs and a search rank for each.
Construct a performant SPARQL query to make a precise request by URI from the triple store for each match.
- This SPARQL query pulls enough content from the graph around each subject URI to provide context for the match. (More on context below. See the Data in Results section.)
Inject a rank predicate for each search result's subject URI to provide a means for consistently sorting the results of a search. We use the http://vivoweb.org/ontology/core#rank predicate. You can use a different predicate if you prefer.
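The second and third steps can be sketched as follows, taking the ranked URIs from the index as input. The CONSTRUCT query here fetches only first-level triples, and using the 1-based position as the rank value is my assumption. A production query would pull more of the surrounding graph for context.

```python
RANK = "http://vivoweb.org/ontology/core#rank"


def build_sparql(uris):
    """Build one CONSTRUCT query that requests each matched subject by URI."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "CONSTRUCT { ?s ?p ?o } "
        f"WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o }}"
    )


def rank_triples(ranked_uris):
    """Create one rank triple per result; position 1 is the best match."""
    return [(uri, RANK, position) for position, uri in enumerate(ranked_uris, start=1)]
```

The rank triples are added to the graph returned by the triple store, so a client can sort the results the same way every time.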

Data in Results

The results for each matching entity will include a subset of the full graph associated with the entity. Below I specify common types of data that are included in the subset graph. They are specified by a role instead of a specific predicate or ldpath because each authority may be using a different ontology.

REQUIRED

role	description
primary label	the primary label for the entity (e.g. skos:prefLabel, madsrdf:authoritativeLabel)

HIGHLY RECOMMENDED

role	description
rank	rank in the search results that allows for sorting

NOTE: This is marked as HIGHLY RECOMMENDED only because at this writing, I have yet to work with an authority that provides a rank predicate in their search results. This is one of the major drivers for caching external authorities. If it were completely up to me, I would mark this as REQUIRED.

COMMON

role	description
alt label	an alternate label for the entity (e.g. skos:altLabel, madsrdf:variantLabel)
same as	URI of another entity that is considered the same entity as the result (e.g. skos:exactMatch, owl:sameAs)
broader	another entity that is a broader term for the result (e.g. skos:broader, geonames:parentFeature)
narrower	another entity that is a narrower term for the result (e.g. skos:narrower, mesh:mapped_from)
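Because roles are independent of any one ontology, a client can keep a small lookup from role to predicate per authority. The sketch below uses the example predicates from the tables above. The mapping layout, the `values_for_role` helper, and the sample triples are hypothetical.

```python
# Role-to-predicate lookup per ontology, using the example predicates listed above.
ROLE_PREDICATES = {
    "skos": {
        "primary label": "skos:prefLabel",
        "alt label": "skos:altLabel",
        "same as": "skos:exactMatch",
        "broader": "skos:broader",
        "narrower": "skos:narrower",
    },
    "madsrdf": {
        "primary label": "madsrdf:authoritativeLabel",
        "alt label": "madsrdf:variantLabel",
    },
}


def values_for_role(triples, subject, ontology, role):
    """Return the values filling `role` for `subject`, or [] if the role is unmapped."""
    predicate = ROLE_PREDICATES[ontology].get(role)
    return [o for s, p, o in triples if s == subject and p == predicate]


# Hypothetical result triples for one entity.
SAMPLE = [
    ("ex:term1", "skos:prefLabel", "Cats"),
    ("ex:term1", "skos:altLabel", "Felines"),
    ("ex:term1", "skos:broader", "ex:term0"),
]
```

Display code can then ask for "primary label" or "broader" without knowing which ontology the authority uses.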

Authority Specific

Our metadata specialists have identified additional parts of the graph that provide context to aid users in their selection process. These are authority data specific.

For example, in our local cache of Library of Congress Name Authority for persons, the result graph includes...

role	ldpath
birth date	madsrdf:identifiesRWO/madsrdf:birthDate/rdfs:label
death date	madsrdf:identifiesRWO/madsrdf:deathDate/rdfs:label
field of activity	madsrdf:identifiesRWO/madsrdf:fieldOfActivity/rdfs:label
occupation	madsrdf:identifiesRWO/madsrdf:occupation/madsrdf:authoritativeLabel

NOTE: This example also shows how the data in the results can come from the deeper graph. The notation used to specify the path to the data we want to include is Marmotta's ldpath.
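To make the path notation concrete, here is a simplified evaluator that handles only slash-separated predicate chains like the ones in the table. It is not Marmotta's implementation and ignores the rest of the ldpath language. The sample triples and the date value are hypothetical.

```python
def follow_path(triples, start, path):
    """Follow each predicate in a '/'-separated path, hop by hop, from `start`."""
    nodes = [start]
    for predicate in path.split("/"):
        nodes = [
            o
            for node in nodes
            for s, p, o in triples
            if s == node and p == predicate
        ]
    return nodes


# Hypothetical graph: the birth date label sits three hops from the name authority.
PERSON_GRAPH = [
    ("ex:name1", "madsrdf:identifiesRWO", "ex:rwo1"),
    ("ex:rwo1", "madsrdf:birthDate", "ex:date1"),
    ("ex:date1", "rdfs:label", "1900-01-01"),
]
```

Calling `follow_path(PERSON_GRAPH, "ex:name1", "madsrdf:identifiesRWO/madsrdf:birthDate/rdfs:label")` walks from the authority record to the real-world object, then to the date node, and returns that node's label.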
