Configuring access to a Linked Data authority
Configurations are used to drive the access to linked data authorities and process the results that are returned by those authorities. This document describes how to write a configuration.
QA comes with two configurations:
- OCLC Fast Linked Data - supports search and term
- Library of Congress - supports term only
Look for configuration files in /config/authorities/linked_data.
A number of additional authority configurations are available. See [ld4p/linked_data_authorities](https://github.com/ld4p/linked_data_authorities) for configurations and instructions on how to use them. These are updated periodically, so check back from time to time to see what's new.
The configuration is written in JSON files. The files are placed in the directory config/authorities/linked_data. When the Rails server is restarted, the configuration is loaded and the authority is ready for access through QA.
There are three top-level parts to the configuration.
- "prefixes": defines linked data prefixes that can be referenced in other parts of the configuration
- "term": defines how to fetch a single term and interpret the result
- "search": defines how to search the authority and interpret results
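As a rough sketch of the overall shape, a configuration file containing all three top-level parts might look like the skeleton below. The empty "term" and "search" objects mean that neither operation is supported, so a real configuration would fill in at least one of them as described in the rest of this document.

```json
{
  "prefixes": {
    "skos": "http://www.w3.org/2004/02/skos/core#"
  },
  "term": {},
  "search": {}
}
```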
Prefixes is a simple hash that associates a key (e.g. "schema") with the full URL for the ontology (e.g. "http://www.w3.org/2000/01/rdf-schema#").
Example:
"prefixes": {
  "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
  "schema": "http://www.w3.org/2000/01/rdf-schema#",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "loc": "http://id.loc.gov/vocabulary/identifiers/"
},
The "prefixes" section is optional. It can be left out altogether.
The URLs used to access the external authority's linked data API for term and search are defined using an extended version of IRI Templates.
See https://www.hydra-cg.com/spec/latest/core/#templated-links for information on IRI Templated Links. It defines an IRI Template as...
"An IriTemplate consists of a template literal and a set of mappings. Each IriTemplateMapping maps a variable used in the template to a property and may optionally specify whether that variable is required or not."
An IriTemplate has two parts:
- define the URL template with substitution variables
- define one mapping for each of the substitution variables
The parts defined at the URL level include...
Config Part | Possible Values | Comments |
---|---|---|
"@context" | "http://www.w3.org/ns/hydra/context.jsonld" | only supported value |
"@type" | "IriTemplate" | only supported value |
"template" | String | This is the template that defines the URL for accessing the external linked data authority. It includes substitution variables that allow setting of values based on values passed to QA. |
"variableRepresentation" | "BasicRepresentation" | only supported value |
"mapping" | Array | array describing how to map the values from QA into the template URL |
The mappings include basic information about each variable that will be substituted into the template URL.
Mapping Part | Possible Values | Comments |
---|---|---|
"@type" | "IriTemplateMapping" | only supported value |
"variable" | String | name of the variable as it appears in the template URL |
"property" | "hydra:freetextQuery" | only supported value |
"required" | true, false | true if required in the template URL; otherwise, false |
"default" | String | value to use if one isn't provided (This is an extension not defined in IriTemplate spec.) |
The QA configuration requires some variables be defined for search and some for term fetch. Those will be described below when addressing other configuration requirements for search and term.
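To make the two tables above concrete, here is a minimal sketch of a "url" block with one required variable and one optional variable that has a default. The host authority.example.org and the variable name "collection" are hypothetical and chosen only for illustration. Based on the description of "default", a request that supplies term_id but no collection would be expected to use "subjects" in the URL.

```json
"url": {
  "@context": "http://www.w3.org/ns/hydra/context.jsonld",
  "@type": "IriTemplate",
  "template": "http://authority.example.org/{collection}/{term_id}",
  "variableRepresentation": "BasicRepresentation",
  "mapping": [
    {
      "@type": "IriTemplateMapping",
      "variable": "term_id",
      "property": "hydra:freetextQuery",
      "required": true
    },
    {
      "@type": "IriTemplateMapping",
      "variable": "collection",
      "property": "hydra:freetextQuery",
      "required": false,
      "default": "subjects"
    }
  ]
}
```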
If term fetch is not supported, use the following for this configuration...
"term": {}
The configuration of the access URL for fetching a single term follows the general configuration described above. A few settings outside the template URL configuration affect how substitution into the template URL is processed:
- There must be an ID/URI variable defined in the template URL. It can have any variable name. The mapping of the ID/URI from the QA request to the template mapping variable is specified in the configuration outside of the template under "qa_replacement_patterns".
- The "term_id" configuration can have two values: "ID" or "URI". This tells the configuration whether the value passed to the template URL identifying the term to fetch is expected to be a simple ID (e.g. "sh85118553") or a URI (e.g. "http://sws.geonames.org/261707/").
Typical Example when passing a URI:
{
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "{term_uri}.rdf",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_uri"
    },
    "term_id": "URI",
    ...
  }
}
Typical Example when passing an ID:
{
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://id.loc.gov/authorities/{subauth}/{term_id}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "term_id",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "subauth",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "names"
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_id",
      "subauth": "subauth"
    },
    "term_id": "ID",
    ...
  }
}
NOTE: The template URL can have any number of additional variable mappings based on the needs of the external authority. Each variable mapping can have a default value that will be used if the variable is not passed in. Additional parameters that always have the same value can be hardcoded into the template URL.
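The sketch below illustrates the note above with a hardcoded parameter and an extra variable that has a default. The host authority.example.org, the hardcoded format=rdf parameter, and the "lang" variable are all hypothetical. It assumes that {lang} is replaced by the bare value in the same way as {term_id} in the ID example above.

```json
"url": {
  "@context": "http://www.w3.org/ns/hydra/context.jsonld",
  "@type": "IriTemplate",
  "template": "http://authority.example.org/terms/{term_id}?format=rdf&lang={lang}",
  "variableRepresentation": "BasicRepresentation",
  "mapping": [
    {
      "@type": "IriTemplateMapping",
      "variable": "term_id",
      "property": "hydra:freetextQuery",
      "required": true
    },
    {
      "@type": "IriTemplateMapping",
      "variable": "lang",
      "property": "hydra:freetextQuery",
      "required": false,
      "default": "en"
    }
  ]
}
```

Because format=rdf is written directly into the template, it is sent on every request and needs no mapping entry. Only values that can change per request need a mapping.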
The remaining parameters determine how the results are normalized. If the QA request includes format=jsonld, the results will not be normalized. If no format is specified or format=json, the results will be normalized based on the "results" configuration.
In this part of the configuration, predicates are identified that play a common role in the UI. These predicates may be different across various ontologies, but are expected to be used in the same way when presented to a user in the UI. The predicate roles that are currently supported are...
- id_predicate - if not specified, the subject_uri is used as the ID and the URI
- label_predicate
- altlabel_predicate
- broader_predicate
- narrower_predicate
- sameas_predicate
Typical full example:
"results": {
  "id_predicate": "http://id.loc.gov/vocabulary/identifiers/lccn",
  "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
  "broader_predicate": "http://www.w3.org/2004/02/skos/core#broader",
  "narrower_predicate": "http://www.w3.org/2004/02/skos/core#narrower",
  "sameas_predicate": "http://www.w3.org/2004/02/skos/core#exactMatch"
}
Typical minimal example:
"results": {
  "id_predicate": "http://purl.org/dc/terms/identifier",
  "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
  "sameas_predicate": "http://schema.org/sameAs"
}
From this, the results passed back from QA will look something like...
{
  "uri":"http://id.loc.gov/authorities/subjects/sh85076841",
  "id":"sh 85076841",
  "label":["Life sciences"],
  "altlabel":["Biosciences","Sciences, Life"],
  "narrower":["http://id.loc.gov/authorities/subjects/sh85083022","http://id.loc.gov/authorities/subjects/sh85002415",etc.],
  "broader":["http://id.loc.gov/authorities/subjects/sh00007934"],
  "sameas":[""],
  "predicates":{
    "http://www.loc.gov/mads/rdf/v1#hasCloseExternalAuthority":["http://id.worldcat.org/fast/998323","http://data.bnf.fr/ark:/12148/cb119716335",etc.],
    "http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection":["http://id.loc.gov/authorities/subjects/collection_SubdivideGeographically","http://id.loc.gov/authorities/subjects/collection_LCSH_General",etc.],
    "http://www.loc.gov/mads/rdf/v1#isMemberOfMADSScheme":["http://id.loc.gov/authorities/subjects"],
    "http://www.w3.org/2008/05/skos-xl#altLabel":["Biosciences","Sciences, Life"],
    etc.
  }
}
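For comparison with the examples above, the smallest "results" block would presumably name only a label predicate, since the notes later in this document list label_predicate as the only required entry. This is a sketch under that assumption. With no id_predicate given, the subject URI would be used as both the ID and the URI, as described in the list of predicate roles.

```json
"results": {
  "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel"
}
```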
If searching is not supported, use the following for this configuration...
"search": {}
TODO: Add info about search URL
TODO: Add info about search normalization
Example configuration...
{
  "prefixes": {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
    "schema": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "loc": "http://id.loc.gov/vocabulary/identifiers/"
  },
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/loc_genre_lookup.jsp?uri={?term_uri}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true,
          "encode": true
        }
      ]
    },
    "qa_replacement_patterns": {
      "term_id": "term_uri"
    },
    "term_id": "URI",
    "results": {
      "id_predicate": "http://id.loc.gov/vocabulary/identifiers/lccn",
      "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel",
      "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel",
      "broader_predicate": "http://www.w3.org/2004/02/skos/core#broader",
      "narrower_predicate": "http://www.w3.org/2004/02/skos/core#narrower",
      "sameas_predicate": "http://www.w3.org/2004/02/skos/core#exactMatch"
    }
  },
  "search": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/loc_genre_batch.jsp?query={?query}&entity={?subauth}&maxRecords={?maxRecords}&lang={?lang}&context={?context}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "query",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "subauth",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": ""
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "maxRecords",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "20"
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "en"
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "context",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "false"
        }
      ]
    },
    "qa_replacement_patterns": {
      "query": "query",
      "subauth": "subauth"
    },
    "results": {
      "id_predicate": "http://id.loc.gov/vocabulary/identifiers/lccn",
      "label_predicate": "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",
      "sort_predicate": "http://vivoweb.org/ontology/core#rank",
      "selector_predicate": "http://vivoweb.org/ontology/core#rank"
    },
    "context": {
      "groups": {
        "hierarchy": {
          "group_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.hierarchy",
          "group_label_default": "Hierarchy"
        }
      },
      "properties": [
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.authoritative_label",
          "property_label_default": "Authoritative Label",
          "ldpath": "madsrdf:authoritativeLabel :: xsd:string",
          "selectable": true,
          "drillable": false
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.alt_label",
          "property_label_default": "Variant Label",
          "ldpath": "skos:altLabel :: xsd:string",
          "selectable": false,
          "drillable": false
        },
        {
          "group_id": "hierarchy",
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.narrower",
          "property_label_default": "Narrower",
          "ldpath": "skos:narrower :: xsd:string",
          "selectable": true,
          "drillable": true,
          "expansion_label_ldpath": "skos:prefLabel ::xsd:string",
          "expansion_id_ldpath": "loc:lccn ::xsd:string"
        },
        {
          "group_id": "hierarchy",
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.broader",
          "property_label_default": "Broader",
          "ldpath": "skos:broader :: xsd:string",
          "selectable": true,
          "drillable": true,
          "expansion_label_ldpath": "skos:prefLabel ::xsd:string",
          "expansion_id_ldpath": "loc:lccn ::xsd:string"
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.exact_match",
          "property_label_default": "Exact Match",
          "ldpath": "skos:exactMatch :: xsd:string",
          "selectable": false,
          "drillable": false
        },
        {
          "property_label_i18n": "qa.linked_data.authority.locgenres_ld4l_cache.note",
          "property_label_default": "Note",
          "ldpath": "skos:note :: xsd:string",
          "selectable": false,
          "drillable": false
        }
      ]
    },
    "subauthorities": {
      "person": "Person",
      "organization": "Organization",
      "place": "Place",
      "intangible": "Intangible",
      "geocoordinates": "GeoCoordinates",
      "work": "Work"
    }
  }
}
NOTES:
- term: (optional) is used to define how to request term information from the authority and how to interpret results.
  - url: (required) templated link representation of the authority API URL and mapping of parameters for requesting term information from the authority
    - template: is the authority API URL with placeholders for substitution parameters in the form {?var_name}
      - NOTE: {?term_id} (required) and {?subauth} (optional) are expected to match QA params (see qa_replacement_patterns to match QA params with mapping variables)
      - Additional substitutions can be made in the authority API, if supported by the authority, by adding additional mappings. Search has an example with maxRecords.
        - variable: should match a replacement pattern in the template (e.g. variable: maxRecords ==> {?maxRecords})
        - required: true | false (NOTE: Not enforced at this time.)
        - default: provide a default value that will be used if not specified
      - See [documentation of templated-links](http://www.hydra-cg.com/spec/latest/core/#templated-links) for more information.
  - qa_replacement_patterns: identifies which mapping variables are being used for term_id and subauth.
    - NOTE: The URL to make a term request via QA always uses term_id and subauth as the param names. qa_replacement_patterns allows the url template to use a different variable name for pattern replacement.
  - language: (optional) values: array of en | fr | etc. -- identifies the languages to include in results, filtering out triples of other languages
    - NOTE: Some authorities' API URLs allow language to be specified as a parameter. In that case, use pattern replacement to add the language to the API URL to prevent alternate languages from being returned in the results.
    - NOTE: At this writing, only label and altlabel are filtered.
  - term_id: (optional) values: ID (default) | URI - This tells apps whether __TERM_ID__ replacement is expecting an ID or URI.
  - results: (required) lists predicates to select out for normalization in the hash results
    - id_predicate: (optional)
    - label_predicate: (required)
    - altlabel_predicate: (optional)
    - sameas_predicate: (optional)
    - narrower_predicate: (optional)
    - broader_predicate: (optional)
  - subauthorities: (optional)
    - subauthority name (e.g. topic:, personal_name:, corporate_name:, etc.) Values for {?subauth} are limited to the values in the list of subauthorities.
- search: (optional) is used to define how to send a query to the authority and how to interpret results.
  - url: (required) templated link representation of the authority API URL and mapping of parameters for sending a query to the authority
    - template: is the authority API URL with placeholders for substitution parameters in the form {?var_name}
      - NOTE: {?query} (required) and {?subauth} (optional) are expected to match QA params (see qa_replacement_patterns to match QA params with mapping variables)
      - Additional substitutions can be made in the authority API, if supported by the authority, by adding additional mappings. The search example above does this with maxRecords.
        - variable: should match a replacement pattern in the template (e.g. variable: maxRecords ==> {?maxRecords})
        - required: true | false (NOTE: Not enforced at this time.)
        - default: provide a default value that will be used if not specified
      - See [documentation of templated-links](http://www.hydra-cg.com/spec/latest/core/#templated-links) for more information.
  - qa_replacement_patterns: identifies which mapping variables are being used for query and subauth.
    - NOTE: The URL to make a search request via QA always uses query and subauth as the param names. qa_replacement_patterns allows the url template to use a different variable name for pattern replacement.
  - language: (optional) values: array of en | fr | etc. -- identifies the languages to include in results, filtering out triples of other languages
    - NOTE: Some authorities' API URLs allow language to be specified as a parameter. In that case, use pattern replacement to add the language to the API URL to prevent alternate languages from being returned in the results.
    - NOTE: At this writing, only label and altlabel are filtered.
  - results: (required) lists predicates to normalize and include in json results
    - id_predicate: (optional)
    - label_predicate: (required)
    - altlabel_predicate: (optional)
  - subauthorities: (optional)
    - subauthority name (e.g. topic:, personal_name:, corporate_name:, etc.) Values for {?subauth} are limited to the values in the list of subauthorities.
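The notes describe a "language" setting that none of the examples in this document show. A minimal sketch of how it might appear in a term configuration follows. Placing "language" directly under "term" follows the listing in the notes and should be treated as an assumption. The "url" and "qa_replacement_patterns" entries are left out for brevity and would be needed in a working configuration.

```json
{
  "term": {
    "term_id": "ID",
    "language": ["en", "fr"],
    "results": {
      "label_predicate": "http://www.w3.org/2004/02/skos/core#prefLabel",
      "altlabel_predicate": "http://www.w3.org/2004/02/skos/core#altLabel"
    }
  }
}
```

With this setting, labels and alternate labels tagged with languages other than English or French would be expected to be filtered out of the normalized results.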
You can add linked data authorities by adding configuration files to your Rails app in Rails.root/config/authorities/linked_data/YOUR_AUTH.json.
To modify one of the QA supplied configurations, copy it to your app in Rails.root/config/authorities/linked_data/YOUR_AUTH.json. Make your modifications to the JSON configuration file in your app.
- Addition of a QA_CONFIG_VERSION number in the configuration file. Some changes from 1.0 to 2.0 are not backward compatible. The original linked data configuration supported by QA releases prior to QA 4.0 did not include specification of a version number. Any config without a version number will be assumed to be version 1.0.
- Addition of extended context configuration for searching, which optionally returns basic results plus extended context for each result.
- Enhanced language processing for literals. Language processing can be turned off for a configuration to avoid processing language in situations where the data may not follow standards or may inconsistently implement language tags on literals. The enhancements apply to all config versions.
- Correct the processing of {?var} to translate to var=_value_ instead of just _value_. Add processing of {var}, which translates to _value_. This is not backward compatible with 1.0 configs since the processing of {?var} has changed. The code checks the config version and uses the appropriate approach, with no version specified assumed to be 1.0. 1.0 configs are deprecated and should be updated to 2.0.
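The difference between the two substitution forms is easiest to see in a template that uses both. The fragment below is a sketch only: the host authority.example.org is hypothetical, and placing QA_CONFIG_VERSION at the top level with the string value "2.0" is an assumption, since this document does not state where the key goes. The "results" and "qa_replacement_patterns" entries are omitted for brevity.

```json
{
  "QA_CONFIG_VERSION": "2.0",
  "search": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type": "IriTemplate",
      "template": "http://authority.example.org/search?{?query}&max={maxRecords}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type": "IriTemplateMapping",
          "variable": "query",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type": "IriTemplateMapping",
          "variable": "maxRecords",
          "property": "hydra:freetextQuery",
          "required": false,
          "default": "20"
        }
      ]
    }
  }
}
```

Under the 2.0 rules described above, a query of cats with no maxRecords supplied would be expected to produce http://authority.example.org/search?query=cats&max=20, because {?query} expands to query=cats while {maxRecords} expands to the bare default value 20.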