Reconciliation Service API

Martin Magdinier edited this page Jun 29, 2016 · 13 revisions

Specification for a standard reconciliation service API

NOTE: The Standard Reconcile Service that was hosted by Freebase is no longer operational.

Introduction

A reconciliation service is a web service that, given some text which is a name or label for something, and optionally some additional details, returns a ranked list of potential entities matching the criteria. The candidate text does not have to match each entity's official name perfectly; that is the whole point of reconciliation: to get from an ambiguous text name to precisely identified entities. For instance, given the text "apple", a reconciliation service probably should return the Apple Inc. company, the apple fruit, and New York City (also known as the Big Apple).

Entities are identified by strong identifiers in some particular identifier space. In the same identifier space, identifiers follow the same syntax. For example, given the string "apple", a reconciliation service might return entities identified by "/en/apple", "/en/apple_inc", and "/en/apple_ii", in the Freebase ID space. Given the same string, another reconciliation service might return entities identified by URIs, which are in the URI space, so to speak. Some other identifier spaces are ISBN, LSID. (Although ISBN and LSID identifiers are numbers, for uniformity, they are still stored as strings in OpenRefine.)

Each reconciliation service can only reconcile to one single identifier space, but several reconciliation services can reconcile to the same identifier space.

So that it will be easy for users to plug in a variety of interchangeable reconciliation services, we define the API for a standard reconciliation service here.

A standard reconciliation service is an HTTP-based RESTful API that exchanges JSON. It operates in two modes: single query mode and multiple query mode. The second exists as an optimization (it reduces the number of HTTP calls). Reconciliation services must implement both query modes as well as the metadata query.

The Service Metadata call is made when your reconciliation service is registered, so this needs to be implemented first. Refine currently only uses the multiple query mode, but other consumers of the API may use the single query option since it was included in the spec.

NOTE: We encourage all API consumers to consider the single query mode DEPRECATED.

The next request that your service will see is a multiple reconciliation query request with the first ten names in the column that the user is reconciling. The types returned with the candidates are used by Refine to populate a list of types for the user to choose from when reconciling, ordered from most to least frequent. For example, if your service reconciles both people and companies, and the query for the first ten names returns 15 candidates which are people and 7 candidates which are companies, the types will be presented to the user in that order. The user can override the order and pick whichever type they want, choose a different type by hand, or choose to reconcile without any type information.
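
The ordering described above can be sketched in a few lines of Python. This is only an illustration of counting candidate types across a multiple query response, not Refine's actual implementation; the function name `rank_types` and the type IDs in the usage example are hypothetical.

```python
from collections import Counter


def rank_types(batch_response):
    """Return type IDs ordered from most to least frequent.

    `batch_response` is assumed to be a decoded multiple query response:
    a dict keyed by query key, each value holding a "result" list whose
    candidates carry a "type" array of strings.
    """
    counts = Counter(
        type_id
        for entry in batch_response.values()
        for candidate in entry.get("result", [])
        for type_id in candidate.get("type", [])
    )
    # most_common() sorts by descending count.
    return [type_id for type_id, _ in counts.most_common()]


# Hypothetical response for two of the first ten names.
sample = {
    "q0": {"result": [{"id": "a", "type": ["person"]},
                      {"id": "b", "type": ["person"]}]},
    "q1": {"result": [{"id": "c", "type": ["company"]}]},
}
print(rank_types(sample))
```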

After the user has chosen a type and, optionally, other columns with associated types to include in the reconciliation, you'll start receiving batches of data in multiple query reconciliation requests.


Here's a little overview and theory from our mailing list thread. TODO: it needs to be reworked and incorporated into the text.


The main elements are:

  • reconciliation service - single end point which can reconcile against multiple "types" of entities
    • type - modeled on Freebase types, but basically a type of thing that can be reconciled against. A type has:
      • an ID (the identifier for the type itself)
      • a name/label (the string presented to the user for when they're picking types)
      • a bunch of properties (e.g. the director or release date for a film)
      • a bunch of instances with values for the properties which can be matched in addition to the name, returning:
        • an ID (for the matched entity) which is stored by Refine in cell.recon.match.id
        • a name which is used as the display name for the cell after it is matched

There's a bunch of ways that you could mix and match this in a table oriented scenario, but one logical mapping would be to have the "type" match to a database/table tuple with "id", "name", and any additional properties mapped to columns and the matched entities being rows.

Because the reconciliation service only returns a single ID, getting additional column values could be done one of two ways:

1. The Freebase way using a separate command a la "Add column by fetching from Freebase" (perhaps just using "Add column by fetching URL" against a local REST service)

2. By defining multiple "types" in the reconciliation service mapped to the same table, but returning different columns as the ID. In Mr/Ms 9er's example, you could have:

  • CA Corporation ID returning the id column value as the ID
  • CA Corporation CorpID returning the corpid column value as the ID

both mapped to the same table. The user would then choose the "type" that they want based on which ID they want returned.
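
A rough Python sketch of the second option follows. The in-memory table, its rows, the type IDs, and the naive substring matching are all made up for illustration; only the idea of two types selecting different ID columns from the same table comes from the text above.

```python
# Hypothetical table; the "id" and "corpid" columns follow the example above.
CORPORATIONS = [
    {"id": "1", "corpid": "C0001", "name": "Acme Widgets"},
    {"id": "2", "corpid": "C0002", "name": "Apex Holdings"},
]

# Two reconciliation types over the same table, differing only in the
# column whose value is returned as the entity ID.
TYPES = {
    "ca_corporation_id": {"name": "CA Corporation ID", "id_column": "id"},
    "ca_corporation_corpid": {"name": "CA Corporation CorpID", "id_column": "corpid"},
}


def reconcile(query, type_id):
    """Match `query` against corporation names (naive substring match)."""
    id_column = TYPES[type_id]["id_column"]
    needle = query.strip().lower()
    results = []
    for row in CORPORATIONS:
        name = row["name"].lower()
        if needle in name:
            exact = name == needle
            results.append({
                "id": row[id_column],
                "name": row["name"],
                "type": [type_id],
                "score": 1.0 if exact else 0.5,  # arbitrary illustrative scores
                "match": exact,
            })
    return {"result": results}
```

With this sketch, reconcile("acme widgets", "ca_corporation_id") would return the row's id value as the ID, while the same query with "ca_corporation_corpid" would return its corpid value.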


Service Metadata

When a service is called with just a JSONP callback parameter and no other parameters, it must return its metadata as a JSON object literal with at least 3 fields: "name", "identifierSpace", and "schemaSpace". Other fields are optional for reconciliation services which can make use of the default Freebase preview, suggest, and similar services, but non-Freebase reconciliation services may need to implement them all.

Here are two live examples:

  1. http://standard-reconcile.freebaseapps.com/reconcile?callback=jsonp
  2. http://netflix-reconcile.freebaseapps.com/reconcile?callback=jsonp

The metadata returned by the Netflix service looks like this:

{
  "name" : "Netflix Reconciliation through Freebase",
  "identifierSpace" : "http://rdf.freebase.com/ns/authority.netflix.movie",
  "schemaSpace" : "http://rdf.freebase.com/ns/type.object.id",
  "view" : {
    "url" : "http://www.netflix.com/WiMovie//{{id}}"
  },
  "preview" : {
    "url" : "http://netflix-reconcile.freebaseapps.com/preview/{{id}}",
    "width" : 430,
    "height" : 300
  },
  "suggest" : {
    "type" : {
      "service_url" : "http://netflix-reconcile.freebaseapps.com",
      "service_path" : "/suggest_type",
      "flyout_service_url" : "http://www.freebase.com"
    },
    "property" : {
      "service_url" : "http://netflix-reconcile.freebaseapps.com",
      "service_path" : "/suggest_property",
      "flyout_service_url" : "http://www.freebase.com"
    },
    "entity" : {
      "service_url" : "http://netflix-reconcile.freebaseapps.com",
      "service_path" : "/suggest",
      "flyout_service_path" : "/flyout"
    }
  },
  "defaultTypes" : []
}
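
As a minimal sketch, a service author might check a metadata object for the three required fields before serving it. The helper name `missing_metadata_fields` and the example.org URLs below are hypothetical and not part of the specification.

```python
# The three fields the text above lists as required.
REQUIRED_FIELDS = ("name", "identifierSpace", "schemaSpace")


def missing_metadata_fields(metadata):
    """Return the required metadata fields absent from `metadata`."""
    return [field for field in REQUIRED_FIELDS if field not in metadata]


# Hypothetical minimal metadata for a non-Freebase service.
EXAMPLE_METADATA = {
    "name": "Example Reconciliation Service",
    "identifierSpace": "http://example.org/ids/",
    "schemaSpace": "http://example.org/schema/",
}

print(missing_metadata_fields(EXAMPLE_METADATA))       # nothing missing
print(missing_metadata_fields({"name": "Incomplete"}))  # two fields missing
```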

Note that Freebase itself supports at least 3 identifier spaces:

  • human-friendly IDs such as /en/apple_inc and /government/politician
  • machine ID (mid) such as /m/0k8z
  • GUID such as #9202a8c04000641f800000000000451e (deprecated)

Thus, it's not enough to say that the identifier space is http://www.freebase.com/; we have to use the URI of the specific property (in this case, mid).

The schema space is the identifier space for types and properties. It might be different from the entities' identifier space.

The other fields are for

  • formulating URLs to a full topic page or to a small preview page for a given entity
  • customizing the suggest widgets used in various places in the reconciliation UI
  • default types for reconciliation - TODO needs more documentation
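
The "view" and "preview" URLs in the metadata example contain an {{id}} placeholder. A minimal sketch of how a consumer might expand it is shown below, assuming plain string substitution; whether and how the identifier should be percent-encoded is not stated here, so this sketch leaves it untouched. The function name and the example.org URL are hypothetical.

```python
def expand_url_template(template, entity_id):
    """Replace the {{id}} placeholder with an entity identifier.

    Assumption: plain substitution with no extra encoding.
    """
    return template.replace("{{id}}", entity_id)


# Hypothetical preview template and identifier.
print(expand_url_template("http://example.org/preview/{{id}}", "12345"))
```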

Query Request

Single Query Mode

A call to a reconciliation service API for a single query looks like either of these:

  http://foo.com/bar/reconcile?query=...string...
  http://foo.com/bar/reconcile?query={...json object literal...}

If the query parameter is a string, then it's an abbreviation of query={"query":...string...}. Here are two live examples:

  1. http://standard-reconcile.freebaseapps.com/reconcile?query=boston
  2. http://standard-reconcile.freebaseapps.com/reconcile?query={%22query%22:%22boston%22,%22type%22:%22/music/musical_group%22}
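
On the service side, the abbreviation rule could be handled roughly as follows. Treating any value that starts with "{" as a JSON object literal is an assumption of this sketch; the text does not say how the two forms are told apart. The function name `normalize_query` is hypothetical.

```python
import json


def normalize_query(raw):
    """Expand the `query` parameter into its full object form.

    Assumption: a value starting with "{" is a JSON object literal;
    anything else is a plain search string.
    """
    if raw.lstrip().startswith("{"):
        parsed = json.loads(raw)
        if not isinstance(parsed, dict) or "query" not in parsed:
            raise ValueError('query object must contain a "query" field')
        return parsed
    return {"query": raw}


print(normalize_query("boston"))
print(normalize_query('{"query": "boston", "type": "/music/musical_group"}'))
```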

The query JSON object literal has the following fields:

| Parameter | Description |
| --- | --- |
| "query" | A string to search for. Required. |
| "limit" | An integer to specify how many results to return. Optional. |
| "type" | A single string, or an array of strings, specifying the types of result e.g., person, product, ... The actual format of each type depends on the service (e.g., "/government/politician" as a Freebase type). Optional. |
| "type_strict" | A string, one of "any", "all", "should". Optional. |
| "properties" | Array of JSON object literals. Optional. |

Each json object literal of the "properties" array is of this form

  {
    "p" : string, property name, e.g., "country", or
    "pid" : string, property ID, e.g., "/people/person/nationality" in the Freebase ID space
    "v" : a single, or an array of, string or number or object literal, e.g., "Japan"
  }

A "v" object literal would have a single key "id" whose value is an identifier previously resolved in the same identifier space.

Here is an example of a full query parameter:

  {
    "query" : "Ford Taurus",
    "limit" : 3,
    "type" : "/automotive/model",
    "type_strict" : "any",
    "properties" : [
      { "p" : "year", "v" : 2009 },
      { "pid" : "/automotive/model/make" , "v" : { "id" : "/en/ford" } }
    ]
  }
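
The field rules above can be checked mechanically. The following Python sketch validates a decoded query object against them; the function name and the error messages are hypothetical, and the checks only cover what this page states.

```python
TYPE_STRICT_VALUES = ("any", "all", "should")


def validate_query(query):
    """Return a list of problems found in a query object (empty if none)."""
    problems = []
    if not isinstance(query.get("query"), str):
        problems.append('"query" is required and must be a string')
    if "limit" in query:
        limit = query["limit"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int):
            problems.append('"limit" must be an integer')
    if "type" in query:
        t = query["type"]
        is_string_list = isinstance(t, list) and all(isinstance(x, str) for x in t)
        if not (isinstance(t, str) or is_string_list):
            problems.append('"type" must be a string or an array of strings')
    if "type_strict" in query and query["type_strict"] not in TYPE_STRICT_VALUES:
        problems.append('"type_strict" must be "any", "all", or "should"')
    for index, prop in enumerate(query.get("properties", [])):
        if "p" not in prop and "pid" not in prop:
            problems.append(f'properties[{index}] needs "p" or "pid"')
        if "v" not in prop:
            problems.append(f'properties[{index}] needs "v"')
    return problems


ford_taurus = {
    "query": "Ford Taurus",
    "limit": 3,
    "type": "/automotive/model",
    "type_strict": "any",
    "properties": [
        {"p": "year", "v": 2009},
        {"pid": "/automotive/model/make", "v": {"id": "/en/ford"}},
    ],
}
print(validate_query(ford_taurus))
```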

Multiple Query Mode

A call to a standard reconciliation service API for multiple queries looks like this:

  http://foo.com/bar/reconcile?queries={...json object literal...}

The json object literal has zero or more key/value pairs with arbitrary keys where the value is in the same format as a single query, e.g.

  http://foo.com/bar/reconcile?queries={ "q0" : { "query" : "foo" }, "q1" : { "query" : "bar" } }

"q0" and "q1" can be arbitrary strings. They will be used to key the results returned. Here is a live example:

http://standard-reconcile.freebaseapps.com/reconcile?queries={%22q0%22:{%22query%22:%22boston%22,%22type%22:%22/music/musical_group%22},%22q1%22:{%22query%22:%22jaguar%22}}
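
A client could assemble such a request along these lines. This is a sketch using only the Python standard library; the "q0", "q1", ... key scheme mirrors the example above but any unique strings would do, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def batch_queries(names, type_id=None, limit=None):
    """Build the `queries` object for a list of names, keyed q0, q1, ..."""
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name}
        if type_id is not None:
            query["type"] = type_id
        if limit is not None:
            query["limit"] = limit
        queries[f"q{index}"] = query
    return queries


def batch_url(endpoint, queries):
    """Return the request URL with the JSON percent-encoded."""
    return endpoint + "?" + urlencode({"queries": json.dumps(queries)})


queries = batch_queries(["boston", "jaguar"])
print(batch_url("http://foo.com/bar/reconcile", queries))
```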

Query Response

The response for a single query is a JSON object literal:

  {
    "result" : [
      {
        "id" : ... string, database ID ...
        "name" : ... string ...
        "type" : ... array of strings ...
        "score" : ... double ...
        "match" : ... boolean, true if the service is quite confident about the match ...
      },
      ... more results ...
    ],
    ... potentially some useful envelope data, such as timing stats ...
  }

For multiple queries, the response is a JSON object literal with the same keys as in the request. Each value has the same format as a single query response:

  {
    "q0" : {
      "result" : [ ... ]
    },
    "q1" : {
      "result" : [ ... ]
    }
  }
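
As an illustration of consuming a multiple query response, the sketch below picks, for each query key, the highest-scoring candidate that the service flagged with "match": true. This selection policy and the sample data are assumptions of the example, not something the specification prescribes.

```python
def best_matches(batch_response):
    """Map each query key to its confident match, or None if there is none."""
    chosen = {}
    for key, entry in batch_response.items():
        # Highest score first; candidates without a score sort last.
        candidates = sorted(
            entry.get("result", []),
            key=lambda candidate: candidate.get("score", 0.0),
            reverse=True,
        )
        confident = [c for c in candidates if c.get("match")]
        chosen[key] = confident[0] if confident else None
    return chosen


# Hypothetical decoded response.
response = {
    "q0": {"result": [
        {"id": "/en/x", "name": "X", "type": ["t"], "score": 40.0, "match": False},
        {"id": "/en/y", "name": "Y", "type": ["t"], "score": 90.0, "match": True},
    ]},
    "q1": {"result": []},
}
print(best_matches(response))
```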

The service must also support JSONP through a callback parameter, i.e., &callback=foo.
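
Putting the request forms together, a service's entry point might dispatch roughly as sketched below. The `params` dict (already URL-decoded), the `reconcile_one` callable, and the function names are hypothetical; a real service would likely also want to validate the callback name before echoing it back.

```python
import json


def render(payload, callback=None):
    """Serialize `payload` as JSON, wrapped as JSONP when a callback is given."""
    body = json.dumps(payload)
    return f"{callback}({body})" if callback else body


def handle_request(params, metadata, reconcile_one):
    """Dispatch on the request parameters.

    - "queries": multiple query mode, results keyed like the request
    - "query": single query mode (plain string or JSON object literal)
    - neither: return the service metadata
    """
    callback = params.get("callback")
    if "queries" in params:
        queries = json.loads(params["queries"])
        payload = {key: reconcile_one(query) for key, query in queries.items()}
    elif "query" in params:
        raw = params["query"]
        # Assumption: a value starting with "{" is a JSON object literal.
        query = json.loads(raw) if raw.lstrip().startswith("{") else {"query": raw}
        payload = reconcile_one(query)
    else:
        payload = metadata
    return render(payload, callback)
```

For example, calling handle_request({"callback": "foo"}, metadata, reconcile_one) would return the metadata wrapped in a call to foo.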

Preview API

The preview service API (complementary to the reconciliation service API) is quite simple. Pass it an identifier and it renders information about the corresponding entity in an HTML page, which will be shown in an iframe inside OpenRefine. The given width and height dimensions tell OpenRefine how to size that iframe.

If there is no preview service specified in the reconciliation service's metadata, and if the entity identifier space is a Freebase ID or Mid identifier space, then OpenRefine knows to use Freebase's topicblock service. Here is an example of what an entity's preview looks like:

http://www.freebase.com/widget/topic/en/apple_inc?mode=content

NOTE: The Freebase Topic Blocks widget is deprecated and will go away when the old API is turned off (soon!). New code should use the Topic API in this fashion:

https://www.googleapis.com/freebase/v1/topic/en/apple_inc?filter=suggest&key=${key}

Suggest APIs

In the "Start Reconciling" dialog box in OpenRefine, you can specify which type of entities the column in question contains. For instance, the column might contain names of politicians. But you don't know the identifier corresponding to the "politician" type. So we need a suggest API that translates "politician" to something like, say, "/government/politician" if we're reconciling against Freebase.

In the same dialog box, you can specify that other columns should be used to provide more details for the reconciliation. For instance, if there is a column specifying the politicians' home city, passing that data onto the reconciliation service might make reconciliation more accurate. You might want to specify how that second column is related to the column being reconciled, but you might not know how to specify "home city" as a precise relationship. So we need a suggest API that translates "home city" to something like "/people/person/places_lived".

There is also a need for a suggest service for entities rather than just for types and properties. When a cell has no good candidate, then you would want to perform a search yourself (by clicking on "search for match" in that cell).

These suggest APIs are required to work with a modified version of the Freebase Suggest widget. As illustrated on that site, each suggest API has 2 jobs to do:

  • translate what the user types into a ranked list of entities (and this is similar to the core reconciliation service and might share the same implementation)
  • render a flyout when an entity is moused over or highlighted using arrow keys (and this is similar to the preview API and might share the same implementation)

The metadata for each suggest API (type, property, or entity) is as follows:

{
  "service_url" : "... url including only the domain ...",
  "service_path" : "... optional relative path ...",
  "flyout_service_url" : "... optional url including only the domain ...",
  "flyout_service_path" : "... optional relative path ..."
}

The service_url field is required and it should look like this: http://foo.com. There should be no trailing / at the end. The other fields are optional and have defaults if not provided:

  • service_path defaults to /private/suggest
  • flyout_service_url defaults to the provided service_url field
  • flyout_service_path defaults to /private/flyout

The Freebase Suggest widgets embedded in OpenRefine will concatenate _url and _path to make the full URL. Refer to the specification for a Suggest API for details.
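
The defaulting and concatenation rules above can be sketched as follows. The function name `resolve_suggest_urls` is hypothetical; the defaults are the ones listed above, and the usage example takes the "entity" entry from the Netflix metadata shown earlier.

```python
def resolve_suggest_urls(config):
    """Apply the documented defaults and join each _url with its _path."""
    service_url = config["service_url"]  # required, no trailing "/"
    service_path = config.get("service_path", "/private/suggest")
    flyout_url = config.get("flyout_service_url", service_url)
    flyout_path = config.get("flyout_service_path", "/private/flyout")
    return {
        "suggest": service_url + service_path,
        "flyout": flyout_url + flyout_path,
    }


entity_config = {
    "service_url": "http://netflix-reconcile.freebaseapps.com",
    "service_path": "/suggest",
    "flyout_service_path": "/flyout",
}
print(resolve_suggest_urls(entity_config))
```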

OpenRefine Integration & Testing

OpenRefine includes a mechanism for adding reconciliation service API URLs. After choosing Reconcile->Start Reconciling for a column, look at the bottom of the page for two buttons: "Add Standard Service..." and "Add Namespaced Service..."

Each column in OpenRefine should only be reconciled against one identifier space.

There is a testing dashboard available as part of the standard reconciliation service at:

http://standard-reconcile.freebaseapps.com/

It will allow you to see the format of various example queries, as well as the responses that they return, for both the standard reconciliation service and a number of example reconciliation services.

To test and debug a new reconciliation service ... TODO

Examples

We've cloned a number of the Refine reconciliation services as a way of providing them visibility. They can be found at https://github.com/OpenRefine

Some of them include:

The open-reconcile project provides a complete Java based reconciliation service which queries a SQL database. https://code.google.com/p/open-reconcile

The RDF extension incorporates, among other things, reconciliation support with different approaches:

  • a service that reconciles by querying a SPARQL endpoint
  • reconcile against a provided RDF file
  • based on Apache Stanbol (implementation details)

Sunlight Labs implemented a reconciliation service using Piston on Django for their Influence Explorer https://github.com/sunlightlabs/datacommons/blob/master/dcapi/reconcile/handlers.py

Also look at the Reconcilable Data Sources page for other examples of available reconciliation services that are compatible with Refine. Not all of them are open source, but they might spark some ideas.


(Much of this standard is based on the old Freebase relevance service and the Freebase experimental recon service.)
