
Example RDF Dump File Reconciliation

stkenny edited this page Oct 23, 2020 · 4 revisions

This example reconciles the list of the top 100 universities from the Guardian Data Blog against an RDF dump file from data.nytimes.com.

Create an OpenRefine project from the CSV file that can be exported from the Google spreadsheet provided by the Guardian Data Blog. A snippet is shown in the figure below.

Define a new reconciliation service based on the NY Times RDF dump that describes organizations. Select "Based on RDF file..." from the RDF menu, as shown in the figure below.

Enter the details of the new service. Pick a name for the service (in the example below we chose "NYT organizations"). Choose the file and its format. Finally, select the properties used to label resources in the RDF data (the NYT organizations dump uses skos:prefLabel, so we selected it, as shown below).

For the file format, the default option "auto-detect" infers the RDF format from the file extension.
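As a rough illustration of what detection by file extension involves, the Python sketch below maps common RDF file extensions to serialization formats. The table `RDF_FORMATS_BY_EXTENSION` and the function `guess_rdf_format` are hypothetical; the RDF extension's own list of recognized extensions may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-format table, for illustration only;
# the RDF extension's real list of recognized extensions may differ.
RDF_FORMATS_BY_EXTENSION = {
    ".rdf": "RDF/XML",
    ".owl": "RDF/XML",
    ".ttl": "Turtle",
    ".nt": "N-Triples",
    ".n3": "N3",
}


def guess_rdf_format(filename: str) -> Optional[str]:
    """Return the RDF format implied by the file extension, or None if unknown."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    return RDF_FORMATS_BY_EXTENSION.get(suffix)


print(guess_rdf_format("organizations.rdf"))  # RDF/XML
print(guess_rdf_format("dump.TTL"))           # Turtle
print(guess_rdf_format("dump.txt"))           # None
```

A file with an unrecognized extension (or a compressed dump such as `.nt.gz`, whose last suffix is `.gz`) yields `None` in this sketch; that is the situation where choosing the format explicitly is the safer option.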

For label properties, you can select more than one property, but each additional property has a performance cost. If the property you need is not among those listed, select "other" and enter its full URI (or the full URIs of several properties).
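To see why extra label properties cost performance, here is a minimal sketch of building a label index, assuming the dump has already been parsed into (subject, predicate, object) triples. `build_label_index` and the example.org resources are hypothetical and not the extension's actual implementation; the point is that every selected property contributes more labels that must be indexed and later compared against.

```python
from collections import defaultdict

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_label_index(triples, label_properties):
    """Map each normalized label to the set of subjects that carry it.

    Only triples whose predicate is one of the selected label properties
    are indexed, so every extra property adds more entries to store and
    more labels to compare against at reconciliation time.
    """
    index = defaultdict(set)
    wanted = set(label_properties)
    for subject, predicate, obj in triples:
        if predicate in wanted:
            index[obj.strip().lower()].add(subject)
    return index


# Hypothetical triples; real data would come from parsing the RDF dump.
triples = [
    ("http://example.org/org/1", SKOS_PREF_LABEL, "Example University"),
    ("http://example.org/org/1", RDFS_LABEL, "Example U"),
    ("http://example.org/org/1", RDF_TYPE, "http://www.w3.org/2004/02/skos/core#Concept"),
    ("http://example.org/org/2", SKOS_PREF_LABEL, "Sample College"),
]

only_pref = build_label_index(triples, [SKOS_PREF_LABEL])
both = build_label_index(triples, [SKOS_PREF_LABEL, RDFS_LABEL])
print(len(only_pref), len(both))  # 2 3
```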

Choose "Start reconciling..." from the drop-down menu of the "University" column. Select the "NYT organizations" service that we have just added. As shown below, type guessing suggests a list of types that includes skos:Concept.
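Type guessing can be pictured as tallying the types of the candidates found for a sample of column values. The sketch below assumes exact, case-insensitive label lookup and uses hypothetical example.org data, including a made-up `http://example.org/ns#University` type; it approximates the idea and is not OpenRefine's or the RDF extension's implementation.

```python
from collections import Counter

SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
EX_UNIVERSITY = "http://example.org/ns#University"  # hypothetical type

# Hypothetical data: normalized label -> candidate subjects,
# and subject -> its rdf:type values.
label_index = {
    "example university": {"http://example.org/org/1"},
    "sample college": {"http://example.org/org/2"},
}
types_of = {
    "http://example.org/org/1": {SKOS_CONCEPT, EX_UNIVERSITY},
    "http://example.org/org/2": {SKOS_CONCEPT},
}


def guess_types(sample_values, label_index, types_of):
    """Rank types by how many sample values have at least one candidate of that type."""
    counts = Counter()
    for value in sample_values:
        candidates = label_index.get(value.strip().lower(), set())
        value_types = set()
        for subject in candidates:
            value_types |= types_of.get(subject, set())
        counts.update(value_types)  # each type counted at most once per value
    return counts.most_common()


print(guess_types(["Example University", "Sample College", "Unknown"], label_index, types_of))
```

With this data, skos:Concept is found for two of the three sample values and ranks first, which mirrors the kind of suggestion shown in the dialog.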

Click the "Start reconciling" button. After a while, OpenRefine presents the reconciliation results along with facets on reconciliation decisions and top candidate scores (see figure below).
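The candidate scores shown in the facets come from the service's own matching. As a toy stand-in only, the sketch below ranks labels by `difflib.SequenceMatcher` similarity on a 0 to 100 scale; `score_candidates` and the sample labels are hypothetical, and the RDF extension does not necessarily score candidates this way.

```python
from difflib import SequenceMatcher


def score_candidates(query, labels, limit=3):
    """Rank candidate resources by label similarity to the query.

    `labels` maps a resource URI to its label. Returns up to `limit`
    (score, uri, label) tuples, best first, with scores from 0 to 100.
    """
    normalized_query = query.strip().lower()
    scored = []
    for uri, label in labels.items():
        ratio = SequenceMatcher(None, normalized_query, label.lower()).ratio()
        scored.append((round(ratio * 100, 1), uri, label))
    # Highest score first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical candidate labels keyed by example.org URIs.
labels = {
    "http://example.org/org/1": "Example University",
    "http://example.org/org/2": "Sample College",
    "http://example.org/org/3": "Example Institute of Technology",
}
for score, uri, label in score_candidates("Example Univ.", labels):
    print(score, uri, label)
```

Under a scheme like this, only a clearly dominant top score would justify an automatic match, which helps explain why some rows are left for manual review.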

We still need to go through the results and confirm correct suggestions that were not matched automatically. You can preview reconciliation suggestions to inform your decision (see figure below).

To use the reconciled URIs in the RDF exporter, use the expression cell.recon.match.id.
